aboutsummaryrefslogtreecommitdiffhomepage
path: root/tensorflow/contrib/nccl
Commit message (Collapse)AuthorAge
* Move nccl_rewrite.cc back to tf_kernel_library.Gravatar Gunhan Gulsoy2018-09-13
| | | | PiperOrigin-RevId: 212858590
* [TF] Variant improvements.Gravatar Eugene Brevdo2018-09-11
| | | | | | | | | | | | | | | | | | 1. Change Variant Decode to accept VariantTensorData (non-ref). This should allow some optimization in the future. In the meantime it means removing the variant.h include from tensor.h, since variant_encode_decode.h now relies on tensor.h and variant.h now relies on that. It also means we found a bunch of places where tensor.proto.h, variant.h, and mutex.h were being imported through tensor.h (along with a bunch of other crap); so now we directly import them in order to compile. 2. Move Variant registry to use TypeIndex instead of a TypeName string; this should speed up registry lookups. PiperOrigin-RevId: 212478896
* fix C++ header guards.Gravatar A. Unique TensorFlower2018-08-21
| | | | PiperOrigin-RevId: 209679086
* Make sure tensorrt and nccl kernels compile with newer versions of Eigen.Gravatar A. Unique TensorFlower2018-08-15
| | | | PiperOrigin-RevId: 208902416
* [ROCm] Interface changes for StreamExecutor to support both CUDA and ROCmGravatar Wen-Heng (Jack) Chung2018-07-12
| | | | | | | | | | | | | | | | | | | 1) StreamInterface::CudaStreamMemberHack() Despite the fact that StreamExecutor and GPU common runtime are largely orthogonal, this particular routine in StreamExecutor is used in GPU common runtime and a couple of other operators. In this commit it's renamed as StreamInterface::GpuStreamMemberHack() and their call sites are also changed. 2) StreamExecutorInterface::CudaContextHack() This member is renamed to StramExecutorInterface::GpuContextHack(). Changes introduced in this commit includes: - some StreamExecutor interfaces and CUDA implementation - GPU common runtime related to interface changes in StreamExecutor - operators affected by interface changes in StreamExecutor
* Merge pull request #20360 from ppwwyyxx/patch-13Gravatar Yifei Feng2018-07-02
|\ | | | | Fix gradient of nccl_ops
* | Fix Windows GPU BuildGravatar A. Unique TensorFlower2018-06-28
| | | | | | | | PiperOrigin-RevId: 202260254
| * Fix gradient of nccl_opsGravatar Yuxin Wu2018-06-27
|/
* Load NCCL lib on-demand to facilitate default NCCL version upgrade to 2Gravatar Smit Hinsu2018-06-18
| | | | | | | | | Change in the default version to NCCL 2 would require all TF users to download the NCCL library without the on-demand loading. With on-demand loading, it will only require users using the nccl ops to download and install the NCCL lib. PiperOrigin-RevId: 201109554
* Deprecate `DeviceBase::GetStepAllocator()` and replace with calls to ↵Gravatar Derek Murray2018-05-24
| | | | | | | | | | | | | | | | | `GetAllocator()`. The `GetStepAllocator()` API relied on the existence of a "step resource manager", which is no longer a concept in the runtime (it was replaced by "step containers"). Since the additional flexibility does not appear to be used in the codebase, the `GetScopedAllocator()` seems to provide a similar extension point (based on step IDs), and the `OpKernelContext::get_allocator()` method is called frequently, this change simplifies the implementation somewhat. The `GetStepAllocator()` method is retained as a non-virtual stub that forwards to `GetAllocator()`, because at least one third-party library (libxsmm) calls this interface directly. PiperOrigin-RevId: 197922154
* Use tensorflow::se instead of perftools::gputools for StreamExecutor.Gravatar Justin Lebar2018-04-23
| | | | PiperOrigin-RevId: 194010749
* Enabling fp16 for NCCL 1 and 2.Gravatar A. Unique TensorFlower2018-04-09
| | | | PiperOrigin-RevId: 192096789
* Add support for NCCL2. The configure script asks for what version of NCCL to ↵Gravatar A. Unique TensorFlower2018-04-05
| | | | | | | | use. The default is still NCCL 1 from GitHub. If the user chooses NCCL 2, it asks for the install directory. The nccl_configure.bzl generates two different BUILD files based on the chose NCCL version. For NCCL 1, it aliases to the existing 'nccl_archive' http_repo on GitHub. For NCCL 2, it creates a target containing the NCCL 2 library and headers from the chosen install directory. PiperOrigin-RevId: 191718007
* Replaced calls to deprecated tensorflow::StringPiece methods with theirGravatar A. Unique TensorFlower2018-04-02
| | | | | | | | tensorflow::str_util equivalents. This will allow the deprecated methods to be removed. PiperOrigin-RevId: 191350894
* Remove all_opensource_files. It's not needed any more.Gravatar Martin Wicke2018-03-28
| | | | PiperOrigin-RevId: 190878279
* disbaling timeout in guitarGravatar Olivia Nordquist2018-03-08
| | | | PiperOrigin-RevId: 188383577
* eager: Rename in_eager_mode to executing_eagerly and get rid of in_graph_mode.Gravatar Asim Shankar2018-03-07
| | | | | | | | This is in preparation to introduce one public, stable symbol: tf.executing_eagerly() (i.e., part of moving APIs related to eager execution from "contrib" to a namespace where we provide API stability guarantees) PiperOrigin-RevId: 188212646
* Remove THIRD_PARTY_ from #include guardsGravatar Sanjoy Das2018-01-24
| | | | | | They don't make sense in the open source repository. PiperOrigin-RevId: 183140889
* Fixes nccl_ops_test error introduced in CL 181736230.Gravatar A. Unique TensorFlower2018-01-16
| | | | PiperOrigin-RevId: 182039394
* Moves CUDA-only condition of NCCL further up to dependent targets.Gravatar A. Unique TensorFlower2018-01-12
| | | | | | | Removes NCCL kernel registration in non-CUDA builds (but retains NCCL ops). Removes unused python/ops/_nccl_ops.so target. PiperOrigin-RevId: 181736230
* Replaces std::random_shuffle by std::shuffle for C++17 compatibility.Gravatar A. Unique TensorFlower2018-01-09
| | | | PiperOrigin-RevId: 181350723
* Rename Stream::BlockHostUntilDoneWithStatus to BlockHostUntilDone.Gravatar A. Unique TensorFlower2017-12-13
| | | | PiperOrigin-RevId: 178951330
* Use BlockHostUntilDoneWithStatus in various places.Gravatar A. Unique TensorFlower2017-12-11
| | | | PiperOrigin-RevId: 178723711
* Make NCCL code ready for NVIDIA's NCCL 2.Gravatar A. Unique TensorFlower2017-11-29
| | | | PiperOrigin-RevId: 177306507
* Use LINKER_INITIALIZED for mutexes with static storage class.Gravatar A. Unique TensorFlower2017-11-21
| | | | | | | | | | | | | | | | | | | | | This was causing exit-time races as some threads were accessing the mutex as it was being destructed. --------------- It is illegal to use any static type with a constructor/destructor with static storage class in a multithreaded C++ programme that can exit(), even if the constructor is protected by C++11's function-scope static initialization rules, because exit-time destruction is unsafe in the presence of multiple threads. For things that are not function-scope, the construction is also unsafe, because global contruction ordering is undefined in general. The LINKER_INITIALIZED variant constructor for TensorFlow's mutex avoids these problems, at the cost of relying on the linker to zero-initialize the BSS region. PiperOrigin-RevId: 176612772
* Internal change.Gravatar Anna R2017-11-10
| | | | PiperOrigin-RevId: 175213336
* Check GPU availability after creating test session.Gravatar A. Unique TensorFlower2017-11-10
| | | | PiperOrigin-RevId: 174983466
* BUILD cleanup in contrib/...Gravatar A. Unique TensorFlower2017-10-30
| | | | PiperOrigin-RevId: 173889798
* Fix NCCL rewrite bug when rerunning sessions (assigned device id is not stable).Gravatar A. Unique TensorFlower2017-10-24
| | | | | | | Fix collocate_gradients for initial losses. Remove NcclBroadcast gradient test for now. The generated AddN to accumulate the broadcast outputs before passing it to the gradient function is CPU only and cannot be collocated with NcclBroadcast on the GPU. PiperOrigin-RevId: 173306409
* Fix breakages in OSS buildsGravatar Shanqing Cai2017-10-23
| | | | | | | | | | | | | See example breakages logs at: http://ci.tensorflow.org/job/tensorflow-cl-cpu-python3-pip/10847/console http://ci.tensorflow.org/job/tensorflow-cl-gpu/11008/console 1. CL/172477381 added the no_oss tag to tests with oss_serial tags, which broke the logic of OSS_SERIAL tests in pip.sh and run_pip_test.sh. This CL fixes that. 2. The nccl_kernels BUILD target in contrib/nccl/BUILD was missing some dependencies. This CL adds the missing ones. Fixes: #13918 PiperOrigin-RevId: 173133914
* Replace NcclReduce/Broadcast ops during graph optimization so that we can ↵Gravatar A. Unique TensorFlower2017-10-15
| | | | | | | | generate gradients for Reduce/Broadcast. Changing _NcclBroadcastRecv shape input to int32 so that the corresponding Const op is outputting to HostMem. PiperOrigin-RevId: 172279684
* Update to NCCL version 1.3.5. Remove temporary buffer for ncclReduce, it's ↵Gravatar A. Unique TensorFlower2017-09-19
| | | | | | no longer needed in this version. PiperOrigin-RevId: 169221983
* Adding NCCL sum op, register all_sum gradient.Gravatar A. Unique TensorFlower2017-09-12
| | | | | | Streamlining nccl test. PiperOrigin-RevId: 168347428
* Enable contrib.nccl test.Gravatar A. Unique TensorFlower2017-09-05
| | | | PiperOrigin-RevId: 167592162
* Optimize convert_to_tensor.Gravatar A. Unique TensorFlower2017-08-28
| | | | PiperOrigin-RevId: 166805061
* Merge changes from github.Gravatar A. Unique TensorFlower2017-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | END_PUBLIC --- Commit 9f81374c3 authored by raymondxyang<zihao.yang@microsoft.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Add option for build more python tests in Cmake (#11853) * Ignore Windows built project * Fix deprecated methods in tf.contrib.python * Fix regex match for Windows build in contrib.keras * Fix Regex match for Windows build in session_bundle * * Fix deprecated methods * Fix regex match for Windows * Fix compatibility issue with Python 3.x * Add missing ops into Windows build for test * Enabled more testcases for Windows build * Clean code and fix typo * Add conditional cmake mode for enabling more unit testcase * Add Cmake mode for major Contrib packages * Add supplementary info in RAEDME for new cmake option * * Update tf_tests after testing with TF 1.3 * Clean code and resolve conflicts * Fix unsafe regex matches and format code * Update exclude list after testing with latest master branch * Fix missing module --- Commit 98f0e1efe authored by Yong Tang<yong.tang.github@outlook.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Dynamic ksize and strides with MaxPool (#11875) * Dynamic ksize with max_pool This fix tries to fix the issue raised in 4746 where ksize is static (attr) with max_pool. This fix changes ksize to input tensor so that it is dynamic now. This fix fixes 4746. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add dynamic ksize to MaxPoolGrad and MaxPoolGradGrad Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add test cases for max_pool_v2 Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Fix GPU Jenkins issue. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Enable MaxPoolV2 in GPU Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Hide MaxPoolV2 and other fixes. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit 02d6bc185 authored by Bairen Yi<byronyi@users.noreply.github.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: remove useless variable (#12212) --- Commit ed6b0d905 authored by namrata-ibm<bhavenamrata@gmail.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Adding support for s390x in calculation of cpu_frequency (#12201) --- Commit 627dfc9dd authored by Taehoon Lee<taehoonlee@snu.ac.kr> Committed by Taehoon Lee<taehoonlee@snu.ac.kr>: Fix typos --- Commit c0f9b0a91 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: In fast-math mode emit a tanh that has a faster min/max. PiperOrigin-RevId: 164943597 --- Commit 87605f3d6 authored by Kay Zhu<kayzhu@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [TF:XLA] Use HloEvaluator for ComputeConstant, remove the need of a dedicated compute constant backend. PiperOrigin-RevId: 164940970 --- Commit 881de45c2 authored by Taehoon Lee<me@taehoonlee.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Add bool type supports for GPU kernels (#11927) * Add bool type supports for GPU kernels * Add bool type test codes for GPU kernels --- Commit eeacdcdb1 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add missing "CPU" suffix in registrations. PiperOrigin-RevId: 164939527 --- Commit de01be952 authored by namrata-ibm<bhavenamrata@gmail.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Adding support for Big Endian in graph_constructor_test and wav_io (#12179) --- Commit 26719d29f authored by QingYing Chen<pkudysj@126.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Implement CRF decode (Viterbi decode) for tensor (#12056) * Implement CRF decoding for tensors * add test code for tensor version's CRF decoding * made modifications according to pylint * add some comments for crf decode * remove useless code * add comments at the top comment of crf module and add more comments in crf_test * capitalize first char of first word in comments * replace crf_decode test code with a deterministic example --- Commit f9a81ca2f authored by Pete Warden<pete@petewarden.com> Committed by gunan<gunan@google.com>: Create CI build script for Raspberry Pi (#12190) * Create CI build script for Raspberry Pi * Moved location of Pi build script --- Commit e2a163a90 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Merge code from PR #11940 with internal changes from cl/164796436, and update Python tests to also run on GPU. PiperOrigin-RevId: 164929133 --- Commit 08bbfa187 authored by Taehoon Lee<me@taehoonlee.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Fix typos (#12195) --- Commit ab96f41fb authored by Luke Iwanski<luke@codeplay.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: [OpenCL] Extends matmul_benchmark.py to cover SYCL (#11697) * [OpenCL] Extends matmul_benchmark.py to cover SYCL * Fixed typo * /gpu:0 -> /device:GPU:0 * Fixes control_flow_ops_py_test * /gpu: -> /device:GPU: * Fixes //tensorflow/python/profiler/internal:run_metadata_test * gpu: -> GPU: * Fixes tfprof_node * [OpenCL] Fixes device path to name with many colons (#123) The device path is constructed from a device name by replacing all colons with underscores. Some device names contain more than one colon, for example 'device:SYCL:0' which gives a path 'device_SYCL_0'. The previous code would not convert this back to the original device name, but rather to 'device:SYCL_0'. An alternative fix would be to convert all underscores to colons in the device name (i.e. remove the restriction inside `replace("_", ":", 1)`), however I'm not sure if there are any device names which contain underscores. * If no gpu device aviable fake one * gpu: -> device:GPU * Fixes profiler test * /gpu:x -> /device:GPU:x * Fixes debug_io_utils_test.cc test * Fixes device_name_utils_test.cc --- Commit 35e7a3665 authored by Yong Tang<yong.tang.github@outlook.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Remove unneeded casting of int64 for reverse_sequence (#12192) This fix remove unneeded cast of int64 for reverse_sequence: ``` lengths = math_ops.to_int64(lengths) ``` as int32 has already been enabled for reverse_sequence. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit 9fba8c185 authored by Anna R<annarev@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add benchmark dashboard link to benchmarks doc. Also, I added a link and description for Benchmarks page to Community index page. PiperOrigin-RevId: 164924906 --- Commit bb6f32fa7 authored by Mark Heffernan<meheff@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Make HloAliasAnalysis updatable after changes to the HLO graph. As part of this change make HloAliasAnalysis a thinner layer which basically only holds a map from HloValue to HloBuffer and vice versa. PiperOrigin-RevId: 164923041 --- Commit 9103096c1 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by Thomas K?ppe<tkoeppe@google.com>: Merged commit includes the following changes: 164923041 by meheff: Make HloAliasAnalysis updatable after changes to the HLO graph. As part of this change make HloAliasAnalysis a thinner layer which basically only holds a map from HloValue to HloBuffer and vice versa. -- PiperOrigin-RevId: 164923041 --- Commit 822603aed authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Merging sibling fusion instruction using multi_output_fusion PiperOrigin-RevId: 164920220 --- Commit c035aa2a8 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 164917891 --- Commit e1e81d9ba authored by Luke Iwanski<luke@codeplay.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: [OpenCL] Fixes double memcpy bug (#151) (#12173) * [OpenCL] Fixes double memcpy bug (#151) As the debg CopyOp is called on a Tensor without type, we need to use the DataType enum to get type information, and use this to pass the type on to Eigen. This is a workaround Eigen's need to have a type when calling memcpy. If the Eigen memcpy can be provided without a type requirement, then the memcpy in sycl_util is unnecessary. * Acts on feedback from: #12173/files/32cb12a9001b672425867b5a3110fd98e737a20b#r132496277 --- Commit d9ca2d86d authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Internal change PiperOrigin-RevId: 164916465 --- Commit b8d13d218 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Remove more parts of DCASGD missed in the first pass. (47949b) PiperOrigin-RevId: 164914552 --- Commit 73b3d52c7 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: cmake fix PiperOrigin-RevId: 164911656 --- Commit 2173b5b0a authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Allow TFE_TensorHandleCopyToDevice to have the same device as src and destination. It will reuse the same underlying buffer in those cases. PiperOrigin-RevId: 164909906 --- Commit 13eb3b90e authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Experimental C and Python APIs to invoke TensorFlow kernels on concrete values. PiperOrigin-RevId: 164902588 --- Commit 7dfabcc01 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Initialize ExecutionOptions in ComputeConstant to default values. PiperOrigin-RevId: 164894867 --- Commit c8897e9bc authored by Benoit Steiner<bsteiner@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Static required time computation PiperOrigin-RevId: 164894645 --- Commit 076158f9b authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Enable implicit->explicit conversion by default. PiperOrigin-RevId: 164890915 --- Commit 58c4a4cb1 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Bugfix: number of input channels is not necessarily in the last dimension, after introduction of data_format param. PiperOrigin-RevId: 164889729 --- Commit 8f9b1af8a authored by Igor Saprykin<isaprykin@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Recover MonitoredSession when the Coordinator is requested to stop with one of the _PREEMPTION_ERRORS. When SyncReplicasOptimizer is used, a preemption in the Coordinator may result in two cases: Case 1) the session gets silently marked as complete Case 2) the session gets stuck This CL aims to solve and verify solutions for both of these problems. Fix 1 changes the should_stop logic. Fix 2 changes the CoordinatedSession.run() logic. SyncReplicasOptimizer runs a separate set of threads using a Coordinator instance. Those threads do FIFOQueue.enqueue; the main thread does a blocking FIFOQueue.dequeue. `sync_token_q` FIFOQueue is on parameter-servers. When one of the PS instances gets preempted, an AbortedError causes the Coordinator to stop via request_stop(ex). That by itself changes the state of MonitoredSession.should_stop() to True (Fix 1). Results of the blocking Dequeue operation are sent to the chief worker via Recv. What happens next depends on the amount of tokens in `sync_token_q`. If there are enough for the next call to Dequeue to return, then the low-level "tf session run() call" returns. The next iteration of the `while not MonitoredSession.should_stop()` loop decides that the training is complete (Case 1). If there are not enough tokens in `sync_token_q`, then the blocking Dequeue is going to keep waiting for them. This results in the graph execution getting stuck and the whole session getting garbage collected after 10 minutes (Case 2). We decided to fix that by re-creating a session after it gets garbage collected (Fix 2). An alternative was to try to cancel the pending Dequeue operation, but it's not clear that it is the right thing to do and it is also not easy. PiperOrigin-RevId: 164888390 --- Commit 46e4de6e5 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Undo loop fusion changes for now as they seem to be altering a few results. END_PUBLIC RELNOTES: n/a BEGIN_PUBLIC BEGIN_PUBLIC Automated g4 rollback of changelist 164825735 PiperOrigin-RevId: 165340331
* Make single-parameter constructors explicitGravatar A. Unique TensorFlower2017-05-31
| | | | PiperOrigin-RevId: 157628970
* Disable flaky timeout test.Gravatar A. Unique TensorFlower2017-05-25
| | | | PiperOrigin-RevId: 157178069
* Internal change.Gravatar A. Unique TensorFlower2017-05-17
| | | | PiperOrigin-RevId: 156349584
* Remove unnecessary copies of value parameters.Gravatar Peter Hawkins2017-05-10
| | | | PiperOrigin-RevId: 155511618
* Internal Change.Gravatar A. Unique TensorFlower2017-04-28
| | | | Change: 154605009
* Add nccl to tf.contrib.Gravatar A. Unique TensorFlower2017-03-31
| | | | | Compile nccl on windows, now that bazel 0.4.5 is used. Change: 151853954
* Change nccl_manager to use ncclCommInitAll.Gravatar A. Unique TensorFlower2017-02-21
| | | | Change: 148169806
* Merge changes from github.Gravatar Benoit Steiner2017-02-08
| | | | Change: 146918929
* Disable nccl_manager_test since it flakes a lot on Jenkins with errors findingGravatar A. Unique TensorFlower2017-02-07
| | | | | nvmlShutdown. Change: 146836247
* Add contrib/nccl for using all-reduce collectives across GPUs of a singleGravatar A. Unique TensorFlower2017-01-24
server. Change: 145475050