author | Shanqing Cai <cais@google.com> | 2017-09-25 19:35:53 -0700 |
---|---|---|
committer | TensorFlower Gardener <gardener@tensorflow.org> | 2017-09-25 19:39:42 -0700 |
commit | e2e3a943c0a28b7656325acb3fcd035743d55ea0 (patch) | |
tree | f4b909d5410bdf3b94012392909e7805cd27a2a7 /tensorflow/contrib/fused_conv | |
parent | df22044be98c8b707601e03fe22ded53bcc28c7e (diff) |
Merge changes from github.
END_PUBLIC
---
Commit 1e1b3d902 authored by Pete Warden<pete@petewarden.com>
Committed by gunan<gunan@google.com>:
Changed output directory for Pi CI build to fix permissions problem with nightlies (#13257)
* Fix for RTLD_GLOBAL breakage of Pi builds, and removed Eigen version change for Pi that's no longer needed
* Fixed Pi Zero OpenBLAS build problems and tidied up directories used
* More robust checks in Pi build script
* Changed output directory for Pi CI build to fix permissions problem
---
Commit fe3a2e65c authored by Yan Facai (颜发才)<facai.yan@gmail.com>
Committed by drpngx<drpngx@users.noreply.github.com>:
check invalid string type for dest_nodes in extract_sub_graph (#13057)
* BUG: check str type
* TST: add unit test
* CLN: remove list check
* CLN: use warning
* CLN: 2 indent
* CLN: raise TypeError if not list
* CLN: check string only
---
Commit 225ab7629 authored by Jean Wanka<jm.wanka@gmail.com>
Committed by Jean Wanka<jm.wanka@gmail.com>:
Fix polynomial decay with cycle for global step=0
For polynomial decay with cycle=True the learning rate at
step 0 becomes NaN, because in the process of calculating it we
divide by 0. This change should fix it by setting the multiplier
for the decay steps to one for global_step=0.
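
To illustrate the failure described above, here is a minimal standalone sketch of polynomial decay with cycling. It is not TensorFlow's implementation: the function name `PolynomialDecayCycle` and the `guard_step_zero` flag are hypothetical and exist only to contrast the unguarded and guarded behaviour at step 0.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not a TensorFlow API): polynomial decay with cycle=True.
// With cycling, the decay horizon is decay_steps * ceil(global_step / decay_steps).
// At global_step == 0 that multiplier is 0, so the horizon is 0 and the
// fraction global_step / horizon becomes 0 / 0 = NaN.
double PolynomialDecayCycle(double learning_rate, double global_step,
                            double decay_steps, double end_learning_rate,
                            double power, bool guard_step_zero) {
  assert(decay_steps > 0.0);
  double multiplier = std::ceil(global_step / decay_steps);
  if (guard_step_zero && global_step == 0.0) {
    // The fix described in the commit message: use a multiplier of one
    // at step 0 so the horizon is never zero.
    multiplier = 1.0;
  }
  const double horizon = decay_steps * multiplier;
  const double fraction = global_step / horizon;  // NaN when horizon == 0
  return (learning_rate - end_learning_rate) * std::pow(1.0 - fraction, power) +
         end_learning_rate;
}
```

With `guard_step_zero` set to false the call at step 0 yields NaN; with it set to true the same call yields the initial learning rate, which is the behaviour the commit message aims for.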
---
Commit 286f57061 authored by Bjarke Hammersholt Roune<broune@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Make Service::TransferToClient not attempt to manipulate the literal when the transfer failed, preventing a crash and allowing the caller to see the reason for the failed transfer.
PiperOrigin-RevId: 169770126
---
Commit e0501bc4d authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Shanqing Cai<cais@google.com>:
Fix GRUBlockCell parameter naming inconsistency (#13153)
* Fix GRUBlockCell parameter naming inconsistency
This fix tries to fix the issue in 13137 where
parameter `cell_size` is used instead of `num_units`.
This is inconsistent with other RNN cells.
This fix adds support of `num_units` while at the same
time maintains backward compatibility for `cell_size`.
This fix fixes 13137.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
* Add `@deprecated_args` for 'cell_size' in `GRUBlockCell`
This commit adds `@deprecated_args` for 'cell_size' in `GRUBlockCell`
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
* Address review comment
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
---
Commit 02a2eba05 authored by Pete Warden<pete@petewarden.com>
Committed by gunan<gunan@google.com>:
Fix for RTLD_GLOBAL breakage of Pi builds, and removed Eigen version change that's no longer needed (#13251)
* Fix for RTLD_GLOBAL breakage of Pi builds, and removed Eigen version change for Pi that's no longer needed
* Fixed Pi Zero OpenBLAS build problems and tidied up directories used
* More robust checks in Pi build script
---
Commit 8ef722253 authored by Sanjoy Das<sanjoy@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Remove a redundant setName.
The EmitComputation should have emitted a function with the right name, so use a
CHECK instead.
PiperOrigin-RevId: 169764856
---
Commit 1b94147dc authored by Neal Wu<wun@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Fix broken GitHub links in tensorflow and tensorflow_models resulting from The Great Models Move (a.k.a. the research subfolder)
PiperOrigin-RevId: 169763373
---
Commit b1ada5f0c authored by Justine Tunney<jart@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Fix TensorBoard python -m invoke in docs
PiperOrigin-RevId: 169758752
---
Commit 2957cd894 authored by Mustafa Ispir<ispir@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Local run option of estimator training.
PiperOrigin-RevId: 169756384
---
Commit 1dc2fe7ac authored by Gunhan Gulsoy<gunan@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
BEGIN_PUBLIC
Automated g4 rollback of changelist 166264198
PiperOrigin-RevId: 169998124
Diffstat (limited to 'tensorflow/contrib/fused_conv')
-rw-r--r-- | tensorflow/contrib/fused_conv/kernels/fused_conv2d_bias_activation_op.cc | 57 |
1 files changed, 31 insertions, 26 deletions
diff --git a/tensorflow/contrib/fused_conv/kernels/fused_conv2d_bias_activation_op.cc b/tensorflow/contrib/fused_conv/kernels/fused_conv2d_bias_activation_op.cc
index 2d7407980f..9275d5a22b 100644
--- a/tensorflow/contrib/fused_conv/kernels/fused_conv2d_bias_activation_op.cc
+++ b/tensorflow/contrib/fused_conv/kernels/fused_conv2d_bias_activation_op.cc
@@ -493,37 +493,42 @@ void LaunchFusedConv2DBiasActivationOp<GPUDevice, T, BiasType, ScaleType>::
   dnn::AlgorithmConfig algorithm_config;
   if (cudnn_use_autotune && !AutoTuneConvBiasActivation::GetInstance()->Find(
                                 fused_conv_parameters, &algorithm_config)) {
-    std::vector<dnn::AlgorithmType> algorithms;
+    std::vector<dnn::AlgorithmDesc::Index> algorithms;
     CHECK(stream->parent()->GetConvolveAlgorithms(
         fused_conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms));
     dnn::ProfileResult best_result;
     dnn::ProfileResult best_result_no_scratch;
-    for (auto profile_algorithm : algorithms) {
-      // TODO(zhengxq): profile each algorithm multiple times to better
-      // accuracy.
-      CudnnScratchAllocator scratch_allocator(ConvolveScratchSize, ctx);
-      dnn::ProfileResult profile_result;
-      bool cudnn_launch_status =
-          stream
-              ->ThenFusedConvolveWithAlgorithm(
-                  conv_input_desc, conv_input_ptr, conv_input_scale,
-                  filter_desc, filter_ptr, conv_desc, side_input_ptr,
-                  side_input_scale, bias_desc, bias_ptr,
-                  dnn::ActivationMode::kRelu, output_desc, &output_ptr,
-                  &scratch_allocator, dnn::AlgorithmConfig(profile_algorithm),
-                  &profile_result)
-              .ok();
-      if (cudnn_launch_status) {
-        if (profile_result.is_valid()) {
-          if (profile_result.elapsed_time_in_ms() <
-              best_result.elapsed_time_in_ms()) {
-            best_result = profile_result;
-          }
-          if (scratch_allocator.TotalByteSize() == 0 &&
-              profile_result.elapsed_time_in_ms() <
-                  best_result_no_scratch.elapsed_time_in_ms()) {
-            best_result_no_scratch = profile_result;
+    // TODO(benbarsdell): Ideally this should not attempt using tensor op math
+    // if it's not enabled.
+    for (bool use_tensor_ops : {false, true}) {
+      for (auto algo_index : algorithms) {
+        // TODO(zhengxq): profile each algorithm multiple times to better
+        // accuracy.
+        dnn::AlgorithmDesc profile_algorithm(algo_index, use_tensor_ops);
+        CudnnScratchAllocator scratch_allocator(ConvolveScratchSize, ctx);
+        dnn::ProfileResult profile_result;
+        bool cudnn_launch_status =
+            stream
+                ->ThenFusedConvolveWithAlgorithm(
+                    conv_input_desc, conv_input_ptr, conv_input_scale,
+                    filter_desc, filter_ptr, conv_desc, side_input_ptr,
+                    side_input_scale, bias_desc, bias_ptr,
+                    dnn::ActivationMode::kRelu, output_desc, &output_ptr,
+                    &scratch_allocator, dnn::AlgorithmConfig(profile_algorithm),
+                    &profile_result)
+                .ok();
+        if (cudnn_launch_status) {
+          if (profile_result.is_valid()) {
+            if (profile_result.elapsed_time_in_ms() <
+                best_result.elapsed_time_in_ms()) {
+              best_result = profile_result;
+            }
+            if (scratch_allocator.TotalByteSize() == 0 &&
+                profile_result.elapsed_time_in_ms() <
+                    best_result_no_scratch.elapsed_time_in_ms()) {
+              best_result_no_scratch = profile_result;
+            }
           }
         }
       }
     }
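
The selection logic in the hunk above can be sketched without any GPU or StreamExecutor dependency. The following is an assumption-laden illustration, not the kernel's code: `Candidate`, `ProfileOutcome`, `Selection`, and `PickFastest` are hypothetical names, and the `profile` callback stands in for the real `ThenFusedConvolveWithAlgorithm` launch. It shows the same shape: try every algorithm index with tensor ops off and on, then keep the fastest valid run overall and the fastest valid run that needed no scratch memory.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical stand-ins for dnn::AlgorithmDesc and dnn::ProfileResult.
struct Candidate {
  int index;
  bool use_tensor_ops;
};

struct ProfileOutcome {
  bool valid;                 // launch succeeded and timing is usable
  double elapsed_ms;          // measured run time
  std::size_t scratch_bytes;  // scratch memory the run allocated
};

struct Selection {
  double best_ms;
  Candidate best;
  double best_no_scratch_ms;
  Candidate best_no_scratch;
};

// Profiles every (index, use_tensor_ops) pair and tracks two winners:
// the fastest overall and the fastest that used zero scratch bytes.
// An infinite time in the result means no candidate qualified.
Selection PickFastest(
    const std::vector<int>& algorithms,
    const std::function<ProfileOutcome(const Candidate&)>& profile) {
  const double kInf = std::numeric_limits<double>::infinity();
  Selection result;
  result.best_ms = kInf;
  result.best.index = -1;
  result.best.use_tensor_ops = false;
  result.best_no_scratch_ms = kInf;
  result.best_no_scratch = result.best;

  for (bool use_tensor_ops : {false, true}) {
    for (int index : algorithms) {
      Candidate candidate;
      candidate.index = index;
      candidate.use_tensor_ops = use_tensor_ops;
      const ProfileOutcome outcome = profile(candidate);
      if (!outcome.valid) continue;  // skip failed or unsupported runs
      if (outcome.elapsed_ms < result.best_ms) {
        result.best_ms = outcome.elapsed_ms;
        result.best = candidate;
      }
      if (outcome.scratch_bytes == 0 &&
          outcome.elapsed_ms < result.best_no_scratch_ms) {
        result.best_no_scratch_ms = outcome.elapsed_ms;
        result.best_no_scratch = candidate;
      }
    }
  }
  return result;
}
```

Keeping a separate no-scratch winner gives the caller a fallback for cases where scratch memory cannot be allocated at run time. Note that, as the TODO in the diff points out, this loop profiles the tensor-op variants even when tensor op math is not enabled.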