path: root/tensorflow/stream_executor/cuda
* Disable the cuDNN workarounds if the version number is new enough to have the
  corresponding bugs fixed. (Tim Shen, 2018-10-02)
  The bugs that were worked around have been fixed and verified.
  PiperOrigin-RevId: 215497418
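
  A minimal sketch of this kind of version gating, with hypothetical helper
  names and an assumed fix version (illustrative only, not the TensorFlow code):

      // Hypothetical sketch: run a workaround only on cuDNN builds known to
      // carry the bug; newer releases are assumed to have the fix.
      struct CudnnVersion { int major, minor, patch; };

      bool NeedsWorkaround(const CudnnVersion& v) {
        int encoded = v.major * 1000 + v.minor * 100 + v.patch;
        return encoded < 7300;  // assumption: the fix shipped in 7.3.0
      }
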
* Merge pull request #21958 from MattConley:CudaOccupancy (TensorFlower Gardener, 2018-10-01)
  PiperOrigin-RevId: 215331087
* Move winograd algorithm workaround to stream executor. (Tim Shen, 2018-09-21)
  PiperOrigin-RevId: 214075796
* [SE] Restore int8x4 data types if that's the requested DataLayout for fused conv. (Benjamin Kramer, 2018-09-18)
  This broke in a recent refactoring.
  PiperOrigin-RevId: 213497416
* Fix and complete StreamExecutor's DoFusedConvolve: (Tim Shen, 2018-09-17)
  * bias_nd is set to have CUDNN_DATA_FLOAT, even though BiasType is not float.
  * double is supported but not exposed through the public interface.
  * DoFusedConvolveImpl has duplicated information in its template parameter list.
  PiperOrigin-RevId: 213308435
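
  A sketch of the kind of fix the first bullet calls for, deriving the cuDNN
  enum from the element type rather than hard-coding it (a simplified
  illustration, not the actual descriptor-wrapper code):

      #include <cudnn.h>

      // Map the bias element type to the matching cuDNN data type.
      template <typename T> cudnnDataType_t CudnnDataType();
      template <> cudnnDataType_t CudnnDataType<float>()  { return CUDNN_DATA_FLOAT; }
      template <> cudnnDataType_t CudnnDataType<double>() { return CUDNN_DATA_DOUBLE; }

      template <typename BiasType>
      cudnnStatus_t SetBiasDescriptor(cudnnTensorDescriptor_t bias_nd, int channels) {
        return cudnnSetTensor4dDescriptor(bias_nd, CUDNN_TENSOR_NCHW,
                                          CudnnDataType<BiasType>(),  // was CUDNN_DATA_FLOAT
                                          /*n=*/1, /*c=*/channels, /*h=*/1, /*w=*/1);
      }
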
* Zero out the result buffer for strided conv backward filter for NHWC layouts. (Tim Shen, 2018-09-06)
  cuDNN 7.1.4 and 7.2 have a non-deterministic bug if the buffer is not zeroed.
  PiperOrigin-RevId: 211905127
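
  In StreamExecutor terms the workaround amounts to a memzero of the output
  allocation before the convolution is enqueued; a minimal sketch (ThenMemZero
  is the real Stream API, the variable names are illustrative):

      // Clear the backward-filter output first, so the affected cuDNN
      // releases cannot leave non-deterministic garbage in it.
      stream->ThenMemZero(&filter_backprop, filter_backprop.size());
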
* Fully fixed clang errors (Matt Conley, 2018-09-06)
* Fixed clang formatting (Matt Conley, 2018-09-06)
* Recommended typo fix (Matt Conley, 2018-09-04)
* Fixed transition typo (Matt Conley, 2018-09-04)
* Move CUDA-specific occupancy calculation into proper file (Matt Conley, 2018-09-04)
  - Maintain functionality; just move the CalculateOccupancy() and
    CompareOccupancy() methods from device_description to cuda_gpu_executor.
  - Remove the CUDA requirement from the general device_description class.
* Remove (Mutable)ArraySlice implementation and alias them to absl::Span. (Tim Shen, 2018-08-30)
  There are several API migrations happening:
  * ArraySlice's sub-slice constructor => .subspan
  * MutableArraySlice's container pointer constructor => absl::MakeSpan
  PiperOrigin-RevId: 210946124
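
  The two migrations look roughly like this in caller code (a sketch against
  absl::Span's actual API; the variable names are illustrative):

      #include <vector>
      #include "absl/types/span.h"

      void Example(std::vector<int>& v) {
        absl::Span<const int> s = v;
        // Old: ArraySlice<int>(s, pos, len)  =>  New: subspan.
        absl::Span<const int> sub = s.subspan(/*pos=*/1, /*len=*/2);
        // Old: MutableArraySlice<int>(&v)    =>  New: absl::MakeSpan.
        absl::Span<int> mutable_view = absl::MakeSpan(v);
        (void)sub; (void)mutable_view;
      }
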
* Update GPU occupancy checking to utilize CUDA's occupancy calculator functions (Matt Conley, 2018-08-28)
  - Replace references to the UnqueryableDeviceParams struct with calls to
    CUDA's built-in occupancy calculation functions.
  - Update calls to the occupancy checking functions accordingly.
  - The changes should provide more long-term reliability and remove the need
    to manually update hardcoded data values for new GPU architectures.
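
  The CUDA driver exposes this directly; a minimal sketch of querying
  occupancy for a loaded kernel (the driver call is real, the wrapper is
  illustrative):

      #include <cuda.h>

      // Ask the driver how many blocks of `func` can be resident per SM for
      // a given block size and dynamic shared-memory usage.
      int MaxActiveBlocksPerSM(CUfunction func, int block_size, size_t dyn_smem) {
        int num_blocks = 0;
        cuOccupancyMaxActiveBlocksPerMultiprocessor(&num_blocks, func,
                                                    block_size, dyn_smem);
        return num_blocks;
      }
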
* Replaced calls to tensorflow::StringPiece::ToString with string conversions. (A. Unique TensorFlower, 2018-08-22)
  That is, instances of sp.ToString() are replaced with string(sp). This will
  allow tensorflow::StringPiece::ToString to be removed, which is necessary
  before it can be replaced with absl::string_view.
  PiperOrigin-RevId: 209806694
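
  The pattern of the migration, sketched against absl::string_view (the type
  StringPiece was converging on; string_view has no ToString, only an explicit
  conversion):

      #include <string>
      #include "absl/strings/string_view.h"

      void Example(absl::string_view sp) {
        // Old: std::string s = sp.ToString();   // StringPiece-only method.
        std::string s = std::string(sp);         // works for both types.
        (void)s;
      }
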
* Merge pull request #20708 from ROCmSoftwarePlatform:upstream-staging-stream-executor-algorithmconfig-profileresult (TensorFlower Gardener, 2018-08-07)
  PiperOrigin-RevId: 207801599
* [XLA:GPU] Add a fast version of gemmStridedBatched for CUDA 9.1. (Benjamin Kramer, 2018-08-03)
  It's unfortunate that this was only added in 9.1, but I haven't found a good
  way of emulating the behavior on 9.0 without falling back to non-batched gemms.
  PiperOrigin-RevId: 207286575
* [XLA:GPU] Use strided batched gemm instead of building pointer tables. (Benjamin Kramer, 2018-08-03)
  This is mostly a huge amount of plumbing just to call into the cublas
  functions. blasGemmStridedBatched has been available since CUDA 8.0. For
  autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2,
  so I didn't wire that up yet.
  PiperOrigin-RevId: 207285707
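
  For reference, the cuBLAS entry point this plumbs into replaces per-batch
  pointer tables with a base pointer plus a stride per operand; a minimal
  sketch (real API, simplified non-transposed shapes):

      #include <cublas_v2.h>

      // C[i] = alpha * A[i] * B[i] + beta * C[i] for i in [0, batch), where
      // operand i lives at base + i * stride; no pointer table needed.
      void StridedBatchedGemm(cublasHandle_t handle, int m, int n, int k,
                              const float* a, const float* b, float* c,
                              int batch) {
        const float alpha = 1.0f, beta = 0.0f;
        cublasSgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                                  &alpha,
                                  a, /*lda=*/m, /*strideA=*/(long long)m * k,
                                  b, /*ldb=*/k, /*strideB=*/(long long)k * n,
                                  &beta,
                                  c, /*ldc=*/m, /*strideC=*/(long long)m * n,
                                  batch);
      }
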
* Add scratch memory size in AlgorithmDesc (Wen-Heng (Jack) Chung, 2018-08-02)
  Add one field, scratch_size_, to AlgorithmDesc. The field would be set by
  DNN libraries during the algorithm finding / profiling stage. For algorithms
  not using scratch memory the field would be zero. Change the CUDA
  StreamExecutor implementation to set this field properly.
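
  A simplified sketch of the shape of such a field on the descriptor (the real
  stream_executor class has more members; this is illustrative only):

      #include <cstddef>
      #include <cstdint>

      class AlgorithmDesc {
       public:
        AlgorithmDesc(int64_t algo_id, bool tensor_ops_enabled, size_t scratch_size)
            : algo_id_(algo_id), tensor_ops_enabled_(tensor_ops_enabled),
              scratch_size_(scratch_size) {}
        // Zero for algorithms that use no scratch memory.
        size_t scratch_size() const { return scratch_size_; }

       private:
        int64_t algo_id_;
        bool tensor_ops_enabled_;
        size_t scratch_size_;  // set during algorithm finding / profiling
      };
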
* [SE] Allow context reuse in CreatedContexts::Add. (Justin Lebar, 2018-08-01)
  It's possible for an already-existing context to be returned by
  cuDevicePrimaryCtxRetain. Previously, this was handled incorrectly by
  CreatedContexts::Add, which assumed that inserts into the map always
  succeeded.
  This makes XLA work with TF_CUDA_PLATFORM_GPU_DEVICE_SCHEDULE=blocking_sync,
  although exactly how that flag is related to this bug is unclear to me. It
  seems like some sort of race condition, maybe?
  PiperOrigin-RevId: 207010059
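
  The failure mode is generic C++: std::map::emplace returns an
  (iterator, inserted) pair, and ignoring the bool assumes every insert is
  new. A sketch of the corrected pattern, with stand-in types for the real
  context wrapper (hypothetical names):

      #include <map>
      #include <memory>

      struct Context {};             // stand-in for the real context wrapper
      using ContextHandle = void*;   // stand-in for CUcontext

      // Return the existing wrapper when the driver hands back a context we
      // have already seen, instead of assuming emplace() inserted a new one.
      Context* Add(std::map<ContextHandle, std::unique_ptr<Context>>* live,
                   ContextHandle ctx) {
        auto result = live->emplace(ctx, nullptr);
        if (result.second) {  // newly inserted: create the wrapper
          result.first->second = std::make_unique<Context>();
        }
        return result.first->second.get();
      }
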
* [SE] Add an nvbugs link. (Justin Lebar, 2018-08-01)
  Comment-only change.
  PiperOrigin-RevId: 206957994
* [SE] Add additional log statements to DoBlasGemmWithAlgorithmImpl. (Justin Lebar, 2018-07-31)
  This makes it easier to see why this function fails.
  PiperOrigin-RevId: 206856975
* [SE] Add new cublas algorithms from CUDA 9.2. (Justin Lebar, 2018-07-31)
  I verified that CUDA 9.1 did not introduce any new algorithms.
  PiperOrigin-RevId: 206850523
* [SE] Add missing cublas algorithms for CUDA 9.0, CUBLAS_GEMM_ALGO{3,4}_TENSOR_OP. (Justin Lebar, 2018-07-31)
  These appear to have been omitted by mistake.
  PiperOrigin-RevId: 206843312
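
  These enum values feed cuBLAS's algorithm-selecting GEMM entry point; a
  minimal sketch of passing one explicitly (real cuBLAS API; FP16 in/out with
  FP32 accumulation, non-transposed shapes assumed):

      #include <cublas_v2.h>
      #include <cuda_fp16.h>

      // Run a mixed-precision GEMM with an explicitly chosen tensor-op algorithm.
      cublasStatus_t GemmWithAlgo(cublasHandle_t handle, int m, int n, int k,
                                  const __half* a, const __half* b, __half* c) {
        const float alpha = 1.0f, beta = 0.0f;
        return cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                            &alpha, a, CUDA_R_16F, /*lda=*/m,
                            b, CUDA_R_16F, /*ldb=*/k,
                            &beta, c, CUDA_R_16F, /*ldc=*/m,
                            /*computeType=*/CUDA_R_32F,
                            CUBLAS_GEMM_ALGO3_TENSOR_OP);
      }
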
* Set the correct context when calling cudnnCreate. (A. Unique TensorFlower, 2018-07-26)
  When running with multiple devices, using the wrong context will lead to a
  check-fail when trying to set a stream that has been created with a
  different context.
  This resolves a check-fail on resnet50 with 8 GPUs.
  PiperOrigin-RevId: 206274741
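
  The underlying rule is a CUDA driver one: the cuDNN handle binds to
  whichever context is current on the calling thread. A sketch of pinning the
  intended context first (the driver and cuDNN calls are real; the wrapper is
  illustrative):

      #include <cuda.h>
      #include <cudnn.h>

      // Make the device's context current before cudnnCreate, so the handle
      // is bound to the same context later used for its streams.
      cudnnStatus_t CreateHandleInContext(CUcontext ctx, cudnnHandle_t* handle) {
        CUcontext previous = nullptr;
        cuCtxGetCurrent(&previous);
        cuCtxSetCurrent(ctx);
        cudnnStatus_t status = cudnnCreate(handle);
        cuCtxSetCurrent(previous);  // restore the caller's context
        return status;
      }
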
* [SE] Try again to query the GPU driver for error descriptions. (Benjamin Kramer, 2018-07-26)
  This code has been here since 2014; now the oldest supported version of CUDA
  is 8, so cuGetErrorName should always be available. Also, the list of errors
  is (of course) out of sync with upstream CUDA.
  Also surface the description of the error to the user, if available.
  PiperOrigin-RevId: 206191424
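
  Both driver queries are real APIs (available since CUDA 6.0); a minimal
  sketch of formatting an error with them:

      #include <cuda.h>
      #include <string>

      // Turn a CUresult into "NAME: description", with fallbacks for codes
      // the driver does not recognize.
      std::string DescribeCuError(CUresult result) {
        const char* name = nullptr;
        const char* description = nullptr;
        if (cuGetErrorName(result, &name) != CUDA_SUCCESS) name = "UNKNOWN";
        if (cuGetErrorString(result, &description) != CUDA_SUCCESS)
          description = "no description available";
        return std::string(name) + ": " + description;
      }
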
* Teach StreamExecutor to load modules and resolve symbols in them. (Sanjoy Das, 2018-07-23)
  This will be used in a future CL.
  PiperOrigin-RevId: 205742731
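
  On the CUDA side this maps onto the driver's module API; a minimal sketch of
  loading a compiled image and resolving a device global in it (real driver
  calls; the image is assumed to already be in memory):

      #include <cuda.h>

      // Load a module image (PTX or cubin) and look up a __device__ global
      // symbol inside it.
      CUresult LoadAndResolve(const void* image, const char* symbol_name,
                              CUdeviceptr* symbol_ptr, size_t* symbol_bytes) {
        CUmodule module = nullptr;
        CUresult result = cuModuleLoadData(&module, image);
        if (result != CUDA_SUCCESS) return result;
        return cuModuleGetGlobal(symbol_ptr, symbol_bytes, module, symbol_name);
      }
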
* Automated rollback of commit 36a66347e8e344cddee4a8d9123ccbcae40011b1 (A. Unique TensorFlower, 2018-07-18)
  PiperOrigin-RevId: 205164273
* Merge pull request #20675 from ROCmSoftwarePlatform:upstream-staging-stream-executor (TensorFlower Gardener, 2018-07-18)
  PiperOrigin-RevId: 205140328
* Support identity activation function in Cudnn implementation of fused conv2d bias activation. (A. Unique TensorFlower, 2018-07-17)
  PiperOrigin-RevId: 205008958
* Error on some documented invalid Cudnn inputs. Cudnn should have returned
  errors, but crashes instead. (A. Unique TensorFlower, 2018-07-17)
  PiperOrigin-RevId: 205000883
* Merge pull request #20706 from ROCmSoftwarePlatform:upstream-staging-stream-executor-pooling-interface (TensorFlower Gardener, 2018-07-16)
  PiperOrigin-RevId: 204805678
* [ROCm] Interface changes for StreamExecutor to support both CUDA and ROCm (Wen-Heng (Jack) Chung, 2018-07-12)
  1) StreamInterface::CudaStreamMemberHack()
     Despite the fact that StreamExecutor and the GPU common runtime are
     largely orthogonal, this particular routine in StreamExecutor is used in
     the GPU common runtime and a couple of other operators. In this commit it
     is renamed to StreamInterface::GpuStreamMemberHack() and its call sites
     are changed as well.
  2) StreamExecutorInterface::CudaContextHack()
     This member is renamed to StreamExecutorInterface::GpuContextHack().
  Changes introduced in this commit include:
  - some StreamExecutor interfaces and their CUDA implementation
  - GPU common runtime changes related to the interface changes in StreamExecutor
  - operators affected by the interface changes in StreamExecutor
* [ROCm] Interface changes for pooling APIs in StreamExecutor (Wen-Heng (Jack) Chung, 2018-07-11)
  Due to the design of MIOpen, the DNN library on the ROCm platform, an
  instance of ScratchAllocator has to be passed into pooling routines. This
  commit addresses such interface changes and the implementation in the CUDA
  StreamExecutor.
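
  The shape of the change is an extra allocator parameter threaded through the
  DNN interface; an illustrative sketch with simplified types (not the exact
  stream_executor declarations, which carry pooling and batch descriptors):

      class Stream;
      class ScratchAllocator;  // hands out temporary device memory
      template <typename T> class DeviceMemory;

      class DnnSupport {
       public:
        virtual ~DnnSupport() = default;
        // After this change, pooling takes a ScratchAllocator so MIOpen can
        // request workspace memory; the cuDNN path may simply ignore it.
        virtual bool DoPoolForward(Stream* stream,
                                   const DeviceMemory<float>& input_data,
                                   DeviceMemory<float>* output_data,
                                   ScratchAllocator* workspace_allocator) = 0;
      };
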
* Workaround the cudnn 7.1.4 correctness bug, where the workspace is required to be zeroed. (A. Unique TensorFlower, 2018-07-02)
  PiperOrigin-RevId: 203001311
* Improve filter for cuBLAS bug. (A. Unique TensorFlower, 2018-06-19)
  PiperOrigin-RevId: 201239428
* Rollback of changelist 200200356. We might want to support GPUs on MacOS again in the future. (A. Unique TensorFlower, 2018-06-19)
  Users are interested in making it work and we don't want to be in the way.
  PiperOrigin-RevId: 201214857
* Fix a build failure when the CUDA version is less than 9000 (i.e. older than CUDA 9.0). (A. Unique TensorFlower, 2018-06-13)
  PiperOrigin-RevId: 200432478
* Detect configurations that would hit a bug in cuBLAS and report an error. (A. Unique TensorFlower, 2018-06-13)
  PiperOrigin-RevId: 200411493
* Remove OS X code from the CUDA stream executor because that platform is no longer supported. (A. Unique TensorFlower, 2018-06-12)
  PiperOrigin-RevId: 200200356
* Unify cuDNN descriptor wrapper names. (A. Unique TensorFlower, 2018-06-12)
  No functional changes.
  PiperOrigin-RevId: 200199956
* Detect configurations that would hit bugs in cuDNN and report an error. (A. Unique TensorFlower, 2018-06-08)
  PiperOrigin-RevId: 199780350
* Do not enable tensor ops for cuDNN RNN unless explicitly specified. (A. Unique TensorFlower, 2018-06-05)
  PiperOrigin-RevId: 199321021
* Unify error handling in CudnnSupport. (A. Unique TensorFlower, 2018-06-01)
  PiperOrigin-RevId: 198836479
* Fix GPU build on Windows. (Smit Hinsu, 2018-05-29)
  PiperOrigin-RevId: 198513480
* Merge changes from github. (Yifei Feng, 2018-05-24)
  Revert #18413: too many internal test failures due to the name scope change
  caused by this change.
  Revert #18192: cannot use re2::StringPiece internally. Need an alternative
  for the set call. Will pull and clean this up in a separate change.
  PiperOrigin-RevId: 197991247
* Add convolution with NHWC layout to stream executor. (A. Unique TensorFlower, 2018-05-22)
  PiperOrigin-RevId: 197650067
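
  In cuDNN terms, NHWC support boils down to tagging the tensor descriptors
  with the interleaved format; a minimal sketch (real cuDNN API; FP32 assumed):

      #include <cudnn.h>

      // Describe an activation tensor in NHWC ("channels last") layout
      // instead of the default NCHW.
      cudnnStatus_t SetNhwcDescriptor(cudnnTensorDescriptor_t desc,
                                      int n, int h, int w, int c) {
        return cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NHWC,
                                          CUDNN_DATA_FLOAT,
                                          n, c, h, w);  // dims always N,C,H,W
      }
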
* Introduce an option to allocate CUDA unified memory. (Smit Hinsu, 2018-05-21)
  PiperOrigin-RevId: 197490523
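
  Unified memory comes from a dedicated driver allocator; a minimal sketch of
  the call involved (real driver API; error handling reduced to a bool):

      #include <cuda.h>

      // Allocate managed (unified) memory, accessible from both host and
      // device and migrated on demand by the driver.
      bool AllocateUnified(size_t bytes, CUdeviceptr* out) {
        return cuMemAllocManaged(out, bytes, CU_MEM_ATTACH_GLOBAL) == CUDA_SUCCESS;
      }
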
* Rollforward of CL 197167501, without enabling CUDNN_FFT_TILING_FORWARD because that breaks XLA tests. (A. Unique TensorFlower, 2018-05-20)
  PiperOrigin-RevId: 197328103
* Automated g4 rollback of changelist 197118212 (A. Unique TensorFlower, 2018-05-18)
  PiperOrigin-RevId: 197167501
* Dropping support for CUDA < 8. (A. Unique TensorFlower, 2018-05-18)
  PiperOrigin-RevId: 197137612