| Commit message | Author | Age |
corresponding bugs fixed. The bugs that
were worked around have been fixed and verified.
PiperOrigin-RevId: 215497418
PiperOrigin-RevId: 215331087
The tests are in the next patch.
PiperOrigin-RevId: 214362688
PiperOrigin-RevId: 214075796
With the exception of StrCat, all of these already use absl; this change
just removes one layer of indirection.
PiperOrigin-RevId: 213846036
PiperOrigin-RevId: 213693027
This broke in a recent refactoring.
PiperOrigin-RevId: 213497416
* bias_nd is set to have CUDNN_DATA_FLOAT, even though BiasType is not float.
* double is supported but not exposed through the public interface.
* DoFusedConvolveImpl has duplicated information in its template parameter list.
PiperOrigin-RevId: 213308435
PiperOrigin-RevId: 212684548
cuDNN 7.1.4 and 7.2 have a non-deterministic bug if the buffer is not zeroed.
PiperOrigin-RevId: 211905127
PiperOrigin-RevId: 211639440
- Maintain functionality; just move the CalculateOccupancy() and CompareOccupancy() methods from device_description to cuda_gpu_executor
- Remove the CUDA requirement from the general device_description class
There are several API migrations happening:
* ArraySlice's sub-slice constructor => .subspan
* MutableArraySlice's container pointer constructor => absl::MakeSpan
PiperOrigin-RevId: 210946124
- Replace references to the UnqueryableDeviceParams struct with calls to CUDA's built-in occupancy calculation functions
- Update calls to the occupancy checking functions with the new changes
- These changes should provide more long-term reliability and remove the need to manually update hardcoded data values for new GPU architectures
PiperOrigin-RevId: 210596417
This will make it easier to replace tensorflow::StringPiece with absl::string_view, as absl::string_view does not contain a ToString method.
PiperOrigin-RevId: 210550029
PiperOrigin-RevId: 210127626
error state
HostCallbacks may trigger notifications that, if elided, would cause programs to hang. Ideally we would have errback semantics, but this is a band-aid while the semantics are redefined.
PiperOrigin-RevId: 209818770
That is, instances of sp.ToString() are replaced with string(sp).
This will allow tensorflow::StringPiece::ToString to be removed, which is necessary before it can be replaced with absl::string_view.
PiperOrigin-RevId: 209806694
This check-fail was wrong anyway; it meant to check the *substream's* status, but checked its own instead. We could be in an error state, and that's absolutely fine; we shouldn't kill the process for this.
PiperOrigin-RevId: 209721359
PiperOrigin-RevId: 209679086
PiperOrigin-RevId: 208565050
PiperOrigin-RevId: 208508212
PiperOrigin-RevId: 208505669
PiperOrigin-RevId: 208200028
will correctly wait for all computations to complete on an XLA device before termination.
[TF:XLA] Change the XlaTensor definition event to be a shared pointer to a stream_executor::Event. This allows many tensors to share the same definition event.
PiperOrigin-RevId: 208128264
PiperOrigin-RevId: 207983992
ROCmSoftwarePlatform:upstream-staging-stream-executor-algorithmconfig-profileresult
PiperOrigin-RevId: 207801599
PiperOrigin-RevId: 207714420
The old code ensured that failed sub-streams would not be re-used, but
had two flaws:
1) It only checked for failed sub-streams during Return.
2) It didn't actually remove the failed sub-streams from our state.
The new code fixes these two flaws, and adds an extra test that
explains why (1) is insufficient.
PiperOrigin-RevId: 207333296
It's unfortunate that this was only added in 9.1, but I haven't found a good
way of emulating the behavior on 9.0 without falling back to non-batched gemms.
PiperOrigin-RevId: 207286575
This is mostly a huge amount of plumbing just to call into the cublas functions.
cublasGemmStridedBatched has been available since CUDA 8.0.
For autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2
so I didn't wire that up yet.
PiperOrigin-RevId: 207285707
Add one field, scratch_size_, to AlgorithmDesc. The field is set by
DNN libraries during the algorithm finding / profiling stage; for algorithms
not using scratch memory the field is zero.
Change the CUDA StreamExecutor implementation to set this field properly.
It's possible for an already-existing context to be returned by
cuDevicePrimaryCtxRetain. Previously, this would be handled incorrectly
by CreatedContexts::Add, which was assuming that inserts into the map
always succeeded.
This makes XLA work with
TF_CUDA_PLATFORM_GPU_DEVICE_SCHEDULE=blocking_sync, although exactly how
that flag is related to this bug is unclear to me. It seems like some
sort of race condition, maybe?
PiperOrigin-RevId: 207010059
Comment-only change
PiperOrigin-RevId: 206957994
This makes it easier to see why this function fails.
PiperOrigin-RevId: 206856975
I verified that CUDA 9.1 did not introduce any new algorithms.
PiperOrigin-RevId: 206850523
CUBLAS_GEMM_ALGO{3,4}_TENSOR_OP.
These appear to have been omitted by mistake.
PiperOrigin-RevId: 206843312
PiperOrigin-RevId: 206595861
Signed-off-by: CUI Wei <ghostplant@qq.com>
It's only non-empty if we were able to run ptxas. If the PTX is going to be
JIT'ed by the driver it won't be around. Loading an empty cubin will result in
a fatal error.
PiperOrigin-RevId: 206341931
When running with multiple devices, using the wrong context will lead to
a check-fail when trying to set a stream that has been created with a different
context.
This resolves a check-fail on resnet50 with 8 GPUs.
PiperOrigin-RevId: 206274741
This code has been here since 2014; now that the oldest supported version of
CUDA is 8, cuGetErrorName should always be available. Also, the hardcoded list
of errors is (of course) out of sync with upstream CUDA.
Also surface the description of the error to the user, if available.
PiperOrigin-RevId: 206191424
Streams have a monotonic state machine; if a stream encounters any
error, it will remain in an error state forever. Without this change,
a previously failed sub-stream will be put back on sub_streams_, only
to cause the next usage of the sub-stream to trivially fail.
PiperOrigin-RevId: 206112024
PiperOrigin-RevId: 205885304
This will be used in a future CL.
PiperOrigin-RevId: 205742731