path: root/tensorflow/stream_executor/stream_executor_internal.h
Commit message (Author, Age)
* Merge pull request #21232 from ghostplant:fix-typo (TensorFlower Gardener, 2018-08-08)
|\    PiperOrigin-RevId: 207983992
* | Implement DoHostCallbackWithStatus to allow callbacks to return a status (A. Unique TensorFlower, 2018-08-07)
| |   PiperOrigin-RevId: 207714420
| * Fix typo: host_src -> gpu_src for inter-gpu copy (CUI Wei, 2018-07-29)
|/    Signed-off-by: CUI Wei <ghostplant@qq.com>
* Teach StreamExecutor to load modules and resolve symbols in them (Sanjoy Das, 2018-07-23)
|     This will be used in a future CL.
|     PiperOrigin-RevId: 205742731
* [ROCm] Interface changes for StreamExecutor to support both CUDA and ROCm (Wen-Heng (Jack) Chung, 2018-07-12)
|     1) StreamInterface::CudaStreamMemberHack()
|        Although StreamExecutor and the GPU common runtime are largely orthogonal, this routine in StreamExecutor is used by the GPU common runtime and a couple of other operators. In this commit it is renamed to StreamInterface::GpuStreamMemberHack(), and its call sites are changed accordingly.
|     2) StreamExecutorInterface::CudaContextHack()
|        This member is renamed to StreamExecutorInterface::GpuContextHack().
|     Changes introduced in this commit include:
|     - some StreamExecutor interfaces and CUDA implementation
|     - GPU common runtime related to interface changes in StreamExecutor
|     - operators affected by interface changes in StreamExecutor
* Introduce an option to allocate CUDA unified memory (Smit Hinsu, 2018-05-21)
|     PiperOrigin-RevId: 197490523
* [StreamExecutor] Rename ::perftools::gputools -> ::stream_executor, part 1. (Justin Lebar, 2018-04-17)
|     Step 1 of re-namespace'ing StreamExecutor into ::stream_executor.
|     This moves everything inside of stream_executor/..., and leaves a namespace alias into ::perftools::gputools. The next steps will clean up users to use the new namespace.
|     This is mostly a mechanical change, but it also includes a bunch of non-mechanical changes that ideally would be split out into separate patches. Unfortunately they all sort of need to be shoved in here for various reasons:
|     - forward declarations need to be in the same namespace as the actual types, so we need to change all forward declarations of StreamExecutor types in this one patch.
|     - Uses of these forward declarations need to be changed to the new namespace (or otherwise we need to add a namespace alias to the relevant header, but this is pretty ugly).
|     - Various initialization code needs to live in StreamExecutor's "real" namespace, so all this needs to be changed.
|     PiperOrigin-RevId: 193256128
* Rename StreamExecutorInterface::BlockHostUntilDoneWithStatus to BlockHostUntilDone. (A. Unique TensorFlower, 2017-12-15)
|     PiperOrigin-RevId: 179198370
* Replace StreamExecutorInterface::BlockHostUntilDone with BlockHostUntilDoneWithStatus (A. Unique TensorFlower, 2017-12-09)
|     All known overrides of StreamExecutorInterface::BlockHostUntilDone are changed by this CL.
|     PiperOrigin-RevId: 178492517
* Add BlockHostUntilDoneWithStatus, which returns Status rather than bool. (A. Unique TensorFlower, 2017-12-06)
|     Also fixed a deadlock in Stream::BlockHostUntilDone. The problem with the original code was that it grabbed mu_ before looping over substreams, and would call CheckError with mu_ still held. But CheckError will attempt to lock mu_ in the failure case, which would deadlock.
|     PiperOrigin-RevId: 178191634
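The deadlock described in the commit above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: ToyStream, its members, and the way the fix is structured are hypothetical and are not the real stream_executor::Stream code. The sketch shows one way to avoid the pattern: hold mu_ only while copying the sub-stream list, and call CheckError after mu_ has been released.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a stream; NOT the real stream_executor::Stream.
class ToyStream {
 public:
  explicit ToyStream(bool healthy) : healthy_(healthy) {}

  void AddSubStream(ToyStream* sub) {
    assert(sub != nullptr);
    std::lock_guard<std::mutex> lock(mu_);
    sub_streams_.push_back(sub);
  }

  // Copies the sub-stream list under mu_, releases mu_, and only then runs
  // the code that may reach CheckError(). Calling CheckError() while still
  // holding mu_ would self-deadlock, because std::mutex is not recursive.
  bool BlockHostUntilDone() {
    std::vector<ToyStream*> subs;
    {
      std::lock_guard<std::mutex> lock(mu_);
      subs = sub_streams_;
    }  // mu_ is released here.
    bool all_ok = healthy_;
    for (ToyStream* sub : subs) {
      all_ok = sub->BlockHostUntilDone() && all_ok;
    }
    CheckError(all_ok);
    return ok();
  }

  bool ok() {
    std::lock_guard<std::mutex> lock(mu_);
    return ok_;
  }

 private:
  // Locks mu_ only in the failure case, as the commit message describes.
  void CheckError(bool operation_ok) {
    if (operation_ok) return;
    std::lock_guard<std::mutex> lock(mu_);
    ok_ = false;
  }

  std::mutex mu_;
  bool healthy_;  // Hypothetical flag simulating whether this stream's work succeeds.
  bool ok_ = true;
  std::vector<ToyStream*> sub_streams_;
};
```

With this structure, a failing sub-stream marks both itself and its parent as not ok, and no thread ever tries to lock mu_ while already holding it.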
* Clean up kernels cached by CUDAExecutor. (Artem Belevich, 2017-11-07)
|     Notify CUDA executor on deallocation of previously loaded GPUExecutable and unload corresponding module when all executables that were using it are gone.
|     PiperOrigin-RevId: 174908193
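The "unload when all executables that were using it are gone" behaviour from the commit above amounts to reference counting. The following is a minimal sketch under stated assumptions: ToyModuleCache, Load, Unload, and IsLoaded are hypothetical names, and the real CUDA executor keys and stores its modules differently.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical module cache keyed by name; NOT the real CUDAExecutor types.
class ToyModuleCache {
 public:
  // Records one more executable using `module`; the first call stands in
  // for actually loading it.
  void Load(const std::string& module) { ++ref_counts_[module]; }

  // Drops one user of `module`. Returns true if that was the last user,
  // in which case the entry is erased (standing in for the real unload).
  bool Unload(const std::string& module) {
    auto it = ref_counts_.find(module);
    assert(it != ref_counts_.end() && "Unload without matching Load");
    if (--it->second > 0) return false;
    ref_counts_.erase(it);
    return true;
  }

  bool IsLoaded(const std::string& module) const {
    return ref_counts_.count(module) > 0;
  }

 private:
  std::map<std::string, int> ref_counts_;  // Module name -> number of users.
};
```

Two executables sharing one module would each call Load once; the module stays cached after the first Unload and is dropped only on the second.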
* Allow tensorflow devices to report their load. This may be used to improve batch scheduling. (A. Unique TensorFlower, 2017-10-10)
|     PiperOrigin-RevId: 171716813
* Miscellaneous cleanups (A. Unique TensorFlower, 2017-06-05)
|     PiperOrigin-RevId: 158012131
* Merge changes from github. (Shanqing Cai, 2017-04-22)
|     Change: 153925676
* Default impl for StreamExecutorInterface::Memset (A. Unique TensorFlower, 2017-01-27)
|     This makes it easier to create StreamExecutors for other platforms.
|     Change: 145821039
* Plumb port::Status through the internal synchronous memcopy routines. (A. Unique TensorFlower, 2017-01-17)
|     Now, at least for the public APIs that return port::Status, they can grab the port::Status that the implementation would like to return and use its additional information in reporting to the user.
|     Change: 144741667
* StreamExecutor: Optimize kernel argument packing (Peter Hawkins, 2016-11-29)
|     Create a single class to hold all kernel arguments and optimize how they are added into this class.
|     Change: 140556725
* Update copyright for 3p/tf. (A. Unique TensorFlower, 2016-06-02)
|     Change: 123901292
* fp16 support for BiasAdd. Includes support for atomic adds for Eigen::half, (A. Unique TensorFlower, 2016-05-10)
|     although beware, they are going to be very slow.
|     Also:
|     - Remove the testGradientBias() test, since it was a fp64-only duplication of tests we already had in other subtests.
|     - Extend gradient microbenchmarks to measure NCHW and NHWC, not just the default in a two-dimensional tensor.
|     - Fix cuda_builtin::__ldg() definition; seemingly calling ::__ldg() on gcudacc returns only zero, not the __ldg() function we defined a few lines further up.
|     Change: 121970776
* TensorFlow: upstream changes to git. (Vijay Vasudevan, 2015-12-08)
|     Change 109695551
|       Update FAQ
|     Change 109694725
|       Add a gradient for resize_bilinear op.
|     Change 109694505
|       Don't mention variables module in docs
|       variables.Variable should be tf.Variable.
|     Change 109658848
|       Adding an option to create a new thread-pool for each session.
|     Change 109640570
|       Take the snapshot of stream-executor.
|       + Expose an interface for scratch space allocation in the interface.
|     Change 109638559
|       Let image_summary accept uint8 input
|       This allows users to do their own normalization / scaling if the default (very weird) behavior of image_summary is undesired. This required a slight tweak to fake_input.cc to make polymorphically typed fake inputs infer if their type attr is not set but has a default. Unfortunately, adding a second valid type to image_summary *disables* automatic implicit conversion from np.float64 to tf.float32, so this change is slightly backwards incompatible.
|     Change 109636969
|       Add serialization operations for SparseTensor.
|     Change 109636644
|       Update generated Op docs.
|     Change 109634899
|       TensorFlow: add a markdown file for producing release notes for our releases. Seed with 0.5.0 with a boring but accurate description.
|     Change 109634502
|       Let histogram_summary take any realnumbertype
|       It used to take only floats, now it understands ints.
|     Change 109634434
|       TensorFlow: update locations where we mention python 3 support, update them to current truth.
|     Change 109632108
|       Move HSV <> RGB conversions, grayscale conversions, and adjust_* ops back to tensorflow
|       - make GPU-capable version of RGBToHSV and HSVToRGB, allows only float input/output
|       - change docs to reflect new size constraints
|       - change HSV format to be [0,1] for all components
|       - add automatic dtype conversion for all adjust_* and grayscale conversion ops
|       - fix up docs
|     Change 109631077
|       Improve optimizer exceptions
|       1. grads_and_vars is now a tuple, so must be wrapped when passed to format.
|       2. Use '%r' instead of '%s' for dtype formatting
|     Base CL: 109697989
* TensorFlow: Improve performance of Alexnet (Manjunath Kudlur, 2015-11-20)
|     Changes:
|     * error message that refers to removed `DefaultSession` method.
|     * -Wnull-conversion warnings
|     * the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
|     * typo in tutorial data download progress message.
|     * a typo ("however their installing"=>"however installing").
|     * typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
|     * a typo ("subtact"=>"subtract").
|     * protobuf examples in comments in tensorflow::Example.proto.
|     * formula formatting in MNIST beginner tutorial
|     * negative fraction-of-queue-full stats
|     * protobuf inclusion path so that Android demo will build under Blaze.
|     * small typo (moderatly > moderately)
|     * Session.run() to check that tensor arguments come from the session's graph.
|     * another six import
|     * seq2seq typo in bazel command
|     Base CL: 108349164
* TensorFlow: Initial commit of TensorFlow library. (Manjunath Kudlur, 2015-11-06)
      TensorFlow is an open source software library for numerical computation using data flow graphs.
      Base CL: 107276108