| Commit message (Collapse) | Author | Age |
| |
1) StreamInterface::CudaStreamMemberHack()
Although StreamExecutor and the GPU common runtime are largely orthogonal,
this routine in StreamExecutor is used by the GPU common runtime and a couple
of other operators. This commit renames it to
StreamInterface::GpuStreamMemberHack() and updates its call sites.
2) StreamExecutorInterface::CudaContextHack()
This member is renamed to StreamExecutorInterface::GpuContextHack().
Changes introduced in this commit include:
- some StreamExecutor interfaces and CUDA implementation
- GPU common runtime related to interface changes in StreamExecutor
- operators affected by interface changes in StreamExecutor
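
The rename described above can be sketched roughly as follows. This is only an illustration, not the StreamExecutor source: the class body is reduced to the one renamed member, the `void*` return type and `nullptr` default are assumptions, and `FakeGpuStream`, `FakeCudaStream` and `AsGpuStream` are hypothetical names invented for the example.

```cpp
// Simplified stand-in for the interface; the real class has more members.
class StreamInterface {
 public:
  virtual ~StreamInterface() = default;
  // Formerly CudaStreamMemberHack(). Assumed here to return an opaque
  // pointer to the platform stream handle, or nullptr if there is none.
  virtual void* GpuStreamMemberHack() { return nullptr; }
};

// Hypothetical handle type standing in for a real GPU stream handle.
using FakeGpuStream = int;

// Hypothetical CUDA-side implementation that exposes its handle.
class FakeCudaStream : public StreamInterface {
 public:
  void* GpuStreamMemberHack() override { return &stream_; }

 private:
  FakeGpuStream stream_ = 0;
};

// Hypothetical call site: code that previously called
// CudaStreamMemberHack() now calls the renamed member instead.
FakeGpuStream* AsGpuStream(StreamInterface* stream) {
  return static_cast<FakeGpuStream*>(stream->GpuStreamMemberHack());
}
```

Under these assumptions, only the member name changes at each call site; the cast and the handling of a null result stay the same.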
| |
the number of work_elements was too small, which could return a block_count that is too small to cover all elements.
We have also been ignoring the suggested thread_per_block, so we were potentially launching more blocks than necessary to fill the GPU (inefficient, but functionally correct).
Change 'assert(false && ...' to LOG(FATAL), because the check should not apply only in debug builds.
PiperOrigin-RevId: 186037306
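
To illustrate the coverage requirement mentioned in this message, here is a minimal sketch of deriving a block count that always covers every work element, using the suggested thread_per_block rather than ignoring it. It is not the code changed by the commit: `LaunchConfig`, `DivUp` and `MakeLaunchConfig` are hypothetical names, and the suggested thread count is assumed to come from an occupancy query made elsewhere.

```cpp
#include <cassert>

// Hypothetical stand-in for a launch configuration.
struct LaunchConfig {
  int thread_per_block;
  int block_count;
};

// Ceiling division for positive integers.
inline int DivUp(int a, int b) { return (a + b - 1) / b; }

// Builds a config in which block_count * thread_per_block is at least
// work_elements, so no element is left uncovered when the work size is
// small or not a multiple of the block size.
LaunchConfig MakeLaunchConfig(int work_elements,
                              int suggested_thread_per_block) {
  assert(work_elements > 0 && suggested_thread_per_block > 0);
  LaunchConfig config;
  // Use the suggested block size instead of ignoring it.
  config.thread_per_block = suggested_thread_per_block;
  // Round up so a partial final block is still launched.
  config.block_count = DivUp(work_elements, config.thread_per_block);
  return config;
}
```

Rounding down instead of up is the kind of mistake that leaves trailing elements unprocessed; rounding up launches at most one partially filled block.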
| |
intrinsics.
PiperOrigin-RevId: 183374082
| |
PiperOrigin-RevId: 179861781
| |
intrinsics.
PiperOrigin-RevId: 179782067
| |
PiperOrigin-RevId: 177989542
|
PiperOrigin-RevId: 177799252
|