Commit message | Author | Age | ||
---|---|---|---|---|
... | ||||
| * | Alternative way of forcing instantiation of device kernels without causing warnings or requiring device-to-device kernel invocations. This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines. | 2016-01-10 | ||
|/ | ||||
* | Prevent nvcc from miscompiling the CUDA metakernel. Unfortunately this reintroduces some compilation warnings, but it's much better than having to deal with random assertion failures. | 2016-01-08 | ||
| | ||||
* | Improved the performance of reductions on CUDA devices | 2016-01-04 | ||
| | ||||
* | Optimized the configuration of the outer reduction CUDA kernel | 2015-12-22 | ||
| | ||||
* | Added missing define | 2015-12-22 | ||
| | ||||
* | Made sure the optimized gpu reduction code is actually compiled. | 2015-12-22 | ||
| | ||||
* | Optimized outer reduction on GPUs. | 2015-12-22 | ||
| | ||||
* | Doubled the speed of full reductions on GPUs. | 2015-12-18 | ||
| | ||||
* | Code cleanup | 2015-11-06 | ||