| Commit message | Author | Age |
* Use cub::ReduceByKey to count how many times each partition index appears.
This implements a suggestion by @ekelsen. It replaces the previous
custom-made counting method and is likely more efficient.
* Remove CubReduceAdd and use cub::Sum instead.
PiperOrigin-RevId: 167333886
PiperOrigin-RevId: 166396680
PiperOrigin-RevId: 166294015
PiperOrigin-RevId: 165887626
deterministic and mostly faster (some cases use the CUB library).
Reductions over bool, std::complex<float>, std::complex<double>, and Eigen::half
are anywhere from 50-1000x faster. For floats, row and column reductions
are ~20-100% faster, and reductions to a scalar are performance neutral;
on P100 the float reduction to a scalar is slightly slower.
Mean reductions are more numerically stable: by doing the division after the
sum instead of before, we avoid changing mantissa bits, which can introduce a
bias.
Mean reductions for half are now accurate as well as fast: the reduction is
done internally in fp32 and cast to half just before the final write.
Also switch the l2loss op to the new reductions as a demonstration of how
to use them. Other major users of Eigen's reductions will be replaced next,
so that the most common training cases become deterministic with respect to
reductions.
PiperOrigin-RevId: 165773305