PiperOrigin-RevId: 213684048
This is a starting point for Dataset-compatible weighted sampling across a list of datasets.
PiperOrigin-RevId: 192540412
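
As a rough illustration of what weighted sampling across a list of datasets means, the sketch below interleaves plain Python iterables. It picks the source of each element with probability proportional to that source's weight. This is not the tf.data API. `sample_from_iterables` is a hypothetical name, and the rule of dropping an exhausted source and renormalizing the remaining weights is an assumed policy.

```python
import random
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def sample_from_iterables(sources: Sequence[Iterable[T]],
                          weights: Sequence[float],
                          seed: int = 0) -> Iterator[T]:
    """Yields elements from `sources`, choosing the source of each element with
    probability proportional to its weight (hypothetical helper)."""
    if len(sources) != len(weights):
        raise ValueError("sources and weights must have the same length")
    rng = random.Random(seed)
    iterators: List[Iterator[T]] = [iter(s) for s in sources]
    # Only sources with a positive weight can ever be chosen.
    live = [i for i, w in enumerate(weights) if w > 0]
    while live:
        # random.choices treats weights as relative, so removing an exhausted
        # source implicitly renormalizes the remaining ones.
        idx = rng.choices(live, weights=[weights[i] for i in live], k=1)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            live.remove(idx)  # assumed policy: drop exhausted sources
```

With weights [0.5, 0.5], the output contains every element of both inputs, and each source's own order is preserved.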
PiperOrigin-RevId: 177444775
avoiding unnecessary casts.
Benchmark with AVX+FMA enabled:
Run on <redacted> (12 X 3492 MHz CPUs); 2017-06-05T12:54:07.881672447-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_Multinomial_cpu_1_10000_4 250817 172953 +31.0%
BM_Multinomial_cpu_1_10000_128 273834 187552 +31.5%
BM_Multinomial_cpu_1_10000_10000 1174175 1130778 +3.7%
BM_Multinomial_cpu_1_100000_4 2040741 1276761 +37.4%
BM_Multinomial_cpu_32_10000_4 10221765 4498666 +56.0%
BM_Multinomial_cpu_32_10000_128 10638159 4994754 +53.0%
BM_Multinomial_cpu_32_100000_4 100790019 44193314 +56.2%
BM_Multinomial_cpu_128_100000_1 431269640 182506078 +57.7%
PiperOrigin-RevId: 158061480
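
The Improvement column appears to be the relative time reduction, (Base - New) / Base. The snippet below recomputes three rows from the Base and New columns to check this. `improvement` is an illustrative helper, not part of the benchmark tooling.

```python
def improvement(base_ns: float, new_ns: float) -> float:
    """Relative time reduction, in percent, of the new run versus the base run."""
    return 100.0 * (base_ns - new_ns) / base_ns

rows = {
    "BM_Multinomial_cpu_1_10000_4": (250817, 172953),
    "BM_Multinomial_cpu_32_10000_4": (10221765, 4498666),
    "BM_Multinomial_cpu_128_100000_1": (431269640, 182506078),
}
for name, (base, new) in rows.items():
    # Prints +31.0%, +56.0% and +57.7%, matching the table.
    print(f"{name:34s} {improvement(base, new):+.1f}%")
```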
Gumbel noise. -log(-log(U)) evaluates to -inf if U draws exactly 0. Adds a
tiny offset (2e-30) to U to avoid log(U) = -inf.
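
A minimal sketch of Gumbel-max sampling with the offset described above. It is written in plain Python rather than as the GPU kernel. `gumbel_max_sample` is a hypothetical helper, and the per-class loop stands in for whatever vectorized form the kernel actually uses. In Python, math.log(0.0) raises ValueError instead of returning -inf. So without the offset, a draw of exactly 0.0 would fail here rather than silently produce -inf.

```python
import math
import random
from typing import Sequence

def gumbel_max_sample(logits: Sequence[float], rng: random.Random,
                      eps: float = 2e-30) -> int:
    """Returns index i with probability softmax(logits)[i] (Gumbel-max trick)."""
    best_idx, best_val = 0, -math.inf
    for i, logit in enumerate(logits):
        u = rng.random()  # uniform on [0, 1); can be exactly 0.0
        # eps keeps the inner log finite when u == 0.0.
        g = -math.log(-math.log(u + eps))  # standard Gumbel(0, 1) noise
        if logit + g > best_val:
            best_idx, best_val = i, logit + g
    return best_idx
```

Usage: with `rng = random.Random(0)` and `logits = [math.log(p) for p in (0.2, 0.3, 0.5)]`, repeated calls return 0, 1 and 2 with frequencies close to 0.2, 0.3 and 0.5.
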
The CPU sampling algorithm depends on the order of the logits, which is
undesirable. It can also oversample the first logit if that logit's probability
is smaller than the smallest positive value the uniform generator can return
(~1e-7). Switching to double precision internally mitigates these problems,
although it does not fix them. The slowdown is ~35% in the worst case.
Also adds various tests that we would like the sampling to pass.
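
The sketch below makes the order dependence concrete under two assumptions. First, the CPU path is an inverse-CDF search that returns the first class whose cumulative probability exceeds U. Second, the uniform draw only takes the values k / 2**bits. The helper name `selection_probs` is hypothetical. Under that model it computes each class's exact selection probability.

```python
import math
from itertools import accumulate
from typing import List, Sequence

def selection_probs(probs: Sequence[float], bits: int) -> List[float]:
    """Exact per-class selection probability for an inverse-CDF sampler whose
    uniform draw U only takes the values k / 2**bits, k = 0 .. 2**bits - 1.

    Class i is chosen for the first i with U < cdf[i]. `probs` must be
    non-empty and sum to (approximately) 1.
    """
    n = 2 ** bits
    cdf = list(accumulate(probs))
    cdf[-1] = 1.0  # assumption: the last class absorbs any rounding error
    # For c in [0, 1], exactly ceil(c * n) grid points k / n lie below c.
    # Multiplying by a power of two is exact in binary floating point.
    below = [math.ceil(c * n) for c in cdf]
    prev = [0] + below[:-1]
    return [(hi - lo) / n for lo, hi in zip(prev, below)]
```

Take a 23-bit grid (spacing ~1.2e-7) under this model. A class of probability 1e-12 is then selected with probability 2**-23 when it comes first, and never when it comes last. With a 52-bit grid, both orderings give roughly 1e-12. That is consistent with double precision mitigating the problem without removing it.
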
CPU Benchmark before:
BatchSize  NumClasses  NumSamples  sec(composed)  sec(native)  speedup
----------------------------------------------------------------------
32         10000       1           0.060          0.069        0.87
32         10000       4           0.229          0.074        3.10
32         10000       32          2.180          0.059        37.09
32         100000      1           0.430          0.480        0.90
32         100000      4           2.322          0.449        5.17
32         100000      32          31.508         0.471        66.96
128        10000       1           0.168          0.235        0.71
128        10000       4           0.965          0.246        3.93
128        10000       32          7.989          0.225        35.51
128        100000      1           1.681          1.539        1.09
128        100000      4           9.012          1.573        5.73
128        100000      32          126.222        1.626        77.60
CPU Benchmark after:
BatchSize  NumClasses  NumSamples  sec(composed)  sec(native)  speedup
----------------------------------------------------------------------
32         10000       1           0.054          0.112        0.48
32         10000       4           0.206          0.093        2.21
32         10000       32          1.826          0.091        20.12
32         100000      1           0.292          0.636        0.46
32         100000      4           2.086          0.606        3.44
32         100000      32          28.496         0.633        45.03
128        10000       1           0.125          0.266        0.47
128        10000       4           0.759          0.258        2.94
128        10000       32          7.362          0.254        29.03
128        100000      1           1.550          2.181        0.71
128        100000      4           8.712          2.222        3.92
128        100000      32          122.585        2.213        55.39
PiperOrigin-RevId: 157414849
PiperOrigin-RevId: 156939644
Gumbel noise. -log(-log(U)) evaluates to -inf if U draws exactly 0. Adds a
tiny offset (2e-30) to U to avoid log(U) = -inf.
The CPU sampling algorithm depends on the order of the logits, which is
undesirable. It can also oversample the first logit if that logit's probability
is smaller than the smallest positive value the uniform generator can return
(~1e-7). Switching to double precision internally mitigates these problems,
although it does not fix them. The slowdown is ~35% in the worst case.
Also adds various tests that we would like the sampling to pass.
CPU Benchmark before:
BatchSize  NumClasses  NumSamples  sec(composed)  sec(native)  speedup
----------------------------------------------------------------------
32         10000       1           0.060          0.069        0.87
32         10000       4           0.229          0.074        3.10
32         10000       32          2.180          0.059        37.09
32         100000      1           0.430          0.480        0.90
32         100000      4           2.322          0.449        5.17
32         100000      32          31.508         0.471        66.96
128        10000       1           0.168          0.235        0.71
128        10000       4           0.965          0.246        3.93
128        10000       32          7.989          0.225        35.51
128        100000      1           1.681          1.539        1.09
128        100000      4           9.012          1.573        5.73
128        100000      32          126.222        1.626        77.60
CPU Benchmark after:
BatchSize  NumClasses  NumSamples  sec(composed)  sec(native)  speedup
----------------------------------------------------------------------
32         10000       1           0.054          0.112        0.48
32         10000       4           0.206          0.093        2.21
32         10000       32          1.826          0.091        20.12
32         100000      1           0.292          0.636        0.46
32         100000      4           2.086          0.606        3.44
32         100000      32          28.496         0.633        45.03
128        10000       1           0.125          0.266        0.47
128        10000       4           0.759          0.258        2.94
128        10000       32          7.362          0.254        29.03
128        100000      1           1.550          2.181        0.71
128        100000      4           8.712          2.222        3.92
128        100000      32          122.585        2.213        55.39
PiperOrigin-RevId: 156917266
On the GPU, tf.multinomial uses Eigen. On empty input, this triggers a bug in
Eigen causing a crash. Fix this by not executing the kernel in the empty
output case.
Also fix shape validation assertions to handle more corner cases (hopefully all
of them).
Change: 127223716
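
A minimal sketch of the validate-then-return-early pattern, as a plain-Python reference sampler rather than the actual kernel. `multinomial` here is a hypothetical function, and its specific checks and error messages are assumptions, not TensorFlow's. The point is that an empty output (zero batch rows or zero samples) returns before any sampling work runs.

```python
import math
import random
from typing import List, Sequence

def multinomial(logits: Sequence[Sequence[float]], num_samples: int,
                seed: int = 0) -> List[List[int]]:
    """Hypothetical reference sampler: num_samples class indices per row."""
    # Shape validation (assumed checks, not TensorFlow's exact ones).
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    batch_size = len(logits)
    num_classes = len(logits[0]) if batch_size else 0
    if any(len(row) != num_classes for row in logits):
        raise ValueError("logits must be a rectangular [batch, classes] matrix")

    # Empty output: skip the sampling "kernel" entirely.
    if batch_size == 0 or num_samples == 0:
        return [[] for _ in range(batch_size)]
    if num_classes == 0:
        raise ValueError("cannot sample from zero classes")

    rng = random.Random(seed)
    out = []
    for row in logits:
        m = max(row)
        if m == -math.inf:
            raise ValueError("every logit in a row is -inf")
        # Subtract the row max before exponentiating for numerical stability.
        weights = [math.exp(x - m) for x in row]
        out.append(rng.choices(range(num_classes), weights=weights, k=num_samples))
    return out
```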
Moved Multinomial-related code to its own files and BUILD target.
The motivation is to keep random_ops small.
Change: 125624016