path: root/tensorflow/core/kernels/multinomial_op.cc
Commit message | Author | Age
* Fixes bits/bytes unit error in comment. | A. Unique TensorFlower | 2018-09-19
    PiperOrigin-RevId: 213684048
* Add `tf.contrib.stateless.stateless_multinomial()`. | Derek Murray | 2018-04-11
    This is a starting point for Dataset-compatible weighted sampling across a list of datasets.

    PiperOrigin-RevId: 192540412
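The idea of stateless weighted sampling across several datasets can be sketched in plain Python. This is only an illustration under stated assumptions: `stateless_choice` and `interleave` are hypothetical helpers, not TensorFlow functions, and the way the seed and step are combined here is an assumption made for the sketch, not the scheme the op uses. The point it shows is that the draw depends only on its inputs, so repeating a call with the same seed reproduces the same selection.

```python
import itertools
import random


def stateless_choice(weights, seed, step):
    """Pick an index with probability proportional to weights.

    The result depends only on (weights, seed, step); no generator state
    is kept between calls. Combining seed and step into a string is an
    assumption of this sketch.
    """
    rng = random.Random(f"{seed}:{step}")
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def interleave(datasets, weights, seed, num_items):
    """Draw num_items elements, choosing the source dataset at each step."""
    # cycle() keeps the sketch simple: a short dataset is never exhausted.
    iterators = [itertools.cycle(d) for d in datasets]
    return [
        next(iterators[stateless_choice(weights, seed, step)])
        for step in range(num_items)
    ]


datasets = [["a1", "a2", "a3"], ["b1", "b2"]]
first = interleave(datasets, weights=[0.8, 0.2], seed=42, num_items=10)
second = interleave(datasets, weights=[0.8, 0.2], seed=42, num_items=10)
print(first == second)
```

Because no state is carried between calls, the same pipeline can be re-run or restarted and still visit the datasets in the same order.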
* Add support for int32 output types to the Multinomial op. | Peter Hawkins | 2017-11-30
    PiperOrigin-RevId: 177444775
* Speed up multinomial_op on CPU by using a vectorized Eigen expression and avoiding unnecessary casts. | A. Unique TensorFlower | 2017-06-05
    Benchmark with AVX+FMA enabled:

    Run on <redacted> (12 X 3492 MHz CPUs); 2017-06-05T12:54:07.881672447-07:00
    CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB

    Benchmark                            Base (ns)    New (ns)  Improvement
    ------------------------------------------------------------------
    BM_Multinomial_cpu_1_10000_4            250817      172953       +31.0%
    BM_Multinomial_cpu_1_10000_128          273834      187552       +31.5%
    BM_Multinomial_cpu_1_10000_10000       1174175     1130778        +3.7%
    BM_Multinomial_cpu_1_100000_4          2040741     1276761       +37.4%
    BM_Multinomial_cpu_32_10000_4         10221765     4498666       +56.0%
    BM_Multinomial_cpu_32_10000_128       10638159     4994754       +53.0%
    BM_Multinomial_cpu_32_100000_4       100790019    44193314       +56.2%
    BM_Multinomial_cpu_128_100000_1      431269640   182506078       +57.7%

    PiperOrigin-RevId: 158061480
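The Improvement column in the benchmark above is consistent with the relative time saved, (Base - New) / Base. A minimal sketch that recomputes it for three rows copied from the table follows; the helper names are hypothetical, and the speedup factor is an extra figure derived here, not one reported in the commit.

```python
# (base_ns, new_ns) pairs copied from the benchmark table in the commit message.
TIMINGS = {
    "BM_Multinomial_cpu_1_10000_4": (250817, 172953),
    "BM_Multinomial_cpu_32_10000_4": (10221765, 4498666),
    "BM_Multinomial_cpu_128_100000_1": (431269640, 182506078),
}


def improvement_percent(base_ns, new_ns):
    """Time saved relative to the base time, as a percentage."""
    return 100.0 * (base_ns - new_ns) / base_ns


def speedup_factor(base_ns, new_ns):
    """How many times faster the new run is than the base run."""
    return base_ns / new_ns


for name, (base_ns, new_ns) in TIMINGS.items():
    print(
        f"{name}: {improvement_percent(base_ns, new_ns):+.1f}% "
        f"({speedup_factor(base_ns, new_ns):.2f}x)"
    )
```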
* Fix oversampling in the GPU version of multinomial due to an error in generating gumbel noise. | A. Unique TensorFlower | 2017-05-29
    -log(-log(U)) gives infinity if U draws a hard 0. Adds a tiny offset to U (2e-30) to avoid log(U) = -inf.

    The CPU sampling algorithm depends on the order of the logits which is undesirable and can also oversample the first logit if it is smaller than the smallest random float larger than 0 (~1e-7). Switching to double precision internally mitigates these problems, although it doesn't fix them. Slowdown is ~35% in the worst case.

    Also adds various tests that we would like the sampling to pass.

    CPU Benchmark before:

     32   10000   1     0.060  0.069   0.87
     32   10000   4     0.229  0.074   3.10
     32   10000  32     2.180  0.059  37.09
     32  100000   1     0.430  0.480   0.90
     32  100000   4     2.322  0.449   5.17
     32  100000  32    31.508  0.471  66.96
    128   10000   1     0.168  0.235   0.71
    128   10000   4     0.965  0.246   3.93
    128   10000  32     7.989  0.225  35.51
    128  100000   1     1.681  1.539   1.09
    128  100000   4     9.012  1.573   5.73
    128  100000  32   126.222  1.626  77.60

    CPU Benchmark after:

     32   10000   1     0.054  0.112   0.48
     32   10000   4     0.206  0.093   2.21
     32   10000  32     1.826  0.091  20.12
     32  100000   1     0.292  0.636   0.46
     32  100000   4     2.086  0.606   3.44
     32  100000  32    28.496  0.633  45.03
    128   10000   1     0.125  0.266   0.47
    128   10000   4     0.759  0.258   2.94
    128   10000  32     7.362  0.254  29.03
    128  100000   1     1.550  2.181   0.71
    128  100000   4     8.712  2.222   3.92
    128  100000  32   122.585  2.213  55.39

    PiperOrigin-RevId: 157414849
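The Gumbel-noise fix described in this commit message can be illustrated with a small Gumbel-max sampler. This is a sketch under assumptions, not the GPU kernel: the function names are hypothetical, and only the 2e-30 offset is taken from the message. Python floats are double precision, so the sketch shows the U = 0 case only, not float32 rounding behavior.

```python
import math
import random

TINY_OFFSET = 2e-30  # offset quoted in the commit message


def gumbel_noise(u, offset=TINY_OFFSET):
    """Map a uniform draw u in [0, 1) to Gumbel(0, 1) noise.

    Without the offset, u == 0 would require log(0), which is -inf in
    IEEE arithmetic (math.log raises ValueError in Python).
    """
    return -math.log(-math.log(u + offset))


def sample_gumbel_max(logits, rng):
    """Return argmax_i(logits[i] + g_i) with independent Gumbel noise g_i."""
    best_index, best_score = 0, -math.inf
    for i, logit in enumerate(logits):
        score = logit + gumbel_noise(rng.random())
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def empirical_frequencies(logits, num_draws, seed=0):
    """Fraction of draws that land on each class."""
    rng = random.Random(seed)
    counts = [0] * len(logits)
    for _ in range(num_draws):
        counts[sample_gumbel_max(logits, rng)] += 1
    return [c / num_draws for c in counts]


probs = [0.1, 0.2, 0.7]
print(empirical_frequencies([math.log(p) for p in probs], 20000))
```

With the offset in place a hard zero maps to a finite noise value, so one unlucky draw can no longer push a score to infinity.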
* Automated g4 rollback of changelist 156917266 | A. Unique TensorFlower | 2017-05-23
    PiperOrigin-RevId: 156939644
* Fix oversampling in the GPU version of multinomial due to an error in generating gumbel noise. | A. Unique TensorFlower | 2017-05-23
    -log(-log(U)) gives infinity if U draws a hard 0. Adds a tiny offset to U (2e-30) to avoid log(U) = -inf.

    The CPU sampling algorithm depends on the order of the logits which is undesirable and can also oversample the first logit if it is smaller than the smallest random float larger than 0 (~1e-7). Switching to double precision internally mitigates these problems, although it doesn't fix them. Slowdown is ~35% in the worst case.

    Also adds various tests that we would like the sampling to pass.

    CPU Benchmark before:

     32   10000   1     0.060  0.069   0.87
     32   10000   4     0.229  0.074   3.10
     32   10000  32     2.180  0.059  37.09
     32  100000   1     0.430  0.480   0.90
     32  100000   4     2.322  0.449   5.17
     32  100000  32    31.508  0.471  66.96
    128   10000   1     0.168  0.235   0.71
    128   10000   4     0.965  0.246   3.93
    128   10000  32     7.989  0.225  35.51
    128  100000   1     1.681  1.539   1.09
    128  100000   4     9.012  1.573   5.73
    128  100000  32   126.222  1.626  77.60

    CPU Benchmark after:

     32   10000   1     0.054  0.112   0.48
     32   10000   4     0.206  0.093   2.21
     32   10000  32     1.826  0.091  20.12
     32  100000   1     0.292  0.636   0.46
     32  100000   4     2.086  0.606   3.44
     32  100000  32    28.496  0.633  45.03
    128   10000   1     0.125  0.266   0.47
    128   10000   4     0.759  0.258   2.94
    128   10000  32     7.362  0.254  29.03
    128  100000   1     1.550  2.181   0.71
    128  100000   4     8.712  2.222   3.92
    128  100000  32   122.585  2.213  55.39

    PiperOrigin-RevId: 156917266
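The CPU-side concern in this message (order dependence, and oversampling of the first logit when its probability is below the smallest positive random float) can be reproduced with a toy inverse-CDF sampler. The message does not show the kernel, so treating the CPU path as a cumulative sum plus binary search is an assumption of this sketch, and all names are hypothetical. The value 2**-23 stands in for the "~1e-7" spacing mentioned above.

```python
import bisect
import itertools
import math


def cdf_from_logits(logits):
    """Unnormalized cumulative weights exp(logit - max), in logit order."""
    top = max(logits)
    return list(itertools.accumulate(math.exp(x - top) for x in logits))


def sample_from_cdf(cdf, u):
    """Return the first index whose cumulative weight exceeds u * total."""
    target = u * cdf[-1]
    return min(bisect.bisect_right(cdf, target), len(cdf) - 1)


# Class 0 has probability of about 1e-9, far below the ~1e-7 grid spacing.
rare_first = cdf_from_logits([math.log(1e-9), 0.0])
rare_last = cdf_from_logits([0.0, math.log(1e-9)])

# A hard zero always selects index 0, whatever its probability is.
print(sample_from_cdf(rare_first, 0.0))     # index 0: the rare class
print(sample_from_cdf(rare_first, 2**-23))  # index 1: the common class
print(sample_from_cdf(rare_last, 0.0))      # index 0: now the common class
```

If U lands on exactly 0 about once per 2**23 draws, the rare class in first position is selected roughly a hundred times more often than its 1e-9 probability warrants, while the same class in last position is not. A finer-grained U shrinks this effect without removing it, which matches the message's remark that double precision mitigates the problem but does not fix it.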
* Fix tf.multinomial to not crash on empty input | Geoffrey Irving | 2016-07-12
    On the GPU, tf.multinomial uses Eigen. On empty input, this triggers a bug in Eigen causing a crash. Fix this by not executing the kernel in the empty output case.

    Also fix shape validation assertions to handle more corner cases (hopefully all of them).

    Change: 127223716
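The "skip the kernel when the output is empty" pattern from this commit can be sketched as an early return placed after shape validation. This is an illustrative pure-Python stand-in, not the op's code: `multinomial` here is a hypothetical function, and the specific validation rules are assumptions, since the message does not list the corner cases it covers.

```python
import math
import random


def multinomial(logits, num_samples, rng):
    """Draw num_samples class indices per row of logits (a list of rows)."""
    if num_samples < 0:
        raise ValueError("num_samples must be nonnegative")
    batch_size = len(logits)
    num_classes = len(logits[0]) if batch_size else 0
    if any(len(row) != num_classes for row in logits):
        raise ValueError("all rows of logits must have the same length")

    # Empty output: return before any sampling work is attempted.
    if batch_size == 0 or num_samples == 0:
        return [[] for _ in range(batch_size)]
    if num_classes == 0:
        raise ValueError("num_classes must be positive for non-empty output")

    samples = []
    for row in logits:
        top = max(row)
        weights = [math.exp(x - top) for x in row]
        samples.append(rng.choices(range(num_classes), weights=weights, k=num_samples))
    return samples


rng = random.Random(0)
print(multinomial([], 5, rng))            # empty batch
print(multinomial([[0.0, 1.0]], 0, rng))  # zero samples per row
```

Validating first and returning early keeps the sampling path free of zero-sized inputs, which is the property the fix relies on.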
* Migrate Multinomial-related code out of random_ops. | Zongheng Yang | 2016-06-22
    Moved Multinomial-related code to its own files and BUILD target. Motivation is to keep random_ops small.

    Change: 125624016