path: root/unsupported/Eigen/CXX11/src
* Merged in ibab/eigen (pull request PR-192) (Benoit Steiner, 2016-06-03)
  Add generic scan method.
* Improved the performance of full reductions. (Benoit Steiner, 2016-06-03)
  AFTER:
    BM_fullReduction/10      4541     4543   154017   21.0M items/s
    BM_fullReduction/64      5191     5193   100000  752.5M items/s
    BM_fullReduction/512     9588     9588    71361   25.5G items/s
    BM_fullReduction/4k    244314   244281     2863   64.0G items/s
    BM_fullReduction/5k    359382   359363     1946   64.8G items/s
  BEFORE:
    BM_fullReduction/10      9085     9087    74395   10.5M items/s
    BM_fullReduction/64      9478     9478    72014  412.1M items/s
    BM_fullReduction/512    14643    14646    46902   16.7G items/s
    BM_fullReduction/4k    260338   260384     2678   60.0G items/s
    BM_fullReduction/5k    385076   385178     1818   60.5G items/s
* Add generic scan method (Igor Babuschkin, 2016-06-03)
* Align the first element of the Waiter struct instead of padding it. (Benoit Steiner, 2016-06-02)
  This reduces its memory footprint a bit while achieving the goal of preventing false sharing.
* Add syntactic sugar to Eigen tensors to allow more natural syntax. (Rasmus Munk Larsen, 2016-06-02)
  Specifically, this enables expressions involving: scalar + tensor, scalar * tensor, scalar / tensor, scalar - tensor.
* Add tensor scan op (Igor Babuschkin, 2016-06-02)
  This is the initial implementation of a generic scan operation. Based on it, cumsum and cumprod methods have been added to TensorBase.
* Use a single PacketSize variable (Benoit Steiner, 2016-06-01)
* Fixed compilation warning (Benoit Steiner, 2016-06-01)
* Silenced compilation warning generated by nvcc. (Benoit Steiner, 2016-06-01)
* Added support for mean reductions on fp16 (Benoit Steiner, 2016-06-01)
* Only enable optimized reductions of fp16 if the reduction functor supports them (Benoit Steiner, 2016-05-31)
* Reimplement clamp as a static function. (Benoit Steiner, 2016-05-27)
* Use NULL instead of nullptr to preserve compatibility with cxx03 (Benoit Steiner, 2016-05-27)
* Added a new operation to enable more powerful tensor indexing. (Benoit Steiner, 2016-05-27)
* Fixed some compilation warnings (Benoit Steiner, 2016-05-26)
* Preserve the ability to vectorize the evaluation of an expression even when it involves a cast that isn't vectorized (e.g. fp16 to float) (Benoit Steiner, 2016-05-26)
* Resolved merge conflicts (Benoit Steiner, 2016-05-26)
* Merged latest reduction improvements (Benoit Steiner, 2016-05-26)
* Improved the performance of inner reductions. (Benoit Steiner, 2016-05-26)
* Code cleanup. (Benoit Steiner, 2016-05-26)
* Made the static storage class qualifier come first. (Benoit Steiner, 2016-05-25)
* Deleted unnecessary explicit qualifiers. (Benoit Steiner, 2016-05-25)
* Don't mark inline functions as static since it confuses the ICC compiler (Benoit Steiner, 2016-05-25)
* Marked unused variables as such (Benoit Steiner, 2016-05-25)
* Made the IndexPair code compile in non-cxx11 mode (Benoit Steiner, 2016-05-25)
* Made the index pair list code more portable across various compilers (Benoit Steiner, 2016-05-25)
* Improved the performance of tensor padding (Benoit Steiner, 2016-05-25)
* Added support for statically known lists of pairs of indices (Benoit Steiner, 2016-05-25)
* There is no need to make the fp16 full reduction kernel a static function. (Benoit Steiner, 2016-05-24)
* Fixed compilation warning (Benoit Steiner, 2016-05-24)
* Merged in rmlarsen/eigen (pull request PR-188) (Benoit Steiner, 2016-05-23)
  Minor cleanups: 1. Get rid of a few unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.
* Fix some sign-compare warnings (Christoph Hertzberg, 2016-05-22)
* Make EIGEN_HAS_CONSTEXPR user configurable (Gael Guennebaud, 2016-05-20)
* Make EIGEN_HAS_VARIADIC_TEMPLATES user configurable (Gael Guennebaud, 2016-05-20)
* Make EIGEN_HAS_RVALUE_REFERENCES user configurable (Gael Guennebaud, 2016-05-20)
* Rename EIGEN_HAVE_RVALUE_REFERENCES to EIGEN_HAS_RVALUE_REFERENCES (Gael Guennebaud, 2016-05-20)
* Merged eigen/eigen into default (Rasmus Larsen, 2016-05-18)
* Merge. (Rasmus Munk Larsen, 2016-05-18)
* Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL. (Rasmus Munk Larsen, 2016-05-18)
* Reduce overhead for small tensors and cheap ops by short-circuiting the cost computation and block size calculation in parallelFor. (Rasmus Munk Larsen, 2016-05-17)
* Allow vectorized padding on GPU. This helps speed things up a little. (Benoit Steiner, 2016-05-17)
  Before:
    BM_padding/10    5000000  460       217.03 MFlops/s
    BM_padding/80    5000000  460     13899.40 MFlops/s
    BM_padding/640   5000000  461    888421.17 MFlops/s
    BM_padding/4K    5000000  460  54316322.55 MFlops/s
  After:
    BM_padding/10    5000000  454       220.20 MFlops/s
    BM_padding/80    5000000  455     14039.86 MFlops/s
    BM_padding/640   5000000  452    904968.83 MFlops/s
    BM_padding/4K    5000000  411  60750049.21 MFlops/s
* Advertise the packet API of the tensor reducers iff the corresponding packet primitives are available. (Benoit Steiner, 2016-05-18)
* #if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL) is now #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL) (Benoit Steiner, 2016-05-17)
  The non-blocking thread pool is the default since it's more scalable, and one needs to request the old thread pool explicitly.
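After this change, code that wants the old simple pool must opt in before including the Tensor header; a configuration sketch (the surrounding EIGEN_USE_THREADS define is the usual companion for threaded tensor evaluation, shown here as an assumption about typical usage):

```cpp
// Opt back into the old simple thread pool; without this define,
// the more scalable non-blocking pool is now the default.
#define EIGEN_USE_SIMPLE_THREAD_POOL
// Enable threaded evaluation of tensor expressions.
#define EIGEN_USE_THREADS
#include <unsupported/Eigen/CXX11/Tensor>
```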
* Fixed compilation error (Benoit Steiner, 2016-05-17)
* Fixed compilation error in the tensor thread pool (Benoit Steiner, 2016-05-17)
* Merge upstream. (Rasmus Munk Larsen, 2016-05-17)
* Roll back changes to core. Move include of TensorFunctors.h up to satisfy dependence in TensorCostModel.h. (Rasmus Munk Larsen, 2016-05-17)
* Merged eigen/eigen into default (Rasmus Larsen, 2016-05-17)
* Enable the use of the packet api to evaluate tensor broadcasts. This speeds things up quite a bit. (Benoit Steiner, 2016-05-17)
  Before:
    BM_broadcasting/10    500000    3690     27.10 MFlops/s
    BM_broadcasting/80    500000    4014   1594.24 MFlops/s
    BM_broadcasting/640   100000   14770  27731.35 MFlops/s
    BM_broadcasting/4K      5000  632711  39512.48 MFlops/s
  After:
    BM_broadcasting/10    500000    4287     23.33 MFlops/s
    BM_broadcasting/80    500000    4455   1436.41 MFlops/s
    BM_broadcasting/640   200000   10195  40173.01 MFlops/s
    BM_broadcasting/4K      5000  423746  58997.57 MFlops/s
* Allow vectorized padding on GPU. This helps speed things up a little. (Benoit Steiner, 2016-05-17)
  Before:
    BM_padding/10    5000000  460       217.03 MFlops/s
    BM_padding/80    5000000  460     13899.40 MFlops/s
    BM_padding/640   5000000  461    888421.17 MFlops/s
    BM_padding/4K    5000000  460  54316322.55 MFlops/s
  After:
    BM_padding/10    5000000  454       220.20 MFlops/s
    BM_padding/80    5000000  455     14039.86 MFlops/s
    BM_padding/640   5000000  452    904968.83 MFlops/s
    BM_padding/4K    5000000  411  60750049.21 MFlops/s