Commit message | Author | Date |
---|---|---|
Merged in ibab/eigen (pull request PR-192): add generic scan method | Benoit Steiner | 2016-06-03 |
Improved the performance of full reductions.<br>Before:<br>BM_fullReduction/10 9085 9087 74395 10.5M items/s<br>BM_fullReduction/64 9478 9478 72014 412.1M items/s<br>BM_fullReduction/512 14643 14646 46902 16.7G items/s<br>BM_fullReduction/4k 260338 260384 2678 60.0G items/s<br>BM_fullReduction/5k 385076 385178 1818 60.5G items/s<br>After:<br>BM_fullReduction/10 4541 4543 154017 21.0M items/s<br>BM_fullReduction/64 5191 5193 100000 752.5M items/s<br>BM_fullReduction/512 9588 9588 71361 25.5G items/s<br>BM_fullReduction/4k 244314 244281 2863 64.0G items/s<br>BM_fullReduction/5k 359382 359363 1946 64.8G items/s | Benoit Steiner | 2016-06-03 |
Add generic scan method | Igor Babuschkin | 2016-06-03 |
Align the first element of the Waiter struct instead of padding it. This reduces its memory footprint a bit while still preventing false sharing. | Benoit Steiner | 2016-06-02 |
Add syntactic sugar to Eigen tensors to allow more natural syntax. Specifically, this enables expressions involving scalar + tensor, scalar * tensor, scalar / tensor, and scalar - tensor. | Rasmus Munk Larsen | 2016-06-02 |
Add tensor scan op. This is the initial implementation of a generic scan operation. Based on it, cumsum and cumprod methods have been added to TensorBase. | Igor Babuschkin | 2016-06-02 |
Use a single PacketSize variable | Benoit Steiner | 2016-06-01 |
Fixed compilation warning | Benoit Steiner | 2016-06-01 |
Silenced compilation warning generated by nvcc. | Benoit Steiner | 2016-06-01 |
Added support for mean reductions on fp16 | Benoit Steiner | 2016-06-01 |
Only enable optimized reductions of fp16 if the reduction functor supports them | Benoit Steiner | 2016-05-31 |
Reimplement clamp as a static function. | Benoit Steiner | 2016-05-27 |
Use NULL instead of nullptr to preserve the compatibility with cxx03 | Benoit Steiner | 2016-05-27 |
Added a new operation to enable more powerful tensor indexing. | Benoit Steiner | 2016-05-27 |
Fixed some compilation warnings | Benoit Steiner | 2016-05-26 |
Preserve the ability to vectorize the evaluation of an expression even when it involves a cast that isn't vectorized (e.g. fp16 to float) | Benoit Steiner | 2016-05-26 |
Resolved merge conflicts | Benoit Steiner | 2016-05-26 |
Merged latest reduction improvements | Benoit Steiner | 2016-05-26 |
Improved the performance of inner reductions. | Benoit Steiner | 2016-05-26 |
Code cleanup. | Benoit Steiner | 2016-05-26 |
Made the static storage class qualifier come first. | Benoit Steiner | 2016-05-25 |
Deleted unnecessary explicit qualifiers. | Benoit Steiner | 2016-05-25 |
Don't mark inline functions as static since it confuses the ICC compiler | Benoit Steiner | 2016-05-25 |
Marked unused variables as such | Benoit Steiner | 2016-05-25 |
Made the IndexPair code compile in non-cxx11 mode | Benoit Steiner | 2016-05-25 |
Made the index pair list code more portable across various compilers | Benoit Steiner | 2016-05-25 |
Improved the performance of tensor padding | Benoit Steiner | 2016-05-25 |
Added support for statically known lists of pairs of indices | Benoit Steiner | 2016-05-25 |
There is no need to make the fp16 full reduction kernel a static function. | Benoit Steiner | 2016-05-24 |
Fixed compilation warning | Benoit Steiner | 2016-05-24 |
Merged in rmlarsen/eigen (pull request PR-188). Minor cleanups: 1. Get rid of a few unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL. | Benoit Steiner | 2016-05-23 |
Fix some sign-compare warnings | Christoph Hertzberg | 2016-05-22 |
Make EIGEN_HAS_CONSTEXPR user configurable | Gael Guennebaud | 2016-05-20 |
Make EIGEN_HAS_VARIADIC_TEMPLATES user configurable | Gael Guennebaud | 2016-05-20 |
Make EIGEN_HAS_RVALUE_REFERENCES user configurable | Gael Guennebaud | 2016-05-20 |
Rename EIGEN_HAVE_RVALUE_REFERENCES to EIGEN_HAS_RVALUE_REFERENCES | Gael Guennebaud | 2016-05-20 |
Merged eigen/eigen into default | Rasmus Larsen | 2016-05-18 |
Merge. | Rasmus Munk Larsen | 2016-05-18 |
Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL. | Rasmus Munk Larsen | 2016-05-18 |
Reduce overhead for small tensors and cheap ops by short-circuiting the cost computation and block size calculation in parallelFor. | Rasmus Munk Larsen | 2016-05-17 |
Allow vectorized padding on GPU. This helps speed things up a little.<br>Before:<br>BM_padding/10 5000000 460 217.03 MFlops/s<br>BM_padding/80 5000000 460 13899.40 MFlops/s<br>BM_padding/640 5000000 461 888421.17 MFlops/s<br>BM_padding/4K 5000000 460 54316322.55 MFlops/s<br>After:<br>BM_padding/10 5000000 454 220.20 MFlops/s<br>BM_padding/80 5000000 455 14039.86 MFlops/s<br>BM_padding/640 5000000 452 904968.83 MFlops/s<br>BM_padding/4K 5000000 411 60750049.21 MFlops/s | Benoit Steiner | 2016-05-17 |
Advertise the packet api of the tensor reducers iff the corresponding packet primitives are available. | Benoit Steiner | 2016-05-18 |
#if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL) is now #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL): the non-blocking thread pool is the default since it's more scalable, and one needs to request the old thread pool explicitly. | Benoit Steiner | 2016-05-17 |
Fixed compilation error | Benoit Steiner | 2016-05-17 |
Fixed compilation error in the tensor thread pool | Benoit Steiner | 2016-05-17 |
Merge upstream. | Rasmus Munk Larsen | 2016-05-17 |
Roll back changes to core. Move include of TensorFunctors.h up to satisfy dependence in TensorCostModel.h. | Rasmus Munk Larsen | 2016-05-17 |
Merged eigen/eigen into default | Rasmus Larsen | 2016-05-17 |
Enable the use of the packet api to evaluate tensor broadcasts. This speeds things up quite a bit.<br>Before:<br>BM_broadcasting/10 500000 3690 27.10 MFlops/s<br>BM_broadcasting/80 500000 4014 1594.24 MFlops/s<br>BM_broadcasting/640 100000 14770 27731.35 MFlops/s<br>BM_broadcasting/4K 5000 632711 39512.48 MFlops/s<br>After:<br>BM_broadcasting/10 500000 4287 23.33 MFlops/s<br>BM_broadcasting/80 500000 4455 1436.41 MFlops/s<br>BM_broadcasting/640 200000 10195 40173.01 MFlops/s<br>BM_broadcasting/4K 5000 423746 58997.57 MFlops/s | Benoit Steiner | 2016-05-17 |
Allow vectorized padding on GPU. This helps speed things up a little.<br>Before:<br>BM_padding/10 5000000 460 217.03 MFlops/s<br>BM_padding/80 5000000 460 13899.40 MFlops/s<br>BM_padding/640 5000000 461 888421.17 MFlops/s<br>BM_padding/4K 5000000 460 54316322.55 MFlops/s<br>After:<br>BM_padding/10 5000000 454 220.20 MFlops/s<br>BM_padding/80 5000000 455 14039.86 MFlops/s<br>BM_padding/640 5000000 452 904968.83 MFlops/s<br>BM_padding/4K 5000000 411 60750049.21 MFlops/s | Benoit Steiner | 2016-05-17 |