* Address comments by bsteiner. (2016-05-12)
* Improvements to parallelFor. (2016-05-12)
  Move some scalar functors from TensorFunctors.h to Eigen core.
* Worked around a compilation error triggered by nvcc when compiling a tensor concatenation kernel. (2016-05-12)
* Fixed a potential race condition in the non-blocking thread pool. (2016-05-12)
* Replace an implicit cast with an explicit one. (2016-05-12)
* Worked around compilation errors with older versions of gcc. (2016-05-11)
* Improved the portability of the tensor code. (2016-05-11)
* Added the ability to load fp16 using the texture path. (2016-05-11)
  Improved the performance of some reductions on fp16.
* Removed a deprecated flag (which apparently was ignored anyway). (2016-05-11)
* Fixed some double-promotion and sign-compare warnings. (2016-05-11)
* Fixed a typo in my previous commit. (2016-05-11)
* Fixed a potential race condition in the CUDA reduction code. (2016-05-11)
* Explicitly initialize all the atomic variables. (2016-05-11)
* Properly gate the use of half2. (2016-05-10)
* Added support for fp16 to the sigmoid functor. (2016-05-10)
* Small improvement to the full reduction of fp16. (2016-05-10)
* Simplified the reduction code a little. (2016-05-10)
* Improved the performance of full reductions on GPU. (2016-05-09)
  Before:
    BM_fullReduction/10    200000       11751        8.51 MFlops/s
    BM_fullReduction/80      5000      523385       12.23 MFlops/s
    BM_fullReduction/640       50    36179326       11.32 MFlops/s
    BM_fullReduction/4K         1  2173517195       11.50 MFlops/s
  After:
    BM_fullReduction/10    500000        5987       16.70 MFlops/s
    BM_fullReduction/80    200000       10636      601.73 MFlops/s
    BM_fullReduction/640    50000       58428     7010.31 MFlops/s
    BM_fullReduction/4K      1000     2006106    12461.95 MFlops/s
* Added the ability to use a scratch buffer in CUDA kernels. (2016-05-09)
* Added a new parallelFor API to the thread pool device. (2016-05-09)
* Optimized the non-blocking thread pool: (2016-05-09)
  * Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
  * Directly pop from a non-empty queue when we are waiting for work, instead of first noticing that there is a non-empty queue and then doing another round of random stealing to re-discover the non-empty queue.
  * Steal only 1 task from a remote queue, instead of half of its tasks.
* Marked a few tensor operations as read-only. (2016-05-05)
* Relaxed an assertion that was tighter than necessary. (2016-05-05)
* Fixed some incorrect assertions. (2016-05-05)
* Strongly hint, but don't force, the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code. (2016-05-05)
* Added tests for full contractions using thread pools and GPU devices. (2016-05-05)
  Fixed a couple of issues in the corresponding code.
* Updated the contraction code to ensure that full contractions return a tensor of rank 0. (2016-05-05)
* Enable and fix -Wdouble-conversion warnings. (2016-05-05)
* Removed extraneous 'explicit' keywords. (2016-05-04)
* Use numext::isfinite instead of std::isfinite. (2016-05-03)
* Deleted a superfluous explicit keyword. (2016-05-03)
* Fixed a compilation error. (2016-05-01)
* Added missing accessors to fixed-size tensors. (2016-04-29)
* Deleted trailing commas. (2016-04-29)
* Deleted useless trailing commas. (2016-04-29)
* Deleted unnecessary trailing commas. (2016-04-29)
* Return the proper size (i.e. 1) for tensors of rank 0. (2016-04-29)
* Deleted unused default values for template parameters. (2016-04-29)
* Restore Tensor support for non-C++11 compilers. (2016-04-29)
* Fixed an include path. (2016-04-29)
* Fix missing inclusion of Eigen/Core. (2016-04-27)
* Use computeProductBlockingSizes to compute blocking for both the ShardByCol and ShardByRow cases. (2016-04-27)
* Refactor the unsupported CXX11/Core module to internal headers only. (2016-04-26)
* Fixed the partial evaluation of non-vectorizable tensor subexpressions. (2016-04-25)
* Refined the cost of the striding operation. (2016-04-25)
* Provide access to the base thread pool classes. (2016-04-21)
* Added the ability to switch to the new thread pool with a #define. (2016-04-21)
* Fixed several compilation warnings. (2016-04-21)
* Don't crash when attempting to reduce empty tensors. (2016-04-20)
* Started to implement a portable way to yield. (2016-04-19)