eigen - C++ library for linear algebra

	Commit message	Author	Age
*	Add deprecated header files for TensorFlow	Gael Guennebaud	2018-07-12
\|
*	renaming Cuda files to Gpu in the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories	Deven Desai	2018-06-20
\|
*	Added support for CUDA 9.0.	Benoit Steiner	2017-08-31
\|
*	Add an EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases	Gael Guennebaud	2017-07-17
\|
*	Add labels to #ifdef, in TensorReductionCuda.h	Hugh Perkins	2017-06-06
\|
*	Made the Tensor code compile with clang 3.9	Benoit Steiner	2017-03-02
\|
*	Fix remaining CUDA >= 300 checks	Igor Babuschkin	2016-08-18
\|
*	Add the necessary CUDA >= 300 checks back	Igor Babuschkin	2016-08-18
\|
*	Remove CUDA >= 300 checks and enable outer reduction for doubles	Igor Babuschkin	2016-08-06
\|
*	Make use of atomicExch for atomicExchCustom	Igor Babuschkin	2016-08-05
\|
*	Enable efficient Tensor reduction for doubles	Igor Babuschkin	2016-07-01
\|
*	Made it possible to compile reductions for an old cuda architecture and run them on a recent gpu.	Benoit Steiner	2016-06-29
\|
*	Made the code compile when using CUDA architecture < 300	Benoit Steiner	2016-06-29
\|
*	Simplified the code that dispatches vectorized reductions on GPU	Benoit Steiner	2016-06-09
\|
*	Improved support for vectorization of 16-bit floats	Benoit Steiner	2016-06-09
\|
*	Misc small improvements to the reduction code.	Benoit Steiner	2016-06-06
\|
*	Improved the performance of full reductions.	Benoit Steiner	2016-06-03
\|	AFTER:
\|	BM_fullReduction/10       4541     4543   154017    21.0M items/s
\|	BM_fullReduction/64       5191     5193   100000   752.5M items/s
\|	BM_fullReduction/512      9588     9588    71361    25.5G items/s
\|	BM_fullReduction/4k     244314   244281     2863    64.0G items/s
\|	BM_fullReduction/5k     359382   359363     1946    64.8G items/s
\|	BEFORE:
\|	BM_fullReduction/10       9085     9087    74395    10.5M items/s
\|	BM_fullReduction/64       9478     9478    72014   412.1M items/s
\|	BM_fullReduction/512     14643    14646    46902    16.7G items/s
\|	BM_fullReduction/4k     260338   260384     2678    60.0G items/s
\|	BM_fullReduction/5k     385076   385178     1818    60.5G items/s
\|
*	Silenced compilation warning generated by nvcc.	Benoit Steiner	2016-06-01
\|
*	Added support for mean reductions on fp16	Benoit Steiner	2016-06-01
\|
*	Only enable optimized reductions of fp16 if the reduction functor supports them	Benoit Steiner	2016-05-31
\|
*	Resolved merge conflicts	Benoit Steiner	2016-05-26
\|
*	Merged latest reduction improvements	Benoit Steiner	2016-05-26
\|\
* \|	Improved the performance of inner reductions.	Benoit Steiner	2016-05-26
\| \|
* \|	There is no need to make the fp16 full reduction kernel a static function.	Benoit Steiner	2016-05-24
\| \|
\| *	Allow vectorized padding on GPU. This helps speed things up a little.	Benoit Steiner	2016-05-17
\|/	Before:
\|	BM_padding/10     5000000   460        217.03 MFlops/s
\|	BM_padding/80     5000000   460      13899.40 MFlops/s
\|	BM_padding/640    5000000   461     888421.17 MFlops/s
\|	BM_padding/4K     5000000   460   54316322.55 MFlops/s
\|	After:
\|	BM_padding/10     5000000   454        220.20 MFlops/s
\|	BM_padding/80     5000000   455      14039.86 MFlops/s
\|	BM_padding/640    5000000   452     904968.83 MFlops/s
\|	BM_padding/4K     5000000   411   60750049.21 MFlops/s
\|
*	Turn on the new thread pool by default since it scales much better over multiple cores.	Benoit Steiner	2016-05-13
\|	It is still possible to revert to the old thread pool by compiling with the EIGEN_USE_SIMPLE_THREAD_POOL define.
*	Removed unnecessary thread synchronization	Benoit Steiner	2016-05-13
\|
*	Fixed a typo in my previous commit	Benoit Steiner	2016-05-11
\|
*	Fix potential race condition in the CUDA reduction code.	Benoit Steiner	2016-05-11
\|
*	Properly gate the use of half2.	Benoit Steiner	2016-05-10
\|
*	Small improvement to the full reduction of fp16	Benoit Steiner	2016-05-10
\|
*	Simplified the reduction code a little.	Benoit Steiner	2016-05-10
\|
*	Improved the performance of full reductions on GPU:	Benoit Steiner	2016-05-09
\|	Before:
\|	BM_fullReduction/10     200000        11751       8.51 MFlops/s
\|	BM_fullReduction/80       5000       523385      12.23 MFlops/s
\|	BM_fullReduction/640        50     36179326      11.32 MFlops/s
\|	BM_fullReduction/4K          1   2173517195      11.50 MFlops/s
\|	After:
\|	BM_fullReduction/10     500000         5987      16.70 MFlops/s
\|	BM_fullReduction/80     200000        10636     601.73 MFlops/s
\|	BM_fullReduction/640     50000        58428    7010.31 MFlops/s
\|	BM_fullReduction/4K       1000      2006106   12461.95 MFlops/s
\|
*	Don't crash when attempting to reduce empty tensors.	Benoit Steiner	2016-04-20
\|
*	Fixed a compilation error with nvcc 7.	Benoit Steiner	2016-04-19
\|
*	Simplified the code that launches cuda kernels.	Benoit Steiner	2016-04-19
\|
*	Use numext::ceil instead of std::ceil	Benoit Steiner	2016-04-19
\|
*	Fixed compilation warning	Benoit Steiner	2016-03-18
\|
*	Improved the performance of large outer reductions on cuda	Benoit Steiner	2016-02-29
\|
*	Made the signature of the inner and outer reducers consistent	Benoit Steiner	2016-02-29
\|
*	Optimized the performance of narrow reductions on CUDA devices	Benoit Steiner	2016-02-29
\|
*	Fixed a race condition that could affect some reductions on CUDA devices.	Benoit Steiner	2016-01-15
\|
*	Use warp shuffles instead of shared memory access to speed up the inner reduction kernel.	Benoit Steiner	2016-01-14
\|
*	Fixed a boundary condition bug in the outer reduction kernel	Benoit Steiner	2016-01-14
\|
*	Silenced a few compilation warnings.	Benoit Steiner	2016-01-11
\|
*	Deleted unused variable.	Benoit Steiner	2016-01-11
\|
*	Silenced a nvcc compilation warning	Benoit Steiner	2016-01-11
\|
*	Silenced several compilation warnings triggered by nvcc.	Benoit Steiner	2016-01-11
\|
*	Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152)	Benoit Steiner	2016-01-11
\|\	Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.
* \|	Re-enabled the optimized reduction CUDA code.	Benoit Steiner	2016-01-11
\| \|