eigen - C++ library for linear algebra

	Commit message	Author	Age
...
*	Converting all sycl buffers to uninitialised device only buffers; adding ↵	Mehdi Goli	2016-11-08
\| \| \| \|	memcpyHostToDevice and memcpyDeviceToHost on syclDevice; modifying all examples to obey the new rules; moving sycl queue creation to the device based on Benoit's suggestion; removing the sycl-specific condition for returning m_result in TensorReduction.h according to Benoit's suggestion.
*	Removed the sycl include from Eigen/Core and moved it to ↵	Mehdi Goli	2016-11-04
\| \| \| \|	Unsupported/Eigen/CXX11/Tensor; added TensorReduction for sycl (full reduction and partial reduction); added TensorReduction test case for sycl (full reduction and partial reduction); fixed the tile size on TensorSyclRun.h based on the device max work group size;
*	Fixing the code indentation in the TensorReduction.h file.	Mehdi Goli	2016-10-14
\|
*	Reducing the code by generalising sycl backend functions/structs.	Mehdi Goli	2016-10-14
\|
*	Fixed a bug impacting some outer reductions on GPU	Benoit Steiner	2016-09-12
\|
*	Don't attempt to optimize partial reductions when the optimized ↵	Benoit Steiner	2016-08-08
\| \| \| \|	implementation doesn't buy anything.
*	Improved partial reductions in more cases	Benoit Steiner	2016-07-22
\|
*	Fix warnings	Gael Guennebaud	2016-07-08
\|
*	Fix warning	Gael Guennebaud	2016-07-07
\|
*	Use array_prod to compute the number of elements contained in the input ↵	Benoit Steiner	2016-06-04
\| \| \| \|	tensor expression
*	Improved the performance of full reductions.	Benoit Steiner	2016-06-03
\|	AFTER:
\|	BM_fullReduction/10      4541    4543  154017   21.0M items/s
\|	BM_fullReduction/64      5191    5193  100000  752.5M items/s
\|	BM_fullReduction/512     9588    9588   71361   25.5G items/s
\|	BM_fullReduction/4k    244314  244281    2863   64.0G items/s
\|	BM_fullReduction/5k    359382  359363    1946   64.8G items/s
\|
\|	BEFORE:
\|	BM_fullReduction/10      9085    9087   74395   10.5M items/s
\|	BM_fullReduction/64      9478    9478   72014  412.1M items/s
\|	BM_fullReduction/512    14643   14646   46902   16.7G items/s
\|	BM_fullReduction/4k    260338  260384    2678   60.0G items/s
\|	BM_fullReduction/5k    385076  385178    1818   60.5G items/s
*	Resolved merge conflicts	Benoit Steiner	2016-05-26
\|
*	Merged latest reduction improvements	Benoit Steiner	2016-05-26
\|\
* \|	Improved the performance of inner reductions.	Benoit Steiner	2016-05-26
\| \|
* \|	Merged in rmlarsen/eigen (pull request PR-188)	Benoit Steiner	2016-05-23
\|\ \ \| \| \| \| \| \| \| \| \|	Minor cleanups: 1. Get rid of a few unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.
* \| \|	Make EIGEN_HAS_CONSTEXPR user configurable	Gael Guennebaud	2016-05-20
\| \| \|
* \| \|	Make EIGEN_HAS_VARIADIC_TEMPLATES user configurable	Gael Guennebaud	2016-05-20
\| \| \|
\| * \|	Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of ↵	Rasmus Munk Larsen	2016-05-18
\|/ / \| \| \| \| \| \|	EIGEN_USE_COST_MODEL.
\| *	Allow vectorized padding on GPU. This helps speed things up a little.	Benoit Steiner	2016-05-17
\|/	Before:
\|	BM_padding/10    5000000  460       217.03 MFlops/s
\|	BM_padding/80    5000000  460     13899.40 MFlops/s
\|	BM_padding/640   5000000  461    888421.17 MFlops/s
\|	BM_padding/4K    5000000  460  54316322.55 MFlops/s
\|
\|	After:
\|	BM_padding/10    5000000  454       220.20 MFlops/s
\|	BM_padding/80    5000000  455     14039.86 MFlops/s
\|	BM_padding/640   5000000  452    904968.83 MFlops/s
\|	BM_padding/4K    5000000  411  60750049.21 MFlops/s
*	Improved the portability of the tensor code	Benoit Steiner	2016-05-11
\|
*	Properly gate the use of half2.	Benoit Steiner	2016-05-10
\|
*	Improved the performance of full reductions on GPU:	Benoit Steiner	2016-05-09
\|	Before:
\|	BM_fullReduction/10    200000       11751      8.51 MFlops/s
\|	BM_fullReduction/80      5000      523385     12.23 MFlops/s
\|	BM_fullReduction/640       50    36179326     11.32 MFlops/s
\|	BM_fullReduction/4K         1  2173517195     11.50 MFlops/s
\|
\|	After:
\|	BM_fullReduction/10    500000        5987     16.70 MFlops/s
\|	BM_fullReduction/80    200000       10636    601.73 MFlops/s
\|	BM_fullReduction/640    50000       58428   7010.31 MFlops/s
\|	BM_fullReduction/4K      1000     2006106  12461.95 MFlops/s
*	Eigen Tensor cost model part 2: Thread scheduling for standard evaluators ↵	Rasmus Munk Larsen	2016-04-14
\| \| \| \|	and reductions. The cost model is turned off by default.
*	Eigen cost model part 1. This implements a basic recursive framework to ↵	Rasmus Munk Larsen	2016-04-14
\| \| \| \|	estimate the cost of evaluating tensor expressions.
*	Fixed compilation warnings on arm	Benoit Steiner	2016-03-28
\|
*	Avoid unnecessary conversions	Benoit Steiner	2016-03-23
\|
*	Fixed compilation warning	Benoit Steiner	2016-03-23
\|
*	Use a single Barrier instead of a collection of Notifications to reduce the ↵	Benoit Steiner	2016-03-22
\| \| \| \|	thread synchronization overhead
*	Avoid implicit cast	Benoit Steiner	2016-03-09
\|
*	Avoid unnecessary conversion from 32bit int to 64bit unsigned int	Benoit Steiner	2016-03-09
\|
*	Replace std::vector with our own implementation, as using the stl when ↵	Benoit Steiner	2016-03-08
\| \| \| \|	compiling with nvcc and avx enabled leads to many issues.
*	Simplified the full reduction code	Benoit Steiner	2016-03-08
\|
*	Decoupled the packet type definition from the definition of the tensor ops. ↵	Benoit Steiner	2016-03-08
\| \| \| \|	All the vectorization is now defined in the tensor evaluators. This will make it possible to reliably support devices with different packet types in the same compilation unit.
*	Made the signature of the inner and outer reducers consistent	Benoit Steiner	2016-02-29
\|
*	Optimized the performance of narrow reductions on CUDA devices	Benoit Steiner	2016-02-29
\|
*	Fixed a typo in the reduction code that could prevent large full reductions ↵	Benoit Steiner	2016-02-24
\| \| \| \|	from running properly on old cuda devices.
*	Fixed a number of compilation warnings generated by the cuda tests	Benoit Steiner	2016-01-31
\|
*	Fixed a couple of compilation warnings.	Benoit Steiner	2016-01-28
\|
*	Fixed some compilation problems with nvcc + clang	Benoit Steiner	2016-01-27
\|
*	Record whether the underlying tensor storage can be accessed directly during ↵	Benoit Steiner	2016-01-19
\| \| \| \|	the evaluation of an expression.
*	Properly record the rank of reduced tensors in the tensor traits.	Benoit Steiner	2016-01-13
\|
*	Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152)	Benoit Steiner	2016-01-11
\|\ \| \| \| \| \| \|	Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.
* \|	Fixed a bug in the dispatch of optimized reduction kernels.	Benoit Steiner	2016-01-11
\| \|
* \|	Re-enabled the optimized reduction CUDA code.	Benoit Steiner	2016-01-11
\| \|
\| *	Alternative way of forcing instantiation of device kernels without	Jeremy Barnes	2016-01-10
\|/ \| \| \| \| \|	causing warnings or requiring device to device kernel invocations. This allows Tensorflow to work on SM 3.0 (i.e., Amazon EC2) machines.
*	Simplified the dispatch code.	Benoit Steiner	2016-01-08
\|
*	Reworked the dispatch of optimized cuda reduction kernels to work around an ↵	Benoit Steiner	2016-01-08
\| \| \| \|	nvcc bug that prevented the code from compiling in optimized mode in some cases
*	Improved the performance of reductions on CUDA devices	Benoit Steiner	2016-01-04
\|
*	Optimized outer reduction on GPUs.	Benoit Steiner	2015-12-22
\|
*	Silenced some compilation warnings triggered by nvcc	Benoit Steiner	2015-12-17
\|