eigen - C++ library for linear algebra

	Commit message	Author	Age
...
\| *	Cleanup ArrayWrapper and MatrixWrapper by removing redundant accessors.	Gael Guennebaud	2016-10-24
\| \|
\| *	bug #1004: remove the inaccurate "sequential" path for LinSpaced, mark ↵	Gael Guennebaud	2016-10-24
\| \|	respective function as deprecated, and enforce strict interpolation of the higher range using a correction term. Now, even with floating point precision, both the 'low' and 'high' bounds are exactly reproduced at i=0 and i=size-1 respectively.
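To illustrate the endpoint guarantee described in the bug #1004 entry, here is a minimal sketch. It is not Eigen's implementation: the helper name linspaced_sketch is hypothetical, it assumes size >= 2, and it pins the last element to 'high' directly instead of using the correction term the commit message mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Eigen): returns `size` evenly spaced
// values such that out[0] == low and out[size-1] == high hold exactly.
std::vector<double> linspaced_sketch(std::size_t size, double low, double high) {
    assert(size >= 2);  // assumption of this sketch
    std::vector<double> out(size);
    const double step = (high - low) / static_cast<double>(size - 1);
    // Interior points: low + i*step. At i == 0 this is exactly `low`.
    for (std::size_t i = 0; i + 1 < size; ++i) {
        out[i] = low + static_cast<double>(i) * step;
    }
    // low + (size-1)*step may differ from `high` by a rounding error,
    // so the upper bound is stored directly.
    out[size - 1] = high;
    return out;
}
```

With this sketch, linspaced_sketch(7, 0.1, 0.7) returns a vector whose first element compares equal to 0.1 and whose last element compares equal to 0.7.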
\| *	Merged in benoitsteiner/opencl (pull request PR-238)	Benoit Steiner	2016-10-24
\| \|\	Added support for OpenCL to the Tensor Module
\| * \|	bug #698: rewrite LinSpaced for integer scalar types to avoid overflow and ↵	Gael Guennebaud	2016-10-24
\| \| \|	guarantee an even spacing when possible. Otherwise, the "high" bound is implicitly lowered to the largest value allowing for an even distribution. This changeset also disables vectorization for this integer path.
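The "implicitly lowered" upper bound in the bug #698 entry can be shown with a small sketch. It is an assumption-laden illustration, not Eigen's code: the helper name linspaced_int_sketch is hypothetical, and it covers only the case low <= high with at most (high - low + 1) requested values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Eigen): evenly spaced integers starting
// at `low`, using an integer step so that every gap has the same size.
std::vector<long long> linspaced_int_sketch(std::size_t size, long long low, long long high) {
    assert(size >= 2);   // assumption of this sketch
    assert(low <= high); // assumption of this sketch
    const long long n1 = static_cast<long long>(size - 1);
    assert(high - low >= n1);  // enough distinct integers for a nonzero step
    // Integer division rounds the step down, so low + n1*step <= high:
    // the effective upper bound is lowered when the range does not divide evenly.
    const long long step = (high - low) / n1;
    std::vector<long long> out(size);
    for (std::size_t i = 0; i < size; ++i) {
        // i*step never exceeds high - low, so this does not overflow
        // as long as high - low itself is representable.
        out[i] = low + static_cast<long long>(i) * step;
    }
    return out;
}
```

For example, linspaced_int_sketch(4, 0, 10) yields 0, 3, 6, 9: the step is 10/3 = 3, so the effective upper bound becomes 9, while linspaced_int_sketch(5, 1, 9) yields 1, 3, 5, 7, 9 with both bounds reproduced.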
\| * \|	bug #1328: workaround a compilation issue with gcc 4.2	Gael Guennebaud	2016-10-20
\| \| \|
\| \| *	Merge latest updates from trunk	Benoit Steiner	2016-10-20
\| \| \|\ \| \| \|/ \| \|/\|
\| * \|	Fixed a few typos in the ternary tensor expressions types	Benoit Steiner	2016-10-19
\| \| \|
\| \| *	Fixing the typo regarding missing #if needed for proper handling of ↵	Mehdi Goli	2016-10-16
\| \| \|	exceptions in Eigen/Core.
\| \| *	Merged ComputeCpp to default.	Luke Iwanski	2016-10-14
\| \| \|\
\| \| \| *	Applying Benoit's comment to return the missing line back in Eigen/Core	Mehdi Goli	2016-10-14
\| \| \| \|
\| * \| \|	Fix previous merge.	Gael Guennebaud	2016-10-14
\| \| \| \|
\| * \| \|	Merged in rmlarsen/eigen2 (pull request PR-232)	Gael Guennebaud	2016-10-14
\| \|\ \ \	Improve performance of parallelized matrix multiply for rectangular matrices
\| \| \| * \|	Merged ComputeCpp into default.	Luke Iwanski	2016-10-14
\| \| \| \|\\|
\| \| \| \| *	Reducing the code by generalising sycl backend functions/structs.	Mehdi Goli	2016-10-14
\| \| \| \| \|
\| * \| \| \|	Merged in lukier/eigen (pull request PR-234)	Benoit Steiner	2016-10-13
\| \|\ \ \ \	Enabling CUDA in Geometry
\| \| * \| \| \|	Fixes for min and abs after Benoit's comments, switched to numext.	Robert Lukierski	2016-10-13
\| \| \| \| \| \|
\| * \| \| \| \|	Patch to allow VS2015 & CUDA 8.0 to compile with Eigen included. I'm not sure	Avi Ginsburg	2016-10-13
\| \| \| \| \| \|	whether to limit the check to this compiler combination (` \|\| (EIGEN_COMP_MSVC == 1900 && __CUDACC_VER__) `) or to leave it as it is. I also don't know if this will have any effect on including Eigen in device code (I am not doing so in my current project).
\| \| \| \| * \|	Merged eigen/eigen into default	Benoit Steiner	2016-10-12
\| \| \| \| \|\ \ \| \| \|_\|_\|/ / \| \|/\| \| \| \|
* \| \| \| \| \|	Deleted redundant implementation of predux	Benoit Steiner	2016-10-12
\| \| \| \| \| \|
\| * \| \| \| \|	Remove double ;;	Gael Guennebaud	2016-10-12
\| \| \| \| \| \|
* \| \| \| \| \|	Merged eigen/eigen into default	Benoit Steiner	2016-10-12
\|\\| \| \| \| \|
* \| \| \| \| \|	Take advantage of AVX512 instructions whenever possible to speedup the ↵	Benoit Steiner	2016-10-12
\| \| \| \| \| \|	processing of 16 bit floats.
\| * \| \| \| \|	Fix SPQR for rectangular matrices	Gael Guennebaud	2016-10-12
\| \| \| \| \| \|
\| \| * \| \| \|	Fixes min() warnings.	Robert Lukierski	2016-10-12
\| \| \| \| \| \|
\| * \| \| \| \|	Merged in rmlarsen/eigen (pull request PR-230)	Gael Guennebaud	2016-10-12
\| \|\ \ \ \ \	Fix a bug in psqrt for SSE and AVX when EIGEN_FAST_MATH=1
\| \| \| * \| \| \|	Adding EIGEN_DEVICE_FUNC in the Geometry module.	Robert Lukierski	2016-10-12
\| \| \| \| \| \| \|	Additional fixes necessary for CUDA in Core (mostly usage of EIGEN_USING_STD_MATH).
\| \| * \| \| \| \|	Fix copy-paste error: Must use _mm256_cmp_ps for AVX.	Rasmus Munk Larsen	2016-10-12
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	bug #1325: fix compilation on NEON with clang	Gael Guennebaud	2016-10-12
\| \| \|/ / / / \| \|/\| \| \| \|
\| * \| \| \| \|	Reenabled the use of variadic templates on tegra x1 provided that the latest ↵	Benoit Steiner	2016-10-08
\| \| \| \| \| \|	version (i.e. JetPack 2.3) is used.
\| \| \| \| * \|	Merge the content of the ComputeCpp branch into the default branch	Benoit Steiner	2016-10-07
\| \| \| \| \|\\|
\| * \| \| \| \|	Remove static qualifier of free-functions (inline is enough and this helps ↵	Gael Guennebaud	2016-10-07
\| \| \|_\|/ / \| \|/\| \| \|	ICC to find the right overload)
* \| \| \| \|	Renamed predux_half into predux_downto4	Benoit Steiner	2016-10-06
\| \| \| \| \|
* \| \| \| \|	Fixed incorrect comment	Benoit Steiner	2016-10-06
\| \| \| \| \|
* \| \| \| \|	Fixed compilation error with gcc >= 5.3	Benoit Steiner	2016-10-06
\| \| \| \| \|
* \| \| \| \|	Silenced a compilation warning	Benoit Steiner	2016-10-06
\| \| \| \| \|
\| * \| \| \|	Added missing AVX intrinsics for fp16: in particular, implemented predux ↵	Benoit Steiner	2016-10-06
\| \| \| \| \|	which is required by the matrix-vector code.
\| \| \| * \|	Add a simple cost model to prevent Eigen's parallel GEMM from using too many ↵	Rasmus Munk Larsen	2016-10-06
\| \| \|/ / \| \|/\| \|	threads when the inner dimension is small. Timing for square matrices is unchanged, but both CPU and Wall time are significantly improved for skinny matrices. The benchmarks below are for multiplying NxK * KxN matrices with test names of the form BM_OuterishProd/N/K.
	Improvements in Wall time:
	Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
	CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
	Benchmark                Base (ns)  New (ns)  Improvement
	---------------------------------------------------------
	BM_OuterishProd/64/1          3088      1610       +47.9%
	BM_OuterishProd/64/4          3562      2414       +32.2%
	BM_OuterishProd/64/32         8861      7815       +11.8%
	BM_OuterishProd/128/1        11363      6504       +42.8%
	BM_OuterishProd/128/4        11128      9794       +12.0%
	BM_OuterishProd/128/64       27691     27396        +1.1%
	BM_OuterishProd/256/1        33214     28123       +15.3%
	BM_OuterishProd/256/4        34312     36818        -7.3%
	BM_OuterishProd/256/128     174866    176398        -0.9%
	BM_OuterishProd/512/1      7963684    104224       +98.7%
	BM_OuterishProd/512/4      7987913    112867       +98.6%
	BM_OuterishProd/512/256    8198378   1306500       +84.1%
	BM_OuterishProd/1k/1       7356256    324432       +95.6%
	BM_OuterishProd/1k/4       8129616    331621       +95.9%
	BM_OuterishProd/1k/512    27265418   7517538       +72.4%
	Improvements in CPU time:
	Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
	CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
	Benchmark                Base (ns)  New (ns)  Improvement
	---------------------------------------------------------
	BM_OuterishProd/64/1          6169      1608       +73.9%
	BM_OuterishProd/64/4          7117      2412       +66.1%
	BM_OuterishProd/64/32        17702     15616       +11.8%
	BM_OuterishProd/128/1        45415      6498       +85.7%
	BM_OuterishProd/128/4        44459      9786       +78.0%
	BM_OuterishProd/128/64      110657    109489        +1.1%
	BM_OuterishProd/256/1       265158     28101       +89.4%
	BM_OuterishProd/256/4       274234    183885       +32.9%
	BM_OuterishProd/256/128    1397160   1408776        -0.8%
	BM_OuterishProd/512/1     78947048    520703       +99.3%
	BM_OuterishProd/512/4     86955578   1349742       +98.4%
	BM_OuterishProd/512/256   74701613  15584661       +79.1%
	BM_OuterishProd/1k/1      78352601   3877911       +95.1%
	BM_OuterishProd/1k/4      78521643   3966221       +94.9%
	BM_OuterishProd/1k/512   258104736  89480530       +65.3%
* \| \| \|	Enabling AVX512 should also enable AVX2.	Benoit Steiner	2016-10-06
\| \| \| \|
\| * \| \|	Fix compilation of qr.inverse() for column and full pivoting variants.	Gael Guennebaud	2016-10-06
\| \| \| \|
* \| \| \|	Deleted unnecessary CMakeLists.txt file	Benoit Steiner	2016-10-05
\| \| \| \|
* \| \| \|	Silenced a compilation warning.	Benoit Steiner	2016-10-05
\| \| \| \|
* \| \| \|	Merged latest updates from trunk	Benoit Steiner	2016-10-05
\|\\| \| \|
* \| \| \|	Silenced a few compilation warnings	Benoit Steiner	2016-10-05
\| \| \| \|
\| \| \| *	Pull the latest updates from trunk	Benoit Steiner	2016-10-05
\| \| \| \|\ \| \| \|_\|/ \| \|/\| \|
\| * \| \|	Properly characterize the CUDA packet primitives for fp16 as device only	Benoit Steiner	2016-10-04
\| \| \| \|
\| \| * \|	Update comment for fast sqrt.	Rasmus Munk Larsen	2016-10-04
\| \| \| \|
\| \| * \|	Update comment for fast sqrt.	Rasmus Munk Larsen	2016-10-04
\| \| \| \|
\| \| * \|	Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen ↵	Rasmus Munk Larsen	2016-10-04
\| \|/ /	(enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments.
	Benchmark speed in Giga-sqrts/s
	Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
	-----------------------------------------
	                 SSE       AVX
	Fast=1        2.529G    4.380G
	Fast=0        1.944G    1.898G
	Fast=1 fixed  2.214G    3.739G
	This table illustrates the worst case in terms of speed impact: it was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the versions are almost negligible.
\| * \|	Use explicit type casting to generate packets of zeros.	Benoit Steiner	2016-10-04
\| \| \|
\| * \|	Added support for constant std::complex numbers on GPU	Benoit Steiner	2016-10-03
\| \| \|