path: root/Eigen/src/Core
Commit message [Author, Date]
* Optimize sparse<bool> && sparse<bool> to use the same path as for coeff-wise products. [Gael Guennebaud, 2016-11-14]
|
* bump to 3.3.0 [Gael Guennebaud, 2016-11-10]
|
* bump to 3.3-rc2 [Gael Guennebaud, 2016-11-04]
|
* Improved AVX512 support [Benoit Steiner, 2016-11-03]
|
* Merged eigen/eigen into default [Benoit Steiner, 2016-11-03]
|\
| * bug #1337: improve doc of homogeneous() and hnormalized() [Gael Guennebaud, 2016-11-03]
| |
| * bug #1330: Cholmod supports double precision only, so let's trigger a static assertion if the scalar type does not match this requirement. [Gael Guennebaud, 2016-11-03]
| |
| * bug #1004: improve accuracy of LinSpaced for abs(low) >> abs(high). [Gael Guennebaud, 2016-11-02]
| |
| * Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX. [Gael Guennebaud, 2016-11-02]
| |
| * Gate the code that refers to cuda fp16 primitives more thoroughly [Benoit Steiner, 2016-11-01]
| |
| * Fix regression in X = (X*X.transpose())/s with X rectangular by deferring resizing of the destination after the creation of the evaluator of the source expression. [Gael Guennebaud, 2016-10-26]
| |
| * add a generic EIGEN_HAS_CXX11 [Gael Guennebaud, 2016-10-26]
| |
| * Fix warning with ICC [Gael Guennebaud, 2016-10-26]
| |
| * Fix ICC warnings [Gael Guennebaud, 2016-10-25]
| |
| * Add missing inline keywords [Gael Guennebaud, 2016-10-25]
| |
| * Fixed a typo [Benoit Steiner, 2016-10-25]
| |
| * bug #1004: one more rewrite of LinSpaced for floating point numbers to guarantee both interpolation and monotonicity. This version simply does low+i*step plus a branch to return high if i==size-1. Vectorization is accomplished with a branch and the help of pinsertlast. A quick benchmark revealed that the overhead is really marginal, even when filling small vectors. [Gael Guennebaud, 2016-10-25]
| |
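The scheme described in the bug #1004 entry above (low+i*step, with a branch returning high at the last index) can be sketched in scalar C++ as follows. This is only an illustration under stated assumptions: the function name `linspaced_sketch` and the use of `std::vector` are hypothetical, it is not Eigen's implementation, and it omits the pinsertlast-based vectorized path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Eigen): fills a vector with `size`
// values running from `low` to `high`.
std::vector<double> linspaced_sketch(std::size_t size, double low, double high) {
    assert(size >= 1);
    std::vector<double> out(size);
    // With a single entry there is no spacing to compute.
    const double step =
        (size == 1) ? 0.0 : (high - low) / static_cast<double>(size - 1);
    for (std::size_t i = 0; i < size; ++i) {
        // The branch on the last index returns `high` itself, so the upper
        // bound does not depend on rounding in i * step.
        out[i] = (i == size - 1) ? high : low + static_cast<double>(i) * step;
    }
    return out;
}
```

For example, `linspaced_sketch(5, 0.0, 1.0)` yields 0, 0.25, 0.5, 0.75, 1, with the first and last entries equal to the requested bounds.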
| * Add a pinsertlast function replacing the last entry of a packet by a scalar. (useful to vectorize LinSpaced) [Gael Guennebaud, 2016-10-25]
| |
| * bug #1333: fix bad usage of const_cast_derived. Better use .data() for that purpose. [Gael Guennebaud, 2016-10-24]
| |
| * Cleanup ArrayWrapper and MatrixWrapper by removing redundant accessors. [Gael Guennebaud, 2016-10-24]
| |
| * bug #1004: remove the inaccurate "sequential" path for LinSpaced, mark the respective function as deprecated, and enforce strict interpolation of the higher range using a correction term. Now, even with floating point precision, both the 'low' and 'high' bounds are exactly reproduced at i=0 and i=size-1 respectively. [Gael Guennebaud, 2016-10-24]
| |
| * bug #698: rewrite LinSpaced for integer scalar types to avoid overflow and guarantee an even spacing when possible. Otherwise, the "high" bound is implicitly lowered to the largest value allowing for an even distribution. This changeset also disables vectorization for this integer path. [Gael Guennebaud, 2016-10-24]
| |
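The integer behavior described in the bug #698 entry above (even spacing, with the effective "high" bound lowered when the range is not a multiple of size-1) might look like the sketch below. The name `int_linspaced_sketch` is hypothetical, this is not Eigen's code, and it covers only the case where the range is at least size-1 and `high - low` is representable in `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Eigen): evenly spaced integers starting
// at `low`. Assumes size >= 2, low <= high, size - 1 <= high - low, and
// that high - low does not overflow.
std::vector<int> int_linspaced_sketch(int size, int low, int high) {
    assert(size >= 2 && low <= high && size - 1 <= high - low);
    // Integer division rounds down, so the last value low + (size-1)*step
    // never exceeds `high`; it equals `high` only when the range divides evenly.
    const int step = (high - low) / (size - 1);
    std::vector<int> out(static_cast<std::size_t>(size));
    for (int i = 0; i < size; ++i) {
        out[static_cast<std::size_t>(i)] = low + i * step;
    }
    return out;
}
```

For example, `int_linspaced_sketch(4, 0, 10)` gives 0, 3, 6, 9 (the upper bound is effectively lowered to 9), while `int_linspaced_sketch(5, 0, 8)` gives 0, 2, 4, 6, 8.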
| * bug #1328: workaround a compilation issue with gcc 4.2 [Gael Guennebaud, 2016-10-20]
| |
| * Fix previous merge. [Gael Guennebaud, 2016-10-14]
| |
| * Merged in rmlarsen/eigen2 (pull request PR-232): Improve performance of parallelized matrix multiply for rectangular matrices [Gael Guennebaud, 2016-10-14]
| |\
| * \ Merged in lukier/eigen (pull request PR-234): Enabling CUDA in Geometry [Benoit Steiner, 2016-10-13]
| |\ \
| | * | Fixes for min and abs after Benoit's comments, switched to numext. [Robert Lukierski, 2016-10-13]
| | | |
| * | | Patch to allow VS2015 & CUDA 8.0 to compile with Eigen included. I'm not sure whether to limit the check to this compiler combination (` || (EIGEN_COMP_MSVC == 1900 && __CUDACC_VER__) `) or to leave it as it is. I also don't know if this will have any effect on including Eigen in device code (I'm not in my current project). [Avi Ginsburg, 2016-10-13]
| | | |
* | | | Deleted redundant implementation of predux [Benoit Steiner, 2016-10-12]
| | | |
* | | | Merged eigen/eigen into default [Benoit Steiner, 2016-10-12]
|\| | |
* | | | Take advantage of AVX512 instructions whenever possible to speed up the processing of 16 bit floats. [Benoit Steiner, 2016-10-12]
| | | |
| | * | Fixes min() warnings. [Robert Lukierski, 2016-10-12]
| | | |
| * | | Merged in rmlarsen/eigen (pull request PR-230): Fix a bug in psqrt for SSE and AVX when EIGEN_FAST_MATH=1 [Gael Guennebaud, 2016-10-12]
| |\ \ \
| | | * | Adding EIGEN_DEVICE_FUNC in the Geometry module. Additional necessary CUDA fixes in the Core (mostly usage of EIGEN_USING_STD_MATH). [Robert Lukierski, 2016-10-12]
| | | | |
| | * | | Fix copy-paste error: Must use _mm256_cmp_ps for AVX. [Rasmus Munk Larsen, 2016-10-12]
| | | | |
| * | | | bug #1325: fix compilation on NEON with clang [Gael Guennebaud, 2016-10-12]
| | |/ /
| |/| |
| * | | Reenabled the use of variadic templates on tegra x1 provided that the latest version (i.e. JetPack 2.3) is used. [Benoit Steiner, 2016-10-08]
| | | |
* | | | Renamed predux_half into predux_downto4 [Benoit Steiner, 2016-10-06]
| | | |
* | | | Fixed incorrect comment [Benoit Steiner, 2016-10-06]
| | | |
* | | | Fixed compilation error with gcc >= 5.3 [Benoit Steiner, 2016-10-06]
| | | |
* | | | Silenced a compilation warning [Benoit Steiner, 2016-10-06]
| | | |
| * | | Added missing AVX intrinsics for fp16: in particular, implemented predux which is required by the matrix-vector code. [Benoit Steiner, 2016-10-06]
| | | |
| | | * Add a simple cost model to prevent Eigen's parallel GEMM from using too many threads when the inner dimension is small. [Rasmus Munk Larsen, 2016-10-06]
| | |   Timing for square matrices is unchanged, but both CPU and Wall time are significantly improved for skinny matrices. The benchmarks below are for multiplying NxK * KxN matrices with test names of the form BM_OuterishProd/N/K.
| | |
| | |   Improvements in Wall time:
| | |
| | |   Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
| | |   CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
| | |   Benchmark                 Base (ns)   New (ns)  Improvement
| | |   -----------------------------------------------------------
| | |   BM_OuterishProd/64/1           3088       1610       +47.9%
| | |   BM_OuterishProd/64/4           3562       2414       +32.2%
| | |   BM_OuterishProd/64/32          8861       7815       +11.8%
| | |   BM_OuterishProd/128/1         11363       6504       +42.8%
| | |   BM_OuterishProd/128/4         11128       9794       +12.0%
| | |   BM_OuterishProd/128/64        27691      27396        +1.1%
| | |   BM_OuterishProd/256/1         33214      28123       +15.3%
| | |   BM_OuterishProd/256/4         34312      36818        -7.3%
| | |   BM_OuterishProd/256/128      174866     176398        -0.9%
| | |   BM_OuterishProd/512/1       7963684     104224       +98.7%
| | |   BM_OuterishProd/512/4       7987913     112867       +98.6%
| | |   BM_OuterishProd/512/256     8198378    1306500       +84.1%
| | |   BM_OuterishProd/1k/1        7356256     324432       +95.6%
| | |   BM_OuterishProd/1k/4        8129616     331621       +95.9%
| | |   BM_OuterishProd/1k/512     27265418    7517538       +72.4%
| | |
| | |   Improvements in CPU time:
| | |
| | |   Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
| | |   CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
| | |   Benchmark                 Base (ns)   New (ns)  Improvement
| | |   -----------------------------------------------------------
| | |   BM_OuterishProd/64/1           6169       1608       +73.9%
| | |   BM_OuterishProd/64/4           7117       2412       +66.1%
| | |   BM_OuterishProd/64/32         17702      15616       +11.8%
| | |   BM_OuterishProd/128/1         45415       6498       +85.7%
| | |   BM_OuterishProd/128/4         44459       9786       +78.0%
| | |   BM_OuterishProd/128/64       110657     109489        +1.1%
| | |   BM_OuterishProd/256/1        265158      28101       +89.4%
| | |   BM_OuterishProd/256/4        274234     183885       +32.9%
| | |   BM_OuterishProd/256/128     1397160    1408776        -0.8%
| | |   BM_OuterishProd/512/1      78947048     520703       +99.3%
| | |   BM_OuterishProd/512/4      86955578    1349742       +98.4%
| | |   BM_OuterishProd/512/256    74701613   15584661       +79.1%
| | |   BM_OuterishProd/1k/1       78352601    3877911       +95.1%
| | |   BM_OuterishProd/1k/4       78521643    3966221       +94.9%
| | |   BM_OuterishProd/1k/512    258104736   89480530       +65.3%
| | |/
| |/|
* | | Deleted unnecessary CMakeLists.txt file [Benoit Steiner, 2016-10-05]
| | |
* | | Silenced a compilation warning. [Benoit Steiner, 2016-10-05]
| | |
* | | Merged latest updates from trunk [Benoit Steiner, 2016-10-05]
|\| |
* | | Silenced a few compilation warnings [Benoit Steiner, 2016-10-05]
| | |
| * | Properly characterize the CUDA packet primitives for fp16 as device only [Benoit Steiner, 2016-10-04]
| | |
| | * Update comment for fast sqrt. [Rasmus Munk Larsen, 2016-10-04]
| | |
| | * Update comment for fast sqrt. [Rasmus Munk Larsen, 2016-10-04]
| | |