path: root/Eigen
Commit message, Author, Age
* bug #1680: improve MSVC inlining by declaring many trivial constructors and ↵Gravatar Gael Guennebaud2019-02-15
| | | | accessors as STRONG_INLINE.
* bug #1680: make all "block" methods strong-inline and device-functions (some ↵Gravatar Gael Guennebaud2019-02-15
| | | | were missing EIGEN_DEVICE_FUNC)
* bug #1678: Fix lack of __FMA__ macro on MSVC with AVX512Gravatar Gael Guennebaud2019-02-15
|
* bug #1678: workaround MSVC compilation issues with AVX512Gravatar Gael Guennebaud2019-02-15
|
* bug #1679: avoid possible division by 0 in complex-schurGravatar Gael Guennebaud2019-02-15
|
* Revert ↵Gravatar Rasmus Munk Larsen2019-02-14
| | | | | | https://bitbucket.org/eigen/eigen/commits/b55b5c7280a0481f01fe5ec764d55c443a8b6496 .
* Let's properly use Score instead of std::abs, and remove deprecated FIXME ( ↵Gravatar Gael Guennebaud2019-02-11
| | | | a /= b does a/b and not a * (1/b) as it was a long time ago...)
* Fix compilation of empty products of the form: Mx0 * 0xNGravatar Gael Guennebaud2019-02-11
|
* Speed up 2x2 LU by a factor 2, and other small fixed sizes by about 10%.Gravatar Gael Guennebaud2019-02-11
| | | | Not sure that's so critical, but this does not complexify the code base much.
* Speedup PartialPivLU for small matrices by passing compile-time sizes when ↵Gravatar Gael Guennebaud2019-02-11
|   available.
|
|   This change set also makes better use of Map<>+OuterStride and Ref<>, yielding a surprising speedup for small dynamic sizes as well.
|
|   The table below reports times in microseconds for 10 random matrices:
|
|           | ------- float ------- | ------ double ------- |
|   size    | before  after  ratio  | before  after  ratio  |
|   fixed 1 |  0.34    0.11   2.93  |  0.35    0.11   3.06  |
|   fixed 2 |  0.81    0.24   3.38  |  0.91    0.25   3.60  |
|   fixed 3 |  1.49    0.49   3.04  |  1.68    0.55   3.01  |
|   fixed 4 |  2.31    0.70   3.28  |  2.45    1.08   2.27  |
|   fixed 5 |  3.49    1.11   3.13  |  3.84    2.24   1.71  |
|   fixed 6 |  4.76    1.64   2.88  |  4.87    2.84   1.71  |
|   dyn 1   |  0.50    0.40   1.23  |  0.51    0.40   1.26  |
|   dyn 2   |  1.08    0.85   1.27  |  1.04    0.69   1.49  |
|   dyn 3   |  1.76    1.26   1.40  |  1.84    1.14   1.60  |
|   dyn 4   |  2.57    1.75   1.46  |  2.67    1.66   1.60  |
|   dyn 5   |  3.80    2.64   1.43  |  4.00    2.48   1.61  |
|   dyn 6   |  5.06    3.43   1.47  |  5.15    3.21   1.60  |
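To illustrate why passing compile-time sizes can help for tiny matrices, here is a minimal sketch of an in-place LU with partial pivoting whose dimension is a template parameter, so the compiler sees constant loop bounds. It is not Eigen's implementation: the name `lu_inplace`, the row-major `std::array` layout, and the return convention are all assumptions made for this example.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <utility>

// Hypothetical sketch (not Eigen code): in-place LU with partial pivoting on
// an N x N row-major matrix whose size is known at compile time.
// On return, `a` holds U in its upper triangle and the multipliers of L
// (unit diagonal implied) below it; perm[k] is the row swapped into row k.
// Returns false if a zero pivot is encountered (singular matrix).
template <std::size_t N>
bool lu_inplace(std::array<double, N * N>& a, std::array<std::size_t, N>& perm) {
  for (std::size_t k = 0; k < N; ++k) {
    // Pick the row with the largest absolute value in column k.
    std::size_t pivot = k;
    for (std::size_t i = k + 1; i < N; ++i)
      if (std::abs(a[i * N + k]) > std::abs(a[pivot * N + k])) pivot = i;
    perm[k] = pivot;
    if (a[pivot * N + k] == 0.0) return false;
    if (pivot != k)
      for (std::size_t j = 0; j < N; ++j) std::swap(a[k * N + j], a[pivot * N + j]);
    // Eliminate the entries below the pivot.
    for (std::size_t i = k + 1; i < N; ++i) {
      a[i * N + k] /= a[k * N + k];
      for (std::size_t j = k + 1; j < N; ++j)
        a[i * N + j] -= a[i * N + k] * a[k * N + j];
    }
  }
  return true;
}
```

Calling `lu_inplace<2>(a, perm)` instantiates loops with a fixed trip count of 2, which an optimizer can unroll completely; a version taking the size as a runtime argument cannot rely on that.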
* Make GEMM fallback to GEMV for runtime vectors.Gravatar Gael Guennebaud2019-02-07
| | | | | This is a more general and simpler version of changeset 4c0fa6ce0f81ce67dd6723528ddf72f66ae92ba2
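The idea of falling back from a matrix-matrix kernel to a matrix-vector kernel when a factor turns out to be a vector at runtime can be sketched as follows. This is an illustrative toy, not Eigen's dispatch code: `Mat`, `gemv`, `gemm` and `product` are hypothetical names, and only the case of a column-vector right-hand side is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not Eigen code): a dense row-major matrix whose sizes
// are known only at runtime.
struct Mat {
  std::size_t rows, cols;
  std::vector<double> data;  // row-major, rows * cols entries
  Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Matrix * column-vector kernel: one dot product per row of `a`.
void gemv(const Mat& a, const Mat& x, Mat& y) {
  for (std::size_t i = 0; i < a.rows; ++i) {
    double acc = 0.0;
    for (std::size_t k = 0; k < a.cols; ++k) acc += a(i, k) * x(k, 0);
    y(i, 0) = acc;
  }
}

// General matrix * matrix kernel (naive triple loop).
void gemm(const Mat& a, const Mat& b, Mat& c) {
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t j = 0; j < b.cols; ++j) {
      double acc = 0.0;
      for (std::size_t k = 0; k < a.cols; ++k) acc += a(i, k) * b(k, j);
      c(i, j) = acc;
    }
}

// Entry point: checks at runtime whether the right-hand side is a vector and,
// if so, dispatches to the matrix-vector kernel.
// Returns true when the GEMV path was taken (exposed only for illustration).
bool product(const Mat& a, const Mat& b, Mat& c) {
  assert(a.cols == b.rows && c.rows == a.rows && c.cols == b.cols);
  if (b.cols == 1) {
    gemv(a, b, c);
    return true;
  }
  gemm(a, b, c);
  return false;
}
```

In a real library the two kernels differ far more than in this toy (blocking, packing, vectorization), which is why choosing the right one at runtime can matter.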
* Backed out changeset 4c0fa6ce0f81ce67dd6723528ddf72f66ae92ba2Gravatar Gael Guennebaud2019-02-07
|
* bug #1676: workaround GCC's bug in c++17 mode.Gravatar Gael Guennebaud2019-02-07
|
* Remove duplicated comment lineGravatar Eugene Zhulenev2019-02-04
|
* Fix GeneralBlockPanelKernel Android compilationGravatar Eugene Zhulenev2019-02-04
|
* bug #1674: disable GCC's unsafe-math-optimizations in sin/cos vectorization ↵Gravatar Gael Guennebaud2019-02-03
| | | | (results are completely wrong otherwise)
* Merged in rmlarsen/eigen (pull request PR-578)Gravatar Rasmus Larsen2019-02-02
|\ | | | | | | | | | | Speed up Eigen matrix*vector and vector*matrix multiplication. Approved-by: Eugene Zhulenev <ezhulenev@google.com>
* | Speed up row-major matrix-vector product on ARMGravatar Sameer Agarwal2019-02-01
| |   The row-major matrix-vector multiplication code uses a threshold to check whether processing 8 rows at a time would thrash the cache.
| |
| |   This change introduces two modifications to this logic.
| |
| |   1. A smaller threshold for ARM and ARM64 devices.
| |
| |      The value of this threshold was determined empirically using a Pixel2 phone, by benchmarking a large number of matrix-vector products in the range [1..4096]x[1..4096] and measuring performance separately on big and little cores with frequency pinning.
| |
| |      On big (out-of-order) cores, this change has little to no impact. On the small (in-order) cores, however, the matrix-vector products are up to 700% faster, especially on large matrices.
| |
| |      The motivation for this change was some internal code at Google which was using hand-written NEON for implementing similar functionality, processing the matrix one row at a time, which exhibited substantially better performance than Eigen. With the current change, Eigen handily beats that code.
| |
| |   2. Make the logic for choosing the number of simultaneous rows apply uniformly to 8, 4 and 2 rows instead of just 8 rows.
| |
| |   Since the default threshold for non-ARM devices is essentially unchanged (32000 -> 32 * 1024), this change has no impact on non-ARM performance. This was verified by running the same set of benchmarks on a Xeon desktop.
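One plausible reading of the row-selection logic described above can be sketched as follows. This is not the Eigen kernel: `rows_per_block` and `matvec_row_major` are hypothetical names, and the rule "fall back to one row at a time when a row spans more bytes than the threshold" is an assumption about how the threshold is applied. Only the non-ARM default of 32 * 1024 comes from the commit message; the ARM value is not stated there, so it is left as a parameter.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch (not Eigen code). Picks how many rows to process
// together: if one row spans more bytes than `threshold_bytes`, touching
// several rows at once is assumed to thrash the cache, so a single row is
// used; otherwise the largest of 8, 4 or 2 that still fits is chosen.
std::size_t rows_per_block(std::size_t remaining_rows, std::size_t row_stride_bytes,
                           std::size_t threshold_bytes) {
  if (row_stride_bytes > threshold_bytes) return 1;
  if (remaining_rows >= 8) return 8;
  if (remaining_rows >= 4) return 4;
  if (remaining_rows >= 2) return 2;
  return 1;
}

// y = A * x for a row-major A (rows x cols), processing `block` rows per pass
// so that each loaded x[j] is reused for several rows.
std::vector<double> matvec_row_major(const std::vector<double>& a, std::size_t rows,
                                     std::size_t cols, const std::vector<double>& x,
                                     std::size_t threshold_bytes = 32 * 1024) {
  std::vector<double> y(rows, 0.0);
  std::size_t i = 0;
  while (i < rows) {
    const std::size_t block =
        rows_per_block(rows - i, cols * sizeof(double), threshold_bytes);
    for (std::size_t j = 0; j < cols; ++j)
      for (std::size_t r = 0; r < block; ++r)
        y[i + r] += a[(i + r) * cols + j] * x[j];
    i += block;
  }
  return y;
}
```

The result is the same whatever block size is chosen; only the memory access pattern changes, which is what the threshold is tuning.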
| * Speed up Eigen matrix*vector and vector*matrix multiplication.Gravatar Rasmus Munk Larsen2019-01-31
|/
|   This change speeds up Eigen matrix * vector and vector * matrix multiplication for dynamic matrices when it is known at runtime that one of the factors is a vector.
|
|   The benchmarks below test
|
|     c.noalias()= n_by_n_matrix * n_by_1_matrix;
|     c.noalias()= 1_by_n_matrix * n_by_n_matrix;
|
|   respectively.
|
|   Benchmark measurements:
|
|   SSE:
|   Run on *** (72 X 2992 MHz CPUs); 2019-01-28T17:51:44.452697457-08:00
|   CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
|   Benchmark          Base (ns)    New (ns)   Improvement
|   ------------------------------------------------------------------
|   BM_MatVec/64            1096         312        +71.5%
|   BM_MatVec/128           4581        1464        +68.0%
|   BM_MatVec/256          18534        5710        +69.2%
|   BM_MatVec/512         118083       24162        +79.5%
|   BM_MatVec/1k          704106      173346        +75.4%
|   BM_MatVec/2k         3080828      742728        +75.9%
|   BM_MatVec/4k        25421512     4530117        +82.2%
|   BM_VecMat/32             352         130        +63.1%
|   BM_VecMat/64            1213         425        +65.0%
|   BM_VecMat/128           4640        1564        +66.3%
|   BM_VecMat/256          17902        5884        +67.1%
|   BM_VecMat/512          70466       24000        +65.9%
|   BM_VecMat/1k          340150      161263        +52.6%
|   BM_VecMat/2k         1420590      645576        +54.6%
|   BM_VecMat/4k         8083859     4364327        +46.0%
|
|   AVX2:
|   Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:45:11.508545307-08:00
|   CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
|   Benchmark          Base (ns)    New (ns)   Improvement
|   ------------------------------------------------------------------
|   BM_MatVec/64             619         120        +80.6%
|   BM_MatVec/128           9693         752        +92.2%
|   BM_MatVec/256          38356        2773        +92.8%
|   BM_MatVec/512          69006       12803        +81.4%
|   BM_MatVec/1k          443810      160378        +63.9%
|   BM_MatVec/2k         2633553      646594        +75.4%
|   BM_MatVec/4k        16211095     4327148        +73.3%
|   BM_VecMat/64             925         227        +75.5%
|   BM_VecMat/128           3438         830        +75.9%
|   BM_VecMat/256          13427        2936        +78.1%
|   BM_VecMat/512          53944       12473        +76.9%
|   BM_VecMat/1k          302264      157076        +48.0%
|   BM_VecMat/2k         1396811      675778        +51.6%
|   BM_VecMat/4k         8962246     4459010        +50.2%
|
|   AVX512:
|   Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:35:17.239329863-08:00
|   CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
|   Benchmark          Base (ns)    New (ns)   Improvement
|   ------------------------------------------------------------------
|   BM_MatVec/64             401         111        +72.3%
|   BM_MatVec/128           1846         513        +72.2%
|   BM_MatVec/256          36739        1927        +94.8%
|   BM_MatVec/512          54490        9227        +83.1%
|   BM_MatVec/1k          487374      161457        +66.9%
|   BM_MatVec/2k         2016270      643824        +68.1%
|   BM_MatVec/4k        13204300     4077412        +69.1%
|   BM_VecMat/32             324         106        +67.3%
|   BM_VecMat/64            1034         246        +76.2%
|   BM_VecMat/128           3576         802        +77.6%
|   BM_VecMat/256          13411        2561        +80.9%
|   BM_VecMat/512          58686       10037        +82.9%
|   BM_VecMat/1k          320862      163750        +49.0%
|   BM_VecMat/2k         1406719      651397        +53.7%
|   BM_VecMat/4k         7785179     4124677        +47.0%
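The Improvement column in the benchmark listings above is consistent with a relative time reduction, 100 * (base - new) / base. A small sketch of that arithmetic, together with the equivalent speedup factor, follows; the function names are made up for this example.

```cpp
#include <cassert>

// Relative time reduction in percent: 100 * (base - new) / base.
double improvement_percent(double base_ns, double new_ns) {
  assert(base_ns > 0.0);
  return 100.0 * (base_ns - new_ns) / base_ns;
}

// Equivalent speedup factor: base / new.
double speedup_factor(double base_ns, double new_ns) {
  assert(new_ns > 0.0);
  return base_ns / new_ns;
}
```

For the SSE row BM_MatVec/64 (1096 ns before, 312 ns after), `improvement_percent(1096, 312)` gives about 71.5, matching the listing, and `speedup_factor(1096, 312)` is about 3.5.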
* GEBP: improves pipelining in the 1pX4 path with FMA.Gravatar Gael Guennebaud2019-01-30
| | | | | Prior to this change, a product with a LHS having 8 rows was faster with AVX-only than with AVX+FMA. With AVX+FMA I measured a speed up of about x1.25 in such cases.
* Fix compilation with ARM64.Gravatar Gael Guennebaud2019-01-30
|
* Fix conflicts and mergeGravatar Gael Guennebaud2019-01-30
|\
* | According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101, the ↵Gravatar Gael Guennebaud2019-01-30
| | | | | | | | previous GCC issue is fixed in GCC trunk (will be gcc 9).
* | ARM64 & GEBP: add specialization for double +30% speed upGravatar Gael Guennebaud2019-01-30
| |
* | ARM64 & GEBP: Make use of vfmaq_laneq_f32 and workaround GCC's issue in ↵Gravatar Gael Guennebaud2019-01-30
| | | | | | | | generating good ASM
* | bug #1669: fix PartialPivLU/inverse with zero-sized matrices.Gravatar Gael Guennebaud2019-01-29
| |
* | Fix compilation with c++03 (local classes cannot be template arguments), and ↵Gravatar Gael Guennebaud2019-01-29
| | | | | | | | make SparseMatrix::assignDiagonal truly protected.
* | bug #1574: implement "sparse_matrix =,+=,-= diagonal_matrix" with smart ↵Gravatar Gael Guennebaud2019-01-28
| | | | | | | | insertion strategies of missing diagonal coeffs.
* | Move evaluator<SparseCompressedBase>::find(i,j) to a more general and ↵Gravatar Gael Guennebaud2019-01-28
| | | | | | | | reusable SparseCompressedBase::lower_bound(i,j) function
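As a rough illustration of what a `lower_bound(i,j)` lookup in compressed sparse storage does, the sketch below runs a binary search over the sorted inner indices of one column. The types and the function `find_lower_bound` are hypothetical and do not reproduce Eigen's signature or return type; sorted inner indices are assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not Eigen code) of a compressed column-major sparse
// matrix: the row indices of column j live in
// inner[outer[j]] .. inner[outer[j + 1] - 1], sorted in increasing order.
struct CompressedSparse {
  std::vector<std::size_t> outer;  // size = cols + 1
  std::vector<std::size_t> inner;  // row index of each stored coefficient
  std::vector<double> values;      // stored coefficients
};

struct LowerBound {
  std::size_t position;  // index into inner/values of first entry with row >= i
  bool found;            // true if that entry is exactly (i, j)
};

// Binary search for row i inside column j.
LowerBound find_lower_bound(const CompressedSparse& m, std::size_t i, std::size_t j) {
  const auto first = m.inner.begin() + static_cast<std::ptrdiff_t>(m.outer[j]);
  const auto last = m.inner.begin() + static_cast<std::ptrdiff_t>(m.outer[j + 1]);
  const auto it = std::lower_bound(first, last, i);
  LowerBound result;
  result.position = static_cast<std::size_t>(it - m.inner.begin());
  result.found = (it != last) && (*it == i);
  return result;
}
```

When the coefficient is absent, `position` is where it would have to be inserted to keep the column sorted, which is the kind of information an insertion strategy for missing diagonal coefficients needs.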
* | Renaming some more `I` identifiersGravatar Christoph Hertzberg2019-01-26
| |
* | Fix compilation error in NEON GEBP specialization of madd.Gravatar Rasmus Munk Larsen2019-01-25
| |
* | cleanupGravatar Gael Guennebaud2019-01-24
| |
* | PR 574: use variadic template instead of initializer_list to implement ↵Gravatar David Tellenbach2019-01-23
| | | | | | | | fixed-size vector ctor from coefficients.
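A possible reason to prefer a variadic template over `std::initializer_list` for a fixed-size constructor is that the number of arguments becomes part of the type, so it can be checked with `static_assert`; the length of an `initializer_list` function parameter cannot be used that way. The class below is a hypothetical stand-in written for this example, not Eigen's `Matrix`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch (not Eigen code): a fixed-size vector whose constructor
// takes its coefficients as a variadic pack, so that passing the wrong number
// of coefficients is rejected at compile time.
template <typename T, std::size_t N>
class FixedVec {
 public:
  template <typename... Args>
  explicit FixedVec(Args... coeffs) : data_{{static_cast<T>(coeffs)...}} {
    static_assert(sizeof...(Args) == N, "wrong number of coefficients");
  }

  T operator[](std::size_t i) const { return data_[i]; }
  static constexpr std::size_t size() { return N; }

 private:
  std::array<T, N> data_;
};
```

With this sketch, `FixedVec<double, 3> v(1, 2, 3);` compiles, while `FixedVec<double, 3> w(1, 2);` is rejected by the `static_assert`.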
* | Cleanup SFINAE in Array/Matrix(initializer_list) ctors and minor doc editing.Gravatar Gael Guennebaud2019-01-22
| |
* | PR 572: Add initializer list constructors to Matrix and Array (include unit ↵Gravatar David Tellenbach2019-01-21
| |   tests and doc)
| |     - {1,2,3,4,5,...} for fixed-size vectors only
| |     - {{1,2,3},{4,5,6}} for the general cases
| |     - {{1,2,3,4,5,....}} is allowed for both row and column-vector
* | Replace host_define.h with cuda_runtime_api.hGravatar nluehr2019-01-18
| |
* | Mask unused-parameter warnings, when building with NDEBUGGravatar Christoph Hertzberg2019-01-18
| |
* | Add missing logical packet ops for GPU and NEON.Gravatar Rasmus Munk Larsen2019-01-17
| |
* | Remove some useless const_castGravatar Gael Guennebaud2019-01-17
| |
* | Make nestByValue work again (broken since 3.3) and add unit tests.Gravatar Gael Guennebaud2019-01-17
| |
* | Extend reshaped unit tests and remove useless const_castGravatar Gael Guennebaud2019-01-17
| |
* | Cleanup useless const_cast and add missing broadcast assignment testsGravatar Gael Guennebaud2019-01-17
| |
* | Make FullPivLU use conjugateIf<>Gravatar Gael Guennebaud2019-01-17
| |
* | PR 567: makes all dense solvers inherit SolverBase (LU,Cholesky,QR,SVD).Gravatar Patrick Peltzer2019-01-17
| |   This changeset also includes:
| |     * add HouseholderSequence::conjugateIf
| |     * define int as the StorageIndex type for all dense solvers
| |     * dedicated unit tests, including assertion checking
| |     * _check_solve_assertion(): this method can be implemented in derived solver classes to implement custom checks
| |     * CompleteOrthogonalDecompositions: add applyZOnTheLeftInPlace, fix scalar type in applyZAdjointOnTheLeftInPlace(), add missing assertions
| |     * Cholesky: add missing assertions
| |     * FullPivHouseholderQR: Corrected Scalar type in _solve_impl()
| |     * BDCSVD: Unambiguous return type for ternary operator
| |     * SVDBase: Corrected Scalar type in _solve_impl()
* | Add conjugateIf<bool> members to DenseBase, TriangularView, SelfadjointView, ↵Gravatar Gael Guennebaud2019-01-17
| | | | | | | | and make PartialPivLU use it.
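The general idea behind a `conjugateIf<bool>` helper, conjugating only when a compile-time flag is set, can be sketched on plain `std::complex` values. The free functions below are hypothetical and only mimic the concept; how the changeset uses the member inside the solvers is not described in the log, so the shared plain/adjoint code path shown here is an assumption.

```cpp
#include <complex>

// Hypothetical sketch (not Eigen code): return the complex conjugate of z
// when the compile-time flag Cond is true, and z unchanged otherwise.
template <bool Cond, typename T>
std::complex<T> conjugate_if(const std::complex<T>& z) {
  return Cond ? std::conj(z) : z;
}

// One routine serving both a * x = b and conj(a) * x = b for a scalar
// (1x1 "matrix") a: the flag selects whether a is conjugated first.
template <bool Conjugate, typename T>
std::complex<T> solve_1x1(const std::complex<T>& a, const std::complex<T>& b) {
  return b / conjugate_if<Conjugate>(a);
}
```

Because the flag is a template argument, each instantiation contains only the branch it needs, so the plain variant pays nothing for the conjugating one.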
* | bug #1646: fix false aliasing detection for A.row(0) = A.col(0);Gravatar Gael Guennebaud2019-01-17
| | | | | | | | This changeset completely disables the detection for vectors, for which our current mechanism cannot detect any positive aliasing anyway.
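Why `A.row(0) = A.col(0)` is harmless even though source and destination overlap can be checked on a small example: the only shared coefficient is A(0,0), which is assigned to itself. The sketch below uses plain nested arrays rather than Eigen types, and the function names are made up for this illustration.

```cpp
#include <array>
#include <cstddef>

using Mat3 = std::array<std::array<int, 3>, 3>;  // a[i][j]: row i, column j

// In-place version of A.row(0) = A.col(0): reads and writes the same storage.
Mat3 assign_row0_from_col0_in_place(Mat3 a) {
  for (std::size_t j = 0; j < 3; ++j) a[0][j] = a[j][0];
  return a;
}

// Reference version: copy column 0 to a temporary first, then write row 0.
Mat3 assign_row0_from_col0_via_temporary(Mat3 a) {
  const std::array<int, 3> tmp = {{a[0][0], a[1][0], a[2][0]}};
  for (std::size_t j = 0; j < 3; ++j) a[0][j] = tmp[j];
  return a;
}
```

Both functions return the same matrix for any input, so flagging this assignment as an aliasing problem would be a false positive.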
* | Fix compilation error for logical packet ops with older compilers.Gravatar Rasmus Munk Larsen2019-01-16
| |
* | GEBP: fix swapped kernel mode with AVX512 and complex scalarsGravatar Gael Guennebaud2019-01-16
| |
* | GEBP: cleanup logic to choose between 4 packets or 1 packetGravatar Gael Guennebaud2019-01-16
| |
* | bug #1661: fix regression in GEBP and AVX512Gravatar Gael Guennebaud2019-01-16
| |