path: root/Eigen/src/Core/products
Commit message (Author, Age)
* Revert vec/y to vec*(1/y) in row-major TRSM (Gael Guennebaud, 2016-12-06)
|   - div is extremely costly
|   - this is consistent with the column-major case
|   - this is consistent with all other BLAS implementations
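The trade-off behind this revert can be sketched in C++: one division followed by n multiplications replaces n divisions. The helper `scale_by_inverse` below is hypothetical and works on a plain array, not on Eigen's TRSM kernel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper illustrating the vec*(1/y) idiom: scale a row of a
// triangular solve by the inverse of the diagonal entry y.
// Note: x * (1/y) can differ from x / y in the last bit; as the commit notes,
// other BLAS implementations accept this for the speed-up.
void scale_by_inverse(double* vec, std::size_t n, double y) {
  assert(y != 0.0);
  const double inv = 1.0 / y;  // one costly division...
  for (std::size_t i = 0; i < n; ++i) {
    vec[i] *= inv;             // ...then n cheap multiplications
  }
}
```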
* Fix BLAS backend for symmetric rank K updates. (Gael Guennebaud, 2016-12-06)
|
* typo (Gael Guennebaud, 2016-12-05)
|
* Improve performance of row-major-dense-matrix * vector products for recent CPUs. (Gael Guennebaud, 2016-12-05)
|   This revised version does not bother with aligned loads/stores, and instead processes 8 rows at once for better instruction pipelining.
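A minimal scalar sketch of the 8-rows-at-a-time idea, assuming a plain row-major array with leading dimension `lda`. The function name `gemv_rowmajor` is hypothetical, and Eigen's actual kernel operates on SIMD packets rather than scalars.

```cpp
#include <cassert>
#include <cstddef>

// Sketch: y += A * x for a row-major matrix A (rows x cols, leading dimension lda).
// Eight rows are processed together so the eight dot products are independent
// (better pipelining) and each x[j] is loaded once per group of 8 rows.
void gemv_rowmajor(const double* A, std::size_t lda, std::size_t rows,
                   std::size_t cols, const double* x, double* y) {
  assert(lda >= cols);
  std::size_t i = 0;
  for (; i + 8 <= rows; i += 8) {
    double acc[8] = {0, 0, 0, 0, 0, 0, 0, 0};
    for (std::size_t j = 0; j < cols; ++j) {
      const double xj = x[j];
      for (std::size_t r = 0; r < 8; ++r) {
        acc[r] += A[(i + r) * lda + j] * xj;
      }
    }
    for (std::size_t r = 0; r < 8; ++r) y[i + r] += acc[r];
  }
  // Remaining rows (fewer than 8) are handled one at a time.
  for (; i < rows; ++i) {
    double acc = 0;
    for (std::size_t j = 0; j < cols; ++j) acc += A[i * lda + j] * x[j];
    y[i] += acc;
  }
}
```

No alignment handling appears here: the sketch simply relies on ordinary (possibly unaligned) loads, in the spirit of the commit.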
* Complete rewrite of column-major-matrix * vector product to deliver higher performance on modern CPUs. (Gael Guennebaud, 2016-12-03)
|   The previous code had been optimized for Intel Core 2, for which unaligned loads/stores were prohibitively expensive. This new version exhibits much higher instruction independence (better pipelining) and explicitly leverages FMA. According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast. Even higher performance could be achieved with a better blocking-size heuristic and, perhaps, with explicit prefetching. We should also check triangular product/solve to optimally exploit this new kernel (working on vertical panels of 4 columns is probably not optimal anymore).
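A scalar sketch of a column-major (axpy-style) product with several independent FMA chains, assuming a plain column-major array. `gemv_colmajor` and the row block of 4 are hypothetical choices, not the layout of Eigen's rewritten kernel. `std::fma` is only fast when the compiler can map it to a hardware FMA instruction (e.g., when targeting Haswell with FMA enabled).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Sketch: y += A * x for a column-major matrix A (rows x cols, leading dimension lda).
// Four rows are kept in registers across all columns, giving four independent
// fused multiply-add chains instead of one long dependency chain.
void gemv_colmajor(const double* A, std::size_t lda, std::size_t rows,
                   std::size_t cols, const double* x, double* y) {
  assert(lda >= rows);
  std::size_t i = 0;
  for (; i + 4 <= rows; i += 4) {
    double c0 = y[i], c1 = y[i + 1], c2 = y[i + 2], c3 = y[i + 3];
    for (std::size_t j = 0; j < cols; ++j) {
      const double* col = A + j * lda + i;  // rows i..i+3 of column j
      const double xj = x[j];
      c0 = std::fma(col[0], xj, c0);
      c1 = std::fma(col[1], xj, c1);
      c2 = std::fma(col[2], xj, c2);
      c3 = std::fma(col[3], xj, c3);
    }
    y[i] = c0; y[i + 1] = c1; y[i + 2] = c2; y[i + 3] = c3;
  }
  // Leftover rows (fewer than 4).
  for (; i < rows; ++i) {
    double c = y[i];
    for (std::size_t j = 0; j < cols; ++j) c = std::fma(A[j * lda + i], x[j], c);
    y[i] = c;
  }
}
```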
* Fix misleading-indentation warnings. (Gael Guennebaud, 2016-12-01)
|
* Merged eigen/eigen into default (Benoit Steiner, 2016-11-03)
|\
| * Fix previous merge. (Gael Guennebaud, 2016-10-14)
| |
* | Renamed predux_half into predux_downto4 (Benoit Steiner, 2016-10-06)
| |
| * Add a simple cost model to prevent Eigen's parallel GEMM from using too many threads when the inner dimension is small. (Rasmus Munk Larsen, 2016-10-06)
| |   Timing for square matrices is unchanged, but both CPU and wall time are significantly improved for skinny matrices. The benchmarks below are for multiplying NxK * KxN matrices, with test names of the form BM_OuterishProd/N/K.
| |
| |   Improvements in wall time:
| |   Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
| |   CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
| |   Benchmark                  Base (ns)   New (ns) Improvement
| |   -----------------------------------------------------------
| |   BM_OuterishProd/64/1            3088       1610      +47.9%
| |   BM_OuterishProd/64/4            3562       2414      +32.2%
| |   BM_OuterishProd/64/32           8861       7815      +11.8%
| |   BM_OuterishProd/128/1          11363       6504      +42.8%
| |   BM_OuterishProd/128/4          11128       9794      +12.0%
| |   BM_OuterishProd/128/64         27691      27396       +1.1%
| |   BM_OuterishProd/256/1          33214      28123      +15.3%
| |   BM_OuterishProd/256/4          34312      36818       -7.3%
| |   BM_OuterishProd/256/128       174866     176398       -0.9%
| |   BM_OuterishProd/512/1        7963684     104224      +98.7%
| |   BM_OuterishProd/512/4        7987913     112867      +98.6%
| |   BM_OuterishProd/512/256      8198378    1306500      +84.1%
| |   BM_OuterishProd/1k/1         7356256     324432      +95.6%
| |   BM_OuterishProd/1k/4         8129616     331621      +95.9%
| |   BM_OuterishProd/1k/512      27265418    7517538      +72.4%
| |
| |   Improvements in CPU time:
| |   Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
| |   CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
| |   Benchmark                  Base (ns)   New (ns) Improvement
| |   -----------------------------------------------------------
| |   BM_OuterishProd/64/1            6169       1608      +73.9%
| |   BM_OuterishProd/64/4            7117       2412      +66.1%
| |   BM_OuterishProd/64/32          17702      15616      +11.8%
| |   BM_OuterishProd/128/1          45415       6498      +85.7%
| |   BM_OuterishProd/128/4          44459       9786      +78.0%
| |   BM_OuterishProd/128/64        110657     109489       +1.1%
| |   BM_OuterishProd/256/1         265158      28101      +89.4%
| |   BM_OuterishProd/256/4         274234     183885      +32.9%
| |   BM_OuterishProd/256/128      1397160    1408776       -0.8%
| |   BM_OuterishProd/512/1       78947048     520703      +99.3%
| |   BM_OuterishProd/512/4       86955578    1349742      +98.4%
| |   BM_OuterishProd/512/256     74701613   15584661      +79.1%
| |   BM_OuterishProd/1k/1        78352601    3877911      +95.1%
| |   BM_OuterishProd/1k/4        78521643    3966221      +94.9%
| |   BM_OuterishProd/1k/512     258104736   89480530      +65.3%
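The commit does not spell out the cost model. The sketch below shows one plausible shape for such a heuristic: cap the thread count so that every thread receives a minimum amount of work. `choose_gemm_threads` and the `min_flops_per_thread` threshold are hypothetical, not Eigen's formula.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical cost model: an m x k times k x n product costs about 2*m*n*k
// flops; never use more threads than there are "chunks" of
// min_flops_per_thread work, and never fewer than one.
int choose_gemm_threads(std::int64_t m, std::int64_t n, std::int64_t k,
                        int max_threads, std::int64_t min_flops_per_thread) {
  assert(max_threads >= 1 && min_flops_per_thread >= 1);
  const std::int64_t flops = 2 * m * n * k;  // one multiply + one add per inner step
  const std::int64_t by_work = std::max<std::int64_t>(1, flops / min_flops_per_thread);
  return static_cast<int>(std::min<std::int64_t>(max_threads, by_work));
}
```

With a threshold of 2^20 flops per thread, a 512x1 * 1x512 product (about 0.5 Mflop) runs on one thread, while a 1024x512 * 512x1024 product uses all 12 threads. A small inner dimension K is exactly what shrinks the flop count, which is why skinny products benefit most in the tables above.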
* | Merged latest updates from trunk (Benoit Steiner, 2016-10-05)
|\|
| * Fix alignment of statically allocated temporaries in symv and trmv. (Gael Guennebaud, 2016-09-21)
| |
| * Fix product for custom complex type (conjugation was ignored). (Gael Guennebaud, 2016-09-14)
| |
| * bug #1167: simplify installation of header files using cmake's install(DIRECTORY ...) command. (Gael Guennebaud, 2016-08-29)
| * bug #1278: ease parsing (Gael Guennebaud, 2016-08-22)
| |
| * Fix performance regression in dgemm introduced by changeset 5d51a7f12c69138ed2a43df240bdf27a5313f7ce (Gael Guennebaud, 2016-07-02)
| * Fix performance regression introduced in changeset e56aabf205a1e8f581dd8a46d7d46ce79c45e158. (Gael Guennebaud, 2016-07-02)
| |   Register blocking sizes are better handled by the cache size heuristics. The current code introduced very small blocks, for instance for a 9x9 matrix, thus killing performance.
| * Relax mixing-type constraints for binary coefficient-wise operators (Gael Guennebaud, 2016-06-06)
| |   - Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP>
| |   - Remove the "functor_is_product_like" helper (was pretty ugly)
| |   - Currently, OP is not used, but it is available to the user for fine-grained tuning
| |   - Currently, only the following operators have been generalized: *,/,+,-,=,*=,/=,+=,-=
| |   - TODO: generalize all other binary operators (comparisons, pow, etc.)
| |   - TODO: handle "scalar op array" operators (currently only * is handled)
| |   - TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits
| * Handle some Index to int conversions in BLAS/LAPACK support. (Gael Guennebaud, 2016-05-26)
| |
| * Introduce internal's UIntPtr and IntPtr types for pointer to integer conversions. (Gael Guennebaud, 2016-05-26)
| |   This fixes "conversion from pointer to same-sized integral type" warnings from ICC. Ideally, we would use the std::[u]intptr_t types all the time, but since they are C99/C++11 only, let's be safe.
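For illustration, here is the kind of pointer-to-integer conversion such types serve: an alignment test. The sketch assumes C++11 and uses `std::uintptr_t` through a local `UIntPtr` alias; it does not reproduce Eigen's own `UIntPtr` definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local stand-in for a pointer-sized unsigned integer type (assumes C++11).
typedef std::uintptr_t UIntPtr;

// Returns true if p lies on an `alignment`-byte boundary.
// `alignment` must be a power of two for the mask test to be valid.
bool is_aligned(const void* p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<UIntPtr>(p) & (alignment - 1)) == 0;
}
```

Casting through a type that is guaranteed to be pointer-sized is what avoids the "same-sized integral type" warnings mentioned in the commit.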
| * Remove the rotating kernel. It was only useful on some ARM CPUs (Qualcomm Krait) that are not as ubiquitous today as they were when I introduced it. (Benoit Jacob, 2016-05-24)
| * Don't optimize the processing of the last rows of a matrix-matrix product in cases that violate the assumptions made by the optimized code path. (Benoit Steiner, 2016-05-23)
* | Pulled latest updates from upstream (Benoit Steiner, 2016-04-29)
|\|
| * Made the index type a template parameter to evaluateProductBlockingSizes (Benoit Steiner, 2016-04-27)
| |   Use numext::mini and numext::maxi instead of std::min/std::max to compute blocking sizes.
| * Deleted extraneous comma. (Benoit Steiner, 2016-04-15)
| |
| * Improved the matrix multiplication blocking in the case where mr is not a power of 2 (e.g., on Haswell CPUs). (Benoit Steiner, 2016-04-15)
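Why a non-power-of-two `mr` matters can be shown with two rounding helpers: the bit-mask form silently assumes a power of two, while the division form works for any `mr`. Both helpers are hypothetical; `mr = 12` (three SIMD packets of 4 doubles) is an assumed example, not a value taken from Eigen's sources.

```cpp
#include <cassert>
#include <cstddef>

// Bit-mask rounding down to a multiple of mr: only correct when mr is a power of two.
std::size_t round_down_pow2(std::size_t x, std::size_t mr) {
  assert(mr != 0 && (mr & (mr - 1)) == 0);
  return x & ~(mr - 1);
}

// General rounding down to a multiple of mr: correct for any mr > 0.
std::size_t round_down_multiple(std::size_t x, std::size_t mr) {
  assert(mr != 0);
  return (x / mr) * mr;
}
```

With `mr = 12`, the mask form would compute `100 & ~11 = 100`, which is not a multiple of 12, whereas `round_down_multiple(100, 12)` yields 96.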
| * Fix trmv for mixing types. (Gael Guennebaud, 2016-04-15)
| |
| * Added ability to access the cache sizes from the tensor devices (Benoit Steiner, 2016-04-14)
| |
| * Work around a division by zero when outerstride==0 (Gael Guennebaud, 2016-04-13)
| |
* | Pull latest updates from upstream (Benoit Steiner, 2016-04-11)
|\|
| * Clean up obsolete assign_scalar_eig2mkl helper. (Gael Guennebaud, 2016-04-11)
| |
| * Remove all references to MKL in BLAS wrappers. (Gael Guennebaud, 2016-04-11)
| |
| * Fix long to int conversion in BLAS API. (Gael Guennebaud, 2016-04-11)
| |
| * Silence unused warning. (Gael Guennebaud, 2016-04-11)
| |
| * Relax dependency on MKL for EIGEN_USE_BLAS (Gael Guennebaud, 2016-04-11)
| |
| * Removed executable bit from header files (Benoit Steiner, 2016-03-23)
| |
| * bug #1161: fix division by zero for huge scalar types (Gael Guennebaud, 2016-02-03)
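The failure mode behind bug #1161 can be illustrated with a hypothetical depth-blocking heuristic: when `sizeof(Scalar)` is huge (e.g., a multi-precision type), an integer quotient of the cache size rounds down to zero, and any later division by that block size fails. The names and the formula below are assumptions, not Eigen's heuristic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical heuristic: how many depth steps (kc) fit in an L1 cache of
// l1_bytes when each step touches (mr + nr) scalars of scalar_bytes each.
// For huge scalar types the raw quotient is 0, so it is clamped to at least 1.
std::size_t depth_block(std::size_t l1_bytes, std::size_t mr, std::size_t nr,
                        std::size_t scalar_bytes) {
  assert(mr + nr > 0 && scalar_bytes > 0);
  const std::size_t raw = l1_bytes / ((mr + nr) * scalar_bytes);
  return std::max<std::size_t>(1, raw);
}

// Number of depth blocks needed to cover a depth of k; safe because kc >= 1.
std::size_t num_depth_blocks(std::size_t k, std::size_t kc) {
  assert(kc > 0);
  return (k + kc - 1) / kc;
}
```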
| |
* | Updated the matrix multiplication code to make it compile with AVX512 enabled. (Benoit Steiner, 2016-02-01)
| |
| * Fix tri = complex * real product, and add respective unit test. (Gael Guennebaud, 2016-01-27)
| |
| * Remove dead code. (Gael Guennebaud, 2016-01-26)
| |
| * Re-enable blocking on rows in non-l3 blocking mode. (Gael Guennebaud, 2016-01-26)
| |
| * Make sure that micro-panel-size is smaller than blocking sizes (otherwise we might get a buffer overflow). (Gael Guennebaud, 2016-01-26)
| * Make sure that block sizes are smaller than input matrix sizes. (Gael Guennebaud, 2016-01-26)
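The two invariants stated in these commits can be expressed as a clamping step. This is a hypothetical model (`Blocking` and `clamp_blocking` are illustrative names): in real GEMM kernels the micro-panel sizes `mr`/`nr` are usually compile-time constants, so the actual fix may be implemented differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical blocking parameters for a GEMM-like kernel.
struct Blocking {
  std::size_t mc, kc, nc;  // cache block sizes
  std::size_t mr, nr;      // micro-panel (register block) sizes
};

// Enforce: block sizes <= matrix sizes (and >= 1), micro-panels <= block sizes.
Blocking clamp_blocking(Blocking b, std::size_t m, std::size_t n, std::size_t k) {
  assert(b.mr > 0 && b.nr > 0);
  // Never block beyond the actual input matrix sizes.
  b.mc = std::max<std::size_t>(1, std::min(b.mc, m));
  b.kc = std::max<std::size_t>(1, std::min(b.kc, k));
  b.nc = std::max<std::size_t>(1, std::min(b.nc, n));
  // In this model, a kernel writing a full mr x nr tile into a buffer sized
  // for an mc x nc block would overflow if the tile were larger than the block.
  b.mr = std::min(b.mr, b.mc);
  b.nr = std::min(b.nr, b.nc);
  return b;
}
```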
| |
| * bug #51: add block preallocation mechanism to selfadjoint*matrix product. (Gael Guennebaud, 2016-01-25)
| |
| * bug #51: make general_matrix_matrix_triangular_product use L3-blocking helper so that general symmetric rank-updates and general-matrix-to-triangular products do not trigger dynamic memory allocation for fixed size matrices. (Gael Guennebaud, 2016-01-25)
| * bug #1151: remove useless critical section (Gael Guennebaud, 2016-01-21)
| |
* | Disabled part of the matrix-matrix peeling code that's incompatible with 512-bit registers (Benoit Steiner, 2015-12-21)
| * Fix compilation of MKL support. (Gael Guennebaud, 2015-12-11)
|/
* Fixes internal compiler error while compiling with VC2015 Update1 x64. (Nikolay Fedorov, 2015-12-03)
|
* Fix degenerate cases in syrk and trsm (Gael Guennebaud, 2015-11-30)
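One common convention for degenerate cases is the quick-return rule of the reference BLAS `xSYRK` routine. The sketch below applies it to a naive lower-triangular SYRK; `syrk_lower` is a hypothetical name, and Eigen's actual fix is not shown here.

```cpp
#include <cassert>
#include <cstddef>

// Naive lower-triangular SYRK: C := alpha * A * A^T + beta * C, where A is
// n x k and C is n x n, both column-major. Only the lower triangle of C is touched.
void syrk_lower(std::size_t n, std::size_t k, double alpha, const double* A,
                std::size_t lda, double beta, double* C, std::size_t ldc) {
  // Quick return, as in reference BLAS: nothing to do if n == 0, or if
  // (alpha == 0 or k == 0) and beta == 1. A is never read in these cases.
  if (n == 0 || ((alpha == 0.0 || k == 0) && beta == 1.0)) return;
  assert(ldc >= n && (k == 0 || lda >= n));
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = j; i < n; ++i) {
      double sum = 0.0;
      if (alpha != 0.0) {
        for (std::size_t p = 0; p < k; ++p) sum += A[i + p * lda] * A[j + p * lda];
      }
      // With beta == 0, C is overwritten rather than scaled (ignores NaN/Inf in C).
      const double old = (beta == 0.0) ? 0.0 : beta * C[i + j * ldc];
      C[i + j * ldc] = alpha * sum + old;
    }
  }
}
```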
|