Commit message (Author, Age)
* Use fix<> API to specify compile-time reshaped sizes. (Gael Guennebaud, 2017-01-29)
|
* Cleanup initial reshape implementation: (Gael Guennebaud, 2017-01-29)
|    - reshape -> reshaped
|    - make it compatible with evaluators.
* import yoco xiao's work on reshape (Gael Guennebaud, 2017-01-29)
|\
* | MSVC 1900 release is not c++14 compatible enough for us. The 1910 update seems to be fine though. (Gael Guennebaud, 2017-01-27)
| |
* | merge (Gael Guennebaud, 2017-01-27)
|\ \
* | | Fix warning (Gael Guennebaud, 2017-01-27)
| | |
* | | Fix unnamed type as template parameter issue. (Gael Guennebaud, 2017-01-27)
| | |
| * | Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. (Rasmus Munk Larsen, 2017-01-26)
|/ /    Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy. This PR also removes a couple of unnecessary semicolons in Eigen/src/Core/AssignEvaluator.h that caused compiler warnings everywhere.
* | Merged in ggael/eigen-flexidexing (pull request PR-294) (Gael Guennebaud, 2017-01-26)
|\ \    generalized operator() for indexed access and slicing
| * | Fix useless ';' warning (Gael Guennebaud, 2017-01-25)
| | |
| * | Fix unnamed type as template argument (ok in c++11 only) (Gael Guennebaud, 2017-01-25)
| | |
| * | Fix duplicates of array_size between unsupported and Core (Gael Guennebaud, 2017-01-25)
| | |
* | | Merged eigen/eigen into default (Rasmus Munk Larsen, 2017-01-25)
|\ \ \
* | | | Reverse arguments for pmin in AVX. (Rasmus Munk Larsen, 2017-01-25)
| | | |
| * | | bug #1383: fix regression in LinSpaced for integers and high<low (Gael Guennebaud, 2017-01-25)
| | | |
| * | | bug #1381: fix sparse.diagonal() used as a rvalue. (Gael Guennebaud, 2017-01-25)
| | | |    The problem was that if "sparse" is not const, then sparse.diagonal() must have the LValueBit flag, meaning that sparse.diagonal().coeff(i) must return a const reference, const Scalar&. However, sparse::coeff() cannot return a reference for a non-existing zero coefficient. The trick is to return a reference to a local member of evaluator<SparseMatrix>.
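The trick described in the commit above can be illustrated with a minimal, self-contained sketch. SparseDiagonalSketch, its m_zero member and the std::map storage are hypothetical names chosen for illustration and are not Eigen's evaluator<SparseMatrix>; the point is only that coeff() can always hand back a const Scalar& by referring to a member that outlives the call when no entry is stored.

```cpp
#include <cassert>
#include <map>

// Hypothetical toy type for illustration; this is NOT Eigen's evaluator.
template <typename Scalar>
class SparseDiagonalSketch {
 public:
  SparseDiagonalSketch() : m_zero(0) {}

  // Store an explicit entry at index i.
  void insert(int i, const Scalar& value) { m_entries[i] = value; }

  // Always returns a valid const reference: to the stored entry if one
  // exists, otherwise to the member m_zero, which outlives this call.
  const Scalar& coeff(int i) const {
    typename std::map<int, Scalar>::const_iterator it = m_entries.find(i);
    return it != m_entries.end() ? it->second : m_zero;
  }

 private:
  std::map<int, Scalar> m_entries;  // explicitly stored coefficients
  Scalar m_zero;                    // referenced for missing coefficients
};
```

With such a type, d.coeff(7) on an object that only stores index 2 yields a reference to zero instead of a dangling reference to a temporary.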
| * | | bug #1383: Fix regression from 3.2 with LinSpaced(n,0,n-1) with n==0. (Gael Guennebaud, 2017-01-25)
| | | |
* | | | Remove extra space. (Rasmus Munk Larsen, 2017-01-24)
| | | |
| * | | Merged in rmlarsen/eigen2 (pull request PR-292) (Benoit Steiner, 2017-01-25)
|/| | |    Adds a fast memcpy function to Eigen.
| * | | Update copy helper to use fast_memcpy. (Rasmus Munk Larsen, 2017-01-24)
| | | |
| * | | Adds a fast memcpy function to Eigen. This takes advantage of the following: (Rasmus Munk Larsen, 2017-01-24)
| | | |
| | | |    1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
| | | |    2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml
| | | |
| | | |    This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux
| | | |
| | | |    Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.
| | | |
| | | |    The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.
| | | |
| | | |    Measured improvements in wall clock time:
| | | |
| | | |    Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
| | | |    CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
| | | |
| | | |    Benchmark                     Base (ns)   New (ns)  Improvement
| | | |    ---------------------------------------------------------------
| | | |    BM_memcpy_1T/2                     3.48       2.39  +31.3%
| | | |    BM_memcpy_1T/8                     12.3       6.51  +47.0%
| | | |    BM_memcpy_1T/64                     371        383  -3.2%
| | | |    BM_memcpy_1T/512                  66922      66720  +0.3%
| | | |    BM_memcpy_1T/4k                 9892867    6849682  +30.8%
| | | |    BM_memcpy_1T/5k                14951099   10332856  +30.9%
| | | |    BM_memcpy_2T/2                     3.50       2.46  +29.7%
| | | |    BM_memcpy_2T/8                     12.3       7.66  +37.7%
| | | |    BM_memcpy_2T/64                     371        376  -1.3%
| | | |    BM_memcpy_2T/512                  66652      66788  -0.2%
| | | |    BM_memcpy_2T/4k                 6145012    6117776  +0.4%
| | | |    BM_memcpy_2T/5k                 9181478    9010942  +1.9%
| | | |    BM_memcpy_4T/2                     3.47       2.47  +31.0%
| | | |    BM_memcpy_4T/8                     12.3       6.67  +45.8%
| | | |    BM_memcpy_4T/64                     374        376  -0.5%
| | | |    BM_memcpy_4T/512                  67833      68019  -0.3%
| | | |    BM_memcpy_4T/4k                 5057425    5188253  -2.6%
| | | |    BM_memcpy_4T/5k                 7555638    7779468  -3.0%
| | | |    BM_memcpy_6T/2                     3.51       2.50  +28.8%
| | | |    BM_memcpy_6T/8                     12.3       7.61  +38.1%
| | | |    BM_memcpy_6T/64                     373        378  -1.3%
| | | |    BM_memcpy_6T/512                  66871      66774  +0.1%
| | | |    BM_memcpy_6T/4k                 5112975    5233502  -2.4%
| | | |    BM_memcpy_6T/5k                 7614180    7772246  -2.1%
| | | |    BM_memcpy_8T/2                     3.47       2.41  +30.5%
| | | |    BM_memcpy_8T/8                     12.4       10.5  +15.3%
| | | |    BM_memcpy_8T/64                     372        388  -4.3%
| | | |    BM_memcpy_8T/512                  67373      66588  +1.2%
| | | |    BM_memcpy_8T/4k                 5148462    5254897  -2.1%
| | | |    BM_memcpy_8T/5k                 7660989    7799058  -1.8%
| | | |    BM_memcpy_12T/2                    3.50       2.40  +31.4%
| | | |    BM_memcpy_12T/8                    12.4       7.55  +39.1%
| | | |    BM_memcpy_12T/64                    374        378  -1.1%
| | | |    BM_memcpy_12T/512                 67132      66683  +0.7%
| | | |    BM_memcpy_12T/4k                5185125    5292920  -2.1%
| | | |    BM_memcpy_12T/5k                7717284    7942684  -2.9%
| | | |    BM_slicingSmallPieces_1T/2         47.3       47.5  +0.4%
| | | |    BM_slicingSmallPieces_1T/8         53.6       52.3  +2.4%
| | | |    BM_slicingSmallPieces_1T/64         491        476  +3.1%
| | | |    BM_slicingSmallPieces_1T/512      21734      18814  +13.4%
| | | |    BM_slicingSmallPieces_1T/4k      394660     396760  -0.5%
| | | |    BM_slicingSmallPieces_1T/5k      218722     209244  +4.3%
| | | |    BM_slicingSmallPieces_2T/2         80.7       79.9  +1.0%
| | | |    BM_slicingSmallPieces_2T/8         54.2       53.1  +2.0%
| | | |    BM_slicingSmallPieces_2T/64         497        477  +4.0%
| | | |    BM_slicingSmallPieces_2T/512      21732      18822  +13.4%
| | | |    BM_slicingSmallPieces_2T/4k      392885     390490  +0.6%
| | | |    BM_slicingSmallPieces_2T/5k      221988     208678  +6.0%
| | | |    BM_slicingSmallPieces_4T/2         80.8       80.1  +0.9%
| | | |    BM_slicingSmallPieces_4T/8         54.1       53.2  +1.7%
| | | |    BM_slicingSmallPieces_4T/64         493        476  +3.4%
| | | |    BM_slicingSmallPieces_4T/512      21702      18758  +13.6%
| | | |    BM_slicingSmallPieces_4T/4k      393962     404023  -2.6%
| | | |    BM_slicingSmallPieces_4T/5k      249667     211732  +15.2%
| | | |    BM_slicingSmallPieces_6T/2         80.5       80.1  +0.5%
| | | |    BM_slicingSmallPieces_6T/8         54.4       53.4  +1.8%
| | | |    BM_slicingSmallPieces_6T/64         488        478  +2.0%
| | | |    BM_slicingSmallPieces_6T/512      21719      18841  +13.3%
| | | |    BM_slicingSmallPieces_6T/4k      394950     397583  -0.7%
| | | |    BM_slicingSmallPieces_6T/5k      223080     210148  +5.8%
| | | |    BM_slicingSmallPieces_8T/2         81.2       80.4  +1.0%
| | | |    BM_slicingSmallPieces_8T/8         58.1       53.5  +7.9%
| | | |    BM_slicingSmallPieces_8T/64         489        480  +1.8%
| | | |    BM_slicingSmallPieces_8T/512      21586      18798  +12.9%
| | | |    BM_slicingSmallPieces_8T/4k      394592     400165  -1.4%
| | | |    BM_slicingSmallPieces_8T/5k      219688     208301  +5.2%
| | | |    BM_slicingSmallPieces_12T/2        80.2       79.8  +0.7%
| | | |    BM_slicingSmallPieces_12T/8        54.4       53.4  +1.8%
| | | |    BM_slicingSmallPieces_12T/64        488        476  +2.5%
| | | |    BM_slicingSmallPieces_12T/512     21931      18831  +14.1%
| | | |    BM_slicingSmallPieces_12T/4k     393962     396541  -0.7%
| | | |    BM_slicingSmallPieces_12T/5k     218803     207965  +5.0%
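The first point of the commit message above (inline code for small fixed sizes) might look roughly like the following sketch. The name small_size_copy_sketch and the particular set of sizes are assumptions made for illustration; this is not the fast_memcpy that was merged, and the general path here simply calls memcpy, whereas the commit message reports using memmove for large sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration; NOT Eigen's fast_memcpy.
// For the listed small sizes, memcpy is called with a compile-time constant
// length, which compilers typically expand inline instead of calling libc.
inline void small_size_copy_sketch(void* dst, const void* src, std::size_t n) {
  switch (n) {
    case 0: return;
    case 1: std::memcpy(dst, src, 1); return;
    case 2: std::memcpy(dst, src, 2); return;
    case 4: std::memcpy(dst, src, 4); return;
    case 8: std::memcpy(dst, src, 8); return;
    case 16: std::memcpy(dst, src, 16); return;
    default:
      // General path for all other sizes (assumption: plain memcpy).
      std::memcpy(dst, src, n);
      return;
  }
}
```

The switch itself costs a branch on every call, which is consistent with the overhead for larger sizes that the later revert of PR-292 mentions.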
* | | | Fix NaN propagation for AVX512. (Rasmus Munk Larsen, 2017-01-24)
| | | |
* | | | Make NaN propagation consistent between the pmax/pmin and std::max/std::min. (Rasmus Munk Larsen, 2017-01-24)
|/ / /    This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op. See #1373 for details.
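The inconsistency addressed above can be reproduced with scalars only. std::max(a, b) and std::min(a, b) return their first argument whenever a comparison with NaN is involved, whereas x86 packed max/min instructions return their second operand in that case. The functions hw_style_max, hw_style_min, pmax_sketch and pmin_sketch below are hypothetical one-lane models written for illustration, not Eigen's packet code; they suggest why swapping the operands is enough to match the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// One-lane model of an x86-style packed max/min (assumption for illustration):
// the SECOND operand is returned whenever the comparison is unordered (NaN).
inline float hw_style_max(float a, float b) { return a > b ? a : b; }
inline float hw_style_min(float a, float b) { return a < b ? a : b; }

// std::max(a, b) is (a < b) ? b : a and std::min(a, b) is (b < a) ? b : a,
// so both return the FIRST operand when either input is NaN.
// Swapping the operands of the hardware-style operation gives the same result.
inline float pmax_sketch(float a, float b) { return hw_style_max(b, a); }
inline float pmin_sketch(float a, float b) { return hw_style_min(b, a); }
```

With these definitions pmax_sketch(NaN, 1) is NaN and pmax_sketch(1, NaN) is 1, exactly as for std::max, while the unswapped hw_style_max gives the opposite pair of results.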
| * | Add support for std::integral_constant (Gael Guennebaud, 2017-01-24)
| | |
| * | Add test for multiple symbols (Gael Guennebaud, 2017-01-24)
| | |
| * | Fix seq().reverse() in c++98 (Gael Guennebaud, 2017-01-24)
| | |
| * | Add unit test for FixedInt and Symbolic (Gael Guennebaud, 2017-01-24)
| | |
| * | Add support for "SymbolicExpr op fix<N>" in C++98/11 mode. (Gael Guennebaud, 2017-01-24)
| | |
| * | Extended the set of arithmetic operators supported by FixedInt (-,+,*,/,%,&,|) (Gael Guennebaud, 2017-01-24)
| | |
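As a rough illustration of what arithmetic on compile-time integers means in the commits above, the sketch below defines a hypothetical FixedIntSketch<N> wrapper whose binary operators produce a new type carrying the computed value. It is written from scratch for illustration and is not Eigen's FixedInt; only the list of operators (-,+,*,/,%,&,|) is taken from the commit message.

```cpp
#include <cassert>

// Hypothetical stand-in for a compile-time integer; NOT Eigen's FixedInt.
template <int N>
struct FixedIntSketch {
  enum { value = N };                  // the value is part of the type
  operator int() const { return N; }   // usable where a runtime int is expected
};

// Each operator returns a new type, so the result stays known at compile time.
template <int A, int B>
FixedIntSketch<(A + B)> operator+(FixedIntSketch<A>, FixedIntSketch<B>) { return FixedIntSketch<(A + B)>(); }
template <int A, int B>
FixedIntSketch<(A - B)> operator-(FixedIntSketch<A>, FixedIntSketch<B>) { return FixedIntSketch<(A - B)>(); }
template <int A, int B>
FixedIntSketch<(A * B)> operator*(FixedIntSketch<A>, FixedIntSketch<B>) { return FixedIntSketch<(A * B)>(); }
template <int A, int B>
FixedIntSketch<(A / B)> operator/(FixedIntSketch<A>, FixedIntSketch<B>) { return FixedIntSketch<(A / B)>(); }
template <int A, int B>
FixedIntSketch<(A % B)> operator%(FixedIntSketch<A>, FixedIntSketch<B>) { return FixedIntSketch<(A % B)>(); }
template <int A, int B>
FixedIntSketch<(A & B)> operator&(FixedIntSketch<A>, FixedIntSketch<B>) { return FixedIntSketch<(A & B)>(); }
template <int A, int B>
FixedIntSketch<(A | B)> operator|(FixedIntSketch<A>, FixedIntSketch<B>) { return FixedIntSketch<(A | B)>(); }
```

Because the result of FixedIntSketch<6>() + FixedIntSketch<4>() has type FixedIntSketch<10>, a size computed this way could still be used as a template argument, which is the property that compile-time block and sequence sizes rely on.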
| * | Add internal doc (Gael Guennebaud, 2017-01-24)
| | |
| * | Rename fix_t to FixedInt (Gael Guennebaud, 2017-01-24)
| | |
* | | bug #1375: fix cmake installation with cmake 2.8 (Gael Guennebaud, 2017-01-24)
| | |
* | | bug #1376: add missing assertion on size mismatch with compound assignment operators (e.g., mat += mat.col(j)) (Gael Guennebaud, 2017-01-23)
| | |
* | | bug #1382: move using std::size_t/ptrdiff_t to Eigen's namespace (still better than the global namespace!) (Gael Guennebaud, 2017-01-23)
| | |
* | | Add std:: namespace prefix to all (hopefully) instances of size_t/ptrdiff_t (Gael Guennebaud, 2017-01-23)
| | |
* | | Use Index instead of size_t (Gael Guennebaud, 2017-01-23)
| | |
* | | bug #1379: fix compilation in sparse*diagonal*dense with openmp (Gael Guennebaud, 2017-01-21)
| | |
* | | bug #1378: fix doc (DiagonalIndex vs Diagonal) (Gael Guennebaud, 2017-01-21)
| | |
| * | Recover compile-time size from seq(A,B) when A and B are fixed values. (c++11 only) (Gael Guennebaud, 2017-01-19)
| | |
| * | Exploit fixed values in seq and reverse with C++98 compatibility (Gael Guennebaud, 2017-01-19)
| | |
| * | Add support for fixed-value in symbolic expression, c++11 only for now. (Gael Guennebaud, 2017-01-19)
| | |
* | | Made sure that enabling avx2 instructions enables avx and sse instructions as well. (Benoit Steiner, 2017-01-19)
| | |
* | | Merged in LaFeuille/eigen (pull request PR-289) (Benoit Steiner, 2017-01-18)
|\ \ \    Fix a typo
| | * | Remove dead code (Gael Guennebaud, 2017-01-18)
| | | |
| | * | Add a Symbolic::FixedExpr helper expression to make sure the compiler fully optimizes the usage of last and end. (Gael Guennebaud, 2017-01-18)
| | | |
| | * | Add a .reverse() member to ArithmeticSequence. (Gael Guennebaud, 2017-01-18)
| | | |
| | * | Add missing operator* (Gael Guennebaud, 2017-01-18)
| | | |
| | * | Update all block expressions to accept compile-time sizes passed by fix<N> or fix<N>(n) (Gael Guennebaud, 2017-01-18)
| | | |
| | * | Merge the generic and dynamic overloads of block() (Gael Guennebaud, 2017-01-17)
| | | |
* | | | Defer set-to-zero in triangular = product so that no aliasing issue occurs in the common A.triangularView() = B*A.selfadjointView()*B.adjoint() case that used to work in 3.2. (Gael Guennebaud, 2017-01-17)
| | | |