Commit message  Author  Age
* Improve blocking heuristic: if the lhs fits within L1, then block on the rhs ↵Gravatar Gael Guennebaud2015-03-06
| | | | in L1 (this keeps the packed rhs in L1)
* Update gemm performance monitoring tool:Gravatar Gael Guennebaud2015-03-06
| | | | | | - permit to recompute a subset of changesets - update changeset list - add a few more cases
* Improve product kernel: replace the previous dynamic loop swapping strategy ↵Gravatar Gael Guennebaud2015-03-06
| | | | | | by a more general one: It consists in increasing the actual number of rows of lhs's micro horizontal panel for small depth such that L1 cache is fully exploited.
* Make benchmark-blocking-sizes detect changes to clock speed and be resilient ↵Gravatar Benoit Jacob2015-03-05
| | | | to that.
* Rename LSCG to LeastSquaresConjugateGradientGravatar Gael Guennebaud2015-03-05
|
* Product optimization: implement a dynamic loop-swapping strategy to improve ↵Gravatar Gael Guennebaud2015-03-05
| | | | memory accesses to the destination matrix in the case of K-rank-update like products, i.e., for products of the kind: "large x small" * "small x large"
* bug #824: improve accuracy of Quaternion::angularDistance using atan2 ↵Gravatar Gael Guennebaud2015-03-04
| | | | instead of acos.
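
To illustrate why atan2 is preferred over acos here, the following is a minimal standalone sketch, not Eigen's actual code. The `Quat` struct and both function names are hypothetical, and both inputs are assumed to be unit quaternions. The acos form computes 2*acos(|dot|), which loses precision when |dot| is close to 1; the atan2 form builds d = a * conj(b) and uses the norm of its vector part, which stays accurate for small angles.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical minimal quaternion type, for illustration only
// (this is not Eigen's Quaternion class).
struct Quat { double w, x, y, z; };

// acos-based distance: 2*acos(|<a,b>|).
// Near |dot| == 1 the slope of acos is unbounded, so tiny angles are lost.
double angular_distance_acos(const Quat& a, const Quat& b) {
  double dot = a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z;
  double c = std::min(1.0, std::fabs(dot));  // clamp against rounding
  return 2.0 * std::acos(c);
}

// atan2-based distance: d = a * conj(b), then 2*atan2(|vec(d)|, |w(d)|).
double angular_distance_atan2(const Quat& a, const Quat& b) {
  // Components of the quaternion product a * conj(b).
  double w = a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z;
  double x = -a.w * b.x + b.w * a.x - (a.y * b.z - a.z * b.y);
  double y = -a.w * b.y + b.w * a.y - (a.z * b.x - a.x * b.z);
  double z = -a.w * b.z + b.w * a.z - (a.x * b.y - a.y * b.x);
  return 2.0 * std::atan2(std::sqrt(x * x + y * y + z * z), std::fabs(w));
}
```

For two rotations about the same axis that differ by a very small angle, the atan2 variant should return that angle to near machine precision, whereas the acos variant may collapse to zero.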
* output to cout, not cerr, the actual resultsGravatar Benoit Jacob2015-03-04
|
* Complete the tool to analyze the efficiency of default sizes.Gravatar Benoit Jacob2015-03-04
|
* Really use zero guess in ConjugateGradients::solve as documentedGravatar Jan Blechta2015-02-18
| | | | and expected for consistency with other methods.
* mergeGravatar Gael Guennebaud2015-03-04
|\
* | Check for no-reallocation in SparseMatrix::insert (bug #974)Gravatar Gael Guennebaud2015-03-04
| |
* | Improve efficiency of SparseMatrix::insert/coeffRef for sequential ↵Gravatar Gael Guennebaud2015-03-04
| | | | | | | | outer-index insertion strategies (bug #974)
* | Update manual wrt new LSCG solver.Gravatar Gael Guennebaud2015-03-04
| |
* | Add a CG-based solver for rectangular least-square problems (bug #975).Gravatar Gael Guennebaud2015-03-04
| |
| * Fix asm comments in 1px1 kernelGravatar Benoit Jacob2015-03-03
| |
| * Fixed compilation error when compiling with gcc4.7Gravatar Benoit Steiner2015-03-03
| |
| * Add missing copyright noticesGravatar Benoit Jacob2015-03-03
| |
| * Add a benchmark-default-sizes action to benchmark-blocking-sizes.cppGravatar Benoit Jacob2015-03-03
| |
| * New scoring functor to select the pivot.Gravatar Marc Glisse2015-03-03
| | | | | | | | This can be useful for non-floating point scalars, where choosing the biggest element is generally not the best choice.
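
The idea of a pluggable pivot score can be sketched as follows. This is an illustration under stated assumptions, not the functor added by the commit: `Rational`, `AbsScore`, `SimplicityScore` and `select_pivot` are all hypothetical names. The pivot search takes a scoring functor, so a floating-point column can keep the usual "largest magnitude" rule while an exact type can prefer, for example, the simplest nonzero entry.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical exact rational type, used only for illustration.
struct Rational { long num; long den; };

// Score for floating-point scalars: larger magnitude is better.
struct AbsScore {
  double operator()(double x) const { return std::fabs(x); }
};

// Assumed alternative score for rationals: zero can never be a pivot,
// and entries with a small numerator and denominator score higher.
struct SimplicityScore {
  double operator()(const Rational& r) const {
    if (r.num == 0) return 0.0;
    return 1.0 / static_cast<double>(std::labs(r.num) + std::labs(r.den));
  }
};

// Return the index of the entry with the highest score.
// Precondition: column is non-empty.
template <typename T, typename Score>
std::size_t select_pivot(const std::vector<T>& column, Score score) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < column.size(); ++i) {
    if (score(column[i]) > score(column[best])) best = i;
  }
  return best;
}
```

With `AbsScore`, a column such as {1.0, -5.0, 3.0} selects the entry -5.0; with `SimplicityScore`, a column of rationals selects a simple nonzero entry such as 1/2 over 355/113.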
| * must also disable complex<double> when disabling double vectorizationGravatar Benoit Jacob2015-03-03
|/
* Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON ↵Gravatar Benoit Jacob2015-03-03
| | | | intrinsics.
* Improve analyze-blocking-sizes, and in particular give it an ↵Gravatar Benoit Jacob2015-03-02
| | | | | | | evaluate-defaults tool that shows the efficiency of Eigen's default blocking sizes choices, using a previously computed table from benchmark-blocking-sizes.
* HalfPacket also needed to be disabled for double, on ARMv8.Gravatar Benoit Jacob2015-03-02
|
* Add SSE vectorization of Quaternion::conjugate. Significant speed-up when ↵Gravatar Gael Guennebaud2015-03-02
| | | | combined with products like q1*q2.conjugate()
* Fix for TensorIO for Fixed sized Tensors.Gravatar Abhijit Kundu2015-02-28
| | | | | | | The following code snippet was failing to compile: TensorFixedSize<double, Sizes<4, 3> > t_4x3; cout << t_4x3;
* Merged eigen/eigen into defaultGravatar Abhijit Kundu2015-02-28
|\
| * Replaced POSIX random() by internal::randomGravatar Christoph Hertzberg2015-02-28
| |
| * Use @CMAKE_MAKE_PROGRAM@ instead of make in buildtests.shGravatar Christoph Hertzberg2015-02-28
| |
| * Fixed MPRealSupportGravatar Christoph Hertzberg2015-02-28
| |
| * Cygwin does not like weak linking either.Gravatar Christoph Hertzberg2015-02-28
| |
| * bug #967: Automatically add cxx11 suffix when building in C++11 modeGravatar Christoph Hertzberg2015-02-28
| |
| * Increase unit-test L1 cache size to ensure we are doing at least 2 peeled ↵Gravatar Gael Guennebaud2015-02-27
| | | | | | | | loops within product kernel.
| * Re-enable detection of min/max parentheses protection, and re-enable ↵Gravatar Gael Guennebaud2015-02-27
| | | | | | | | mpreal_support unit test.
| * Reimplement the selection between rotating and non-rotating kernelsGravatar Benoit Jacob2015-02-27
| | | | | | | | | | | | using templates instead of macros and if()'s. That was needed to fix the build of unit tests on ARM, which I had broken. My bad for not testing earlier.
| * Pulled latest updates from trunkGravatar Benoit Steiner2015-02-27
| |\
| * | Fixed off-by-one error that prevented the evaluation of small tensor ↵Gravatar Benoit Steiner2015-02-27
| | | | | | | | | | | | expressions from being vectorized
| | * remove trailing commaGravatar Benoit Jacob2015-02-27
| | |
| | * Disable Packet2f/2i halfpacket support in NEON.Gravatar Benoit Jacob2015-02-27
| | | | | | | | | | | | | | | | | | I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match.
| | * Fix NEON build flags: in the current NDK, at least with the clang-3.5 toolchain,Gravatar Benoit Jacob2015-02-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | -mfpu=neon is not enough to activate NEON, since it's incompatible with the default float ABI, and I have to pass -mfloat-abi=softfp (which is what everyone does in practice). In fact, it would be a good idea to pass -mfloat-abi=softfp all the time, regardless of NEON. Also removing the -mcpu=cortex-a8, as 1) it's not needed and 2) if we really wanted to pass a specific -mcpu flag, that would presumably be to tune performance for benchmarks, and it would then not really make sense to tune for the very old cortex-a8 (it reflects ARM CPUs from 5 years ago).
| | * Replace a static assert by a runtime one, fixes the build of unit tests on ARMGravatar Benoit Jacob2015-02-27
| |/ | | | | | | | | Also safely assert in the non-implemented path that should never be taken in practice, and would return wrong results.
* / Added CMake support for Tensor module. CMake now installs CXX11 Tensor ↵Gravatar Abhijit Kundu2015-02-26
|/ | | | module like the rest of the unsupported modules
* Fixed another compilation problem with TensorIntDiv.hGravatar Benoit Steiner2015-02-26
|
* Can now use the tensor 'reverse' operation as a lvalueGravatar Benoit Steiner2015-02-26
|
* Added missing copy constructorGravatar Benoit Steiner2015-02-26
|
* Avoid packing rhs multiple times when blocking on the lhs only.Gravatar Gael Guennebaud2015-02-26
|
* Make sure that the block size computation is tested by our unit test.Gravatar Gael Guennebaud2015-02-26
|
* Update changeset list to be checked by perf_monitoring/gemm.Gravatar Gael Guennebaud2015-02-26
|
* Make perf_monitoring/gemm script more flexible:Gravatar Gael Guennebaud2015-02-26
| | | | | | - skip existing dataset - add a "-up" option to recompute the dataset (see script header) - allow to specify a filename prefix
* Implement a more generic blocking-size selection algorithm. See explanations ↵Gravatar Gael Guennebaud2015-02-26
| | | | | | | inline. It performs extremely well on Haswell. The main issue is to reliably and quickly find the actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)
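
The cache-size formula quoted above, max(l2, l3/nb_core_sharing_l3), can be written out as a small helper. This is only a sketch of the arithmetic: the function name and parameter names are hypothetical, sizes are assumed to be in bytes, and it says nothing about how the commit actually detects the cache sizes or the number of cores sharing L3.

```cpp
#include <algorithm>
#include <cstddef>

// Effective cache size for the second level of blocking:
// the larger of L2 and this core's share of L3.
// All sizes are in bytes; a sharing count of 0 is treated as 1.
std::size_t second_level_cache_bytes(std::size_t l2,
                                     std::size_t l3,
                                     std::size_t cores_sharing_l3) {
  if (cores_sharing_l3 == 0) cores_sharing_l3 = 1;  // guard against division by zero
  return std::max(l2, l3 / cores_sharing_l3);
}
```

For example, with a 256 KiB L2 and an 8 MiB L3 shared by 4 cores, the per-core L3 share (2 MiB) is the larger value; if the same L3 were shared by 64 cores, the share would drop to 128 KiB and L2 would be used instead.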