Commit message | Author | Age | ||
---|---|---|---|---|
... | ||||
* | Fix compilation of iterative solvers with dense matrices | 2015-03-09 | ||
| | ||||
* | Add typedefs for return types of SparseMatrixBase::selfadjointView | 2015-03-09 | ||
| | ||||
* | Add unit tests for CG and sparse-LLT for long int as storage-index | 2015-03-09 | ||
| | ||||
* | bug #963: make IncompleteLUT compatible with non-default storage index types. | 2015-03-09 | ||
| | ||||
* | Avoid underflow when blocking sizes are tuned manually. | 2015-03-06 | ||
| | ||||
* | bug #969: work around ambiguous calls to Ref using enable_if. | 2015-03-06 | ||
| | ||||
* | bug #978: early return for vanishing products | 2015-03-06 | ||
| | ||||
* | Improve blocking heuristic: if the lhs fits within L1, then block on the rhs in L1 (allows keeping the packed rhs in L1) | 2015-03-06 | ||
| | ||||
* | Improve product kernel: replace the previous dynamic loop-swapping strategy by a more general one: it consists in increasing the actual number of rows of lhs's micro horizontal panel for small depth such that the L1 cache is fully exploited. | 2015-03-06 | ||
| | ||||
* | Rename LSCG to LeastSquaresConjugateGradient | 2015-03-05 | ||
| | ||||
* | Product optimization: implement a dynamic loop-swapping strategy to improve memory accesses to the destination matrix in the case of K-rank-update like products, i.e., for products of the kind: "large x small" * "small x large" | 2015-03-05 | ||
| | ||||
* | bug #824: improve accuracy of Quaternion::angularDistance using atan2 instead of acos. | 2015-03-04 | ||
| | ||||
* | Really use zero guess in ConjugateGradients::solve as documented | 2015-02-18 | ||
| | | | | and expected for consistency with other methods. | |||
* | merge | 2015-03-04 | ||
|\ | ||||
* | | Check for no-reallocation in SparseMatrix::insert (bug #974) | 2015-03-04 | ||
| | | ||||
* | | Improve efficiency of SparseMatrix::insert/coeffRef for sequential outer-index insertion strategies (bug #974) | 2015-03-04 | ||
| | | ||||
* | | Update manual wrt new LSCG solver. | 2015-03-04 | ||
| | | ||||
* | | Add a CG-based solver for rectangular least-squares problems (bug #975). | 2015-03-04 | ||
| | | ||||
| * | Fix asm comments in 1px1 kernel | 2015-03-03 | ||
| | | ||||
| * | Add a benchmark-default-sizes action to benchmark-blocking-sizes.cpp | 2015-03-03 | ||
| | | ||||
| * | New scoring functor to select the pivot. | 2015-03-03 | ||
| | | | | | | | | This can be useful for non-floating point scalars, where choosing the biggest element is generally not the best choice. | |||
| * | must also disable complex<double> when disabling double vectorization | 2015-03-03 | ||
|/ | ||||
* | Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics. | 2015-03-03 | ||
| | ||||
* | HalfPacket also needed to be disabled for double, on ARMv8. | 2015-03-02 | ||
| | ||||
* | Add SSE vectorization of Quaternion::conjugate. Significant speed-up when combined with products like q1*q2.conjugate() | 2015-03-02 | ||
| | ||||
* | Increase unit-test L1 cache size to ensure we are doing at least 2 peeled loops within product kernel. | 2015-02-27 | ||
| | ||||
* | Re-enable detection of min/max parentheses protection, and re-enable mpreal_support unit test. | 2015-02-27 | ||
| | ||||
* | Reimplement the selection between rotating and non-rotating kernels | 2015-02-27 | ||
| | | | | | | using templates instead of macros and if()'s. That was needed to fix the build of unit tests on ARM, which I had broken. My bad for not testing earlier. | |||
* | remove trailing comma | 2015-02-27 | ||
| | ||||
* | Disable Packet2f/2i halfpacket support in NEON. | 2015-02-27 | ||
| | | | | | | I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match. | |||
* | Replace a static assert by a runtime one, fixes the build of unit tests on ARM | 2015-02-27 | ||
| | | | | | Also safely assert in the non-implemented path that should never be taken in practice, and would return wrong results. | |||
* | Avoid packing rhs multiple times when blocking on the lhs only. | 2015-02-26 | ||
| | ||||
* | Make sure that the block size computation is tested by our unit test. | 2015-02-26 | ||
| | ||||
* | Implement a more generic blocking-size selection algorithm. See explanations inline. It performs extremely well on Haswell. The main issue is to reliably and quickly find the actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3) | 2015-02-26 | ||
| | ||||
* | Fix typos in block-size testing code, and set peeling on k to 8. | 2015-02-26 | ||
| | ||||
* | So I extensively measured the impact of the offset in this prefetch. I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes). On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400. I could not see any significant impact of this offset. On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0. So let's just go with 0! Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether! | 2015-02-25 | ||
| | ||||
* | bug #970: Add EIGEN_DEVICE_FUNC to RValue functions, in case Cuda supports RValue-references. | 2015-02-24 | ||
| | ||||
* | Fix my recent prefetch changes: | 2015-02-23 | ||
| | | | | | | | | | | | - the first prefetch is actually harmful on Haswell with FMA, but it is the most beneficial on ARM. - the second prefetch... I was very stupid and multiplied by sizeof(scalar) an offset of a scalar* pointer. The old offset was 64; pk = 8, so 64=pk*8. So this effectively restores the older offset. Actually, there were two prefetches here, one with offset 48 and one with offset 64. I could not confirm any benefit from this strange 48 offset on either the Haswell or my ARM device. | |||
* | Fix two trivial warnings | 2015-02-22 | ||
| | ||||
* | log1p is defined only for real Scalars in C++11 | 2015-02-21 | ||
| | ||||
* | Fix compilation of unit tests disabling assertion checking | 2015-02-21 | ||
| | ||||
* | Fix doc of Ref<> | 2015-02-20 | ||
| | ||||
* | In C++11 destructors do not throw by default (fix CommaInitializer unit test) | 2015-02-20 | ||
| | ||||
* | Pulled latest changes from trunk | 2015-02-19 | ||
|\ | ||||
* | | Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up being executed on the GPU device. | 2015-02-19 | ||
| | | ||||
| * | Fix regression with C++11 support of lambda: now internal::result_of falls back to std::result_of in C++11. | 2015-02-19 | ||
| | | ||||
| * | Fix some calls to result_of on binary functors as unary ones. | 2015-02-19 | ||
| | | ||||
| * | Declare const some const variables | 2015-02-19 | ||
|/ | ||||
* | Add support for C++11 result_of/lambdas | 2015-02-19 | ||
| | ||||
* | rotating kernel: avoid compiling anything outside of ARM | 2015-02-18 | ||
| |