path: root/Eigen/src
Commit message [Author, Date]
...
* | | Switch to truncated casting when converting floating point types to integer. [Benoit Steiner, 2015-02-27]
| | | This ensures that vectorized casts are consistent with scalar casts.
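The consistency requirement in this commit can be illustrated in plain C++, without any SIMD code. This is only a sketch: `truncating_cast`, `cast_all` and `rounding_cast` are hypothetical names, not Eigen functions, and the loop in `cast_all` merely stands in for a vectorized path. It shows that a scalar `static_cast` truncates toward zero, so a packet conversion that rounded to nearest would disagree with it.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Scalar reference behaviour: static_cast from float to int truncates toward zero.
// The conversion is undefined for NaN and infinity (and for finite values outside
// the range of int); the assert below only guards the non-finite case.
inline int truncating_cast(float x) {
  assert(std::isfinite(x));
  return static_cast<int>(x);
}

// Hypothetical stand-in for a vectorized cast. A real SIMD path would have to use
// a truncating conversion to produce the same values as the scalar code above.
inline std::vector<int> cast_all(const std::vector<float>& in) {
  std::vector<int> out;
  out.reserve(in.size());
  for (float x : in) out.push_back(truncating_cast(x));
  return out;
}

// Round-to-nearest conversion, shown only for contrast with truncation.
inline int rounding_cast(float x) {
  return static_cast<int>(std::lround(x));
}
```

With these definitions, 2.7f truncates to 2 but rounds to 3, which is the kind of mismatch between code paths that the commit message describes avoiding.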
* | | Added support for vectorized type casting of tensors [Benoit Steiner, 2015-02-27]
| | |
* | | Added support for fast reciprocal square root computation. [Benoit Steiner, 2015-02-26]
| | |
| | * Really use zero guess in ConjugateGradients::solve as documented and expected for consistency with other methods. [Jan Blechta, 2015-02-18]
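To make "zero guess" concrete, here is a minimal textbook conjugate gradient loop that always starts from x = 0. It is a sketch under stated assumptions, not Eigen's implementation: the `Vec` and `Mat` aliases, the helper functions and `cg_solve_zero_guess` are hypothetical names, the matrix is a small dense one, and it is assumed to be symmetric positive definite.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense, row-major; assumed symmetric positive definite

static double dot(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

static Vec mat_vec(const Mat& A, const Vec& x) {
  Vec y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i) y[i] = dot(A[i], x);
  return y;
}

// Textbook conjugate gradient that always starts from the zero vector.
Vec cg_solve_zero_guess(const Mat& A, const Vec& b,
                        int max_iters = 100, double tol = 1e-12) {
  assert(A.size() == b.size());
  Vec x(b.size(), 0.0);  // zero initial guess
  Vec r = b;             // residual b - A*x, which equals b when x = 0
  Vec p = r;             // first search direction
  double rr = dot(r, r);
  for (int k = 0; k < max_iters && std::sqrt(rr) > tol; ++k) {
    const Vec Ap = mat_vec(A, p);
    const double alpha = rr / dot(p, Ap);
    for (std::size_t i = 0; i < x.size(); ++i) {
      x[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
    }
    const double rr_new = dot(r, r);
    const double beta = rr_new / rr;
    for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
    rr = rr_new;
  }
  return x;
}
```

Starting from zero means the result depends only on A and b, so repeated calls are reproducible; a caller who wants a warm start would need a separate entry point that accepts an initial guess.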
| | * merge [Gael Guennebaud, 2015-03-04]
| | |\
| | * | Check for no-reallocation in SparseMatrix::insert (bug #974) [Gael Guennebaud, 2015-03-04]
| | | |
| | * | Improve efficiency of SparseMatrix::insert/coeffRef for sequential outer-index insertion strategies (bug #974) [Gael Guennebaud, 2015-03-04]
| | * | Add a CG-based solver for rectangular least-square problems (bug #975). [Gael Guennebaud, 2015-03-04]
| | | |
| | | * Fix asm comments in 1px1 kernel [Benoit Jacob, 2015-03-03]
| | | |
| | | * Add a benchmark-default-sizes action to benchmark-blocking-sizes.cpp [Benoit Jacob, 2015-03-03]
| | | |
| | | * New scoring functor to select the pivot. [Marc Glisse, 2015-03-03]
| | | | This can be useful for non-floating point scalars, where choosing the biggest element is generally not the best choice.
| | | * must also disable complex<double> when disabling double vectorization [Benoit Jacob, 2015-03-03]
| | |/
| | * Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics. [Benoit Jacob, 2015-03-03]
| | * HalfPacket also needed to be disabled for double, on ARMv8. [Benoit Jacob, 2015-03-02]
| | |
| | * Add SSE vectorization of Quaternion::conjugate. Significant speed-up when combined with products like q1*q2.conjugate() [Gael Guennebaud, 2015-03-02]
| | * Increase unit-test L1 cache size to ensure we are doing at least 2 peeled loops within product kernel. [Gael Guennebaud, 2015-02-27]
| | * Re-enable detection of min/max parentheses protection, and re-enable mpreal_support unit test. [Gael Guennebaud, 2015-02-27]
| |/
| * Reimplement the selection between rotating and non-rotating kernels using templates instead of macros and if()'s. [Benoit Jacob, 2015-02-27]
| | That was needed to fix the build of unit tests on ARM, which I had broken. My bad for not testing earlier.
| * remove trailing comma [Benoit Jacob, 2015-02-27]
| |
| * Disable Packet2f/2i halfpacket support in NEON. [Benoit Jacob, 2015-02-27]
| | I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match.
| * Replace a static assert by a runtime one, fixes the build of unit tests on ARM [Benoit Jacob, 2015-02-27]
| | Also safely assert in the non-implemented path that should never be taken in practice, and would return wrong results.
| * Avoid packing rhs multiple times when blocking on the lhs only. [Gael Guennebaud, 2015-02-26]
| |
| * Make sure that the block size computation is tested by our unit test. [Gael Guennebaud, 2015-02-26]
| |
| * Implement a more generic blocking-size selection algorithm. [Gael Guennebaud, 2015-02-26]
| | See explanations inline. It performs extremely well on Haswell. The main issue is to reliably and quickly find the actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)
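The formula quoted in this message, max(l2,l3/nb_core_sharing_l3), can be written out as a small helper. This is only an illustration of the arithmetic: the function name `second_level_cache_bytes` is hypothetical, the sizes are assumed to be in bytes, and how the cache sizes and core count are actually detected is outside the scope of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Cache budget for the 2nd level of blocking, following the expression in the
// commit message: max(l2, l3 / nb_core_sharing_l3). All sizes are in bytes.
inline std::size_t second_level_cache_bytes(std::size_t l2,
                                            std::size_t l3,
                                            std::size_t nb_core_sharing_l3) {
  assert(nb_core_sharing_l3 > 0);  // guard against division by zero
  return std::max(l2, l3 / nb_core_sharing_l3);
}
```

For example, with a 256 KiB L2 and a 6 MiB L3 shared by 4 cores, each core's share of L3 is 1.5 MiB, which exceeds L2 and is therefore the value returned.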
| * Fix typos in block-size testing code, and set peeling on k to 8. [Gael Guennebaud, 2015-02-26]
|/
* So I extensively measured the impact of the offset in this prefetch. [Benoit Jacob, 2015-02-25]
| I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes).
| On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400. I could not see any significant impact of this offset.
| On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0.
| So let's just go with 0! Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!
* bug #970: Add EIGEN_DEVICE_FUNC to RValue functions, in case Cuda supports RValue-references. [Christoph Hertzberg, 2015-02-24]
* Fix my recent prefetch changes: [Benoit Jacob, 2015-02-23]
|  - the first prefetch is actually harmful on Haswell with FMA, but it is the most beneficial on ARM.
|  - the second prefetch... I was very stupid and multiplied by sizeof(scalar) an offset of a scalar* pointer. The old offset was 64; pk = 8, so 64=pk*8. So this effectively restores the older offset. Actually, there were two prefetches here, one with offset 48 and one with offset 64. I could not confirm any benefit from this strange 48 offset on either the Haswell or my ARM device.
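The mistake described in the second bullet is a general pointer-arithmetic pitfall, sketched below in isolation. The function names are hypothetical and no actual prefetch instruction is issued; the code only computes the address that would be prefetched. Adding an offset to a `float*` already advances by `offset * sizeof(float)` bytes, so pre-multiplying the offset by `sizeof(float)` scales it twice.

```cpp
#include <cassert>
#include <cstddef>

// Intended address: the offset is counted in elements, and the compiler scales
// it by sizeof(float) when it is added to a float* pointer.
inline const float* prefetch_target(const float* base, std::size_t offset_in_elements) {
  return base + offset_in_elements;
}

// Buggy variant: multiplying by sizeof(float) first makes the pointer advance
// sizeof(float) times further than intended.
inline const float* prefetch_target_buggy(const float* base, std::size_t offset_in_elements) {
  return base + offset_in_elements * sizeof(float);
}

// Distance in bytes between two pointers into the same array.
inline std::ptrdiff_t byte_distance(const float* from, const float* to) {
  return reinterpret_cast<const char*>(to) - reinterpret_cast<const char*>(from);
}
```

On a platform with 4-byte floats, an element offset of 64 corresponds to 256 bytes with the correct version, while the buggy version lands 1024 bytes ahead.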
* Fix two trivial warnings [Christoph Hertzberg, 2015-02-22]
|
* log1p is defined only for real Scalars in C++11 [Christoph Hertzberg, 2015-02-21]
|
* Fix compilation of unit tests disabling assertion checking [Gael Guennebaud, 2015-02-21]
|
* Fix doc of Ref<> [Gael Guennebaud, 2015-02-20]
|
* In C++11 destructors do not throw by default (fix CommaInitializer unit test) [Gael Guennebaud, 2015-02-20]
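The language rule behind this commit can be checked at compile time with a standard type trait. The two structs below are hypothetical examples, unrelated to the CommaInitializer code: since C++11 a destructor is implicitly `noexcept` unless declared otherwise, so a destructor that is meant to throw must say so explicitly.

```cpp
#include <type_traits>

// A user-provided destructor with no exception specification is implicitly
// noexcept(true) in C++11 and later.
struct Quiet {
  ~Quiet() {}
};

// To allow exceptions to leave a destructor, it must opt in explicitly.
struct Loud {
  ~Loud() noexcept(false) {}
};

static_assert(std::is_nothrow_destructible<Quiet>::value,
              "destructors are noexcept by default since C++11");
static_assert(!std::is_nothrow_destructible<Loud>::value,
              "noexcept(false) restores a potentially throwing destructor");
```

An exception escaping a `noexcept` destructor calls `std::terminate`, which is why a test that relies on a throwing destructor needs the explicit `noexcept(false)` when built as C++11.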
|
* Pulled latest changes from trunk [Benoit Steiner, 2015-02-19]
|\
* | Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up being executed on the GPU device. [Benoit Steiner, 2015-02-19]
| * Fix regression with C++11 support of lambda: now internal::result_of falls back to std::result_of in C++11. [Gael Guennebaud, 2015-02-19]
| * Fix some calls to result_of on binary functors as unary ones. [Gael Guennebaud, 2015-02-19]
| |
| * Declare const some const variables [Gael Guennebaud, 2015-02-19]
|/
* Add support for C++11 result_of/lambdas [Gael Guennebaud, 2015-02-19]
|
* rotating kernel: avoid compiling anything outside of ARM [Benoit Jacob, 2015-02-18]
|
* remove a newly introduced redundant typedef - sorry. [Benoit Jacob, 2015-02-18]
|
* bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path [Benoit Jacob, 2015-02-18]
| This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (even about 1% slower).
* Fixed template parameter. [Hauke Heibel, 2015-02-18]
|
* merge [Gael Guennebaud, 2015-02-18]
|\
* | Clean a bit computeProductBlockingSizes (use Index type, remove CEIL macro) [Gael Guennebaud, 2015-02-18]
| |
| * bug #958 - Allow testing specific blocking sizes [Benoit Jacob, 2015-02-18]
|/  This is only a debugging/testing patch. It allows testing specific product blocking sizes, typically to study the impact on performance.
|   Example usage:
|     int testk, testm, testn;
|     #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZES
|     #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K testk
|     #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M testm
|     #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N testn
|     #include <Eigen/Core>
* Fix a regression when using OpenMP, and fix bug #714: the number of threads might be lower than the number of requested ones [Gael Guennebaud, 2015-02-18]
* Fix bug #945: workaround MSVC warning [Gael Guennebaud, 2015-02-18]
|
* Add missing install directives for arch/CUDA [Gael Guennebaud, 2015-02-18]
|
* Add an internal assertion in makeCompressed to catch a possible risk of null-pointer access. [Gael Guennebaud, 2015-02-18]