Graph | Commit message | Author | Age |
---|---|---|---|
* | HalfPacket also needed to be disabled for double, on ARMv8. | Benoit Jacob | 2015-03-02 |
* | Increase unit-test L1 cache size to ensure we are doing at least 2 peeled loops within the product kernel. | Gael Guennebaud | 2015-02-27 |
* | Re-enable detection of min/max parentheses protection, and re-enable the mpreal_support unit test. | Gael Guennebaud | 2015-02-27 |
* | Reimplement the selection between rotating and non-rotating kernels using templates instead of macros and if()'s. That was needed to fix the build of unit tests on ARM, which I had broken. My bad for not testing earlier. | Benoit Jacob | 2015-02-27 |
* | remove trailing comma | Benoit Jacob | 2015-02-27 |
* | Disable Packet2f/2i halfpacket support in NEON. I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match. | Benoit Jacob | 2015-02-27 |
* | Replace a static assert with a runtime one; this fixes the build of unit tests on ARM. Also safely assert in the non-implemented path, which should never be taken in practice and would return wrong results. | Benoit Jacob | 2015-02-27 |
* | Avoid packing rhs multiple-times when blocking on the lhs only. | Gael Guennebaud | 2015-02-26 |
* | Make sure that the block size computation is tested by our unit test. | Gael Guennebaud | 2015-02-26 |
* | Implement a more generic blocking-size selection algorithm. See explanations inline. It performs extremely well on Haswell. The main issue is to reliably and quickly find the actual cache size to be used for our 2nd level of blocking, that is: max(l2, l3/nb_core_sharing_l3). | Gael Guennebaud | 2015-02-26 |
* | Fix typos in block-size testing code, and set peeling on k to 8. | Gael Guennebaud | 2015-02-26 |
* | So I extensively measured the impact of the offset in this prefetch. I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes). On x86, I tested a Sandy Bridge with AVX and 12M cache and a Haswell with AVX+FMA and 6M cache on MatrixXf sizes up to 2400. I could not see any significant impact of this offset. On Nexus 5, the offset has a slight effect: values around 32 (times sizeof(float)) are worst. Anything else is the same: the current 64 (8*pk), or... 0. So let's just go with 0! Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether! | Benoit Jacob | 2015-02-25 |
* | bug #970: Add EIGEN_DEVICE_FUNC to RValue functions, in case CUDA supports RValue-references. | Christoph Hertzberg | 2015-02-24 |
* | Fix my recent prefetch changes: (1) the first prefetch is actually harmful on Haswell with FMA, but it is the most beneficial on ARM; (2) for the second prefetch, I was very stupid and multiplied an offset of a scalar* pointer by sizeof(scalar). The old offset was 64; pk = 8, so 64 = pk*8. So this effectively restores the older offset. Actually, there were two prefetches here, one with offset 48 and one with offset 64. I could not confirm any benefit from this strange 48 offset on either the Haswell or my ARM device. | Benoit Jacob | 2015-02-23 |
* | log1p is defined only for real Scalars in C++11 | Christoph Hertzberg | 2015-02-21 |
* | Fix compilation of unit tests disabling assertion checking | Gael Guennebaud | 2015-02-21 |
* | Fix doc of Ref<> | Gael Guennebaud | 2015-02-20 |
* | In C++11 destructors do not throw by default (fix CommaInitializer unit test) | Gael Guennebaud | 2015-02-20 |
* | Pulled latest changes from trunk | Benoit Steiner | 2015-02-19 |
|\ | |||
* | | Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up being executed on the GPU device. | Benoit Steiner | 2015-02-19 |
| * | Fix regression with C++11 support of lambda: now internal::result_of falls back to std::result_of in C++11. | Gael Guennebaud | 2015-02-19 |
| * | Fix some calls to result_of on binary functors as unary ones. | Gael Guennebaud | 2015-02-19 |
| * | Declare some const variables as const | Gael Guennebaud | 2015-02-19 |
|/ | |||
* | Add support for C++11 result_of/lambdas | Gael Guennebaud | 2015-02-19 |
* | rotating kernel: avoid compiling anything outside of ARM | Benoit Jacob | 2015-02-18 |
* | remove a newly introduced redundant typedef - sorry. | Benoit Jacob | 2015-02-18 |
* | bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path. This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (even about 1% slower). | Benoit Jacob | 2015-02-18 |
* | Fixed template parameter. | Hauke Heibel | 2015-02-18 |
* | merge | Gael Guennebaud | 2015-02-18 |
|\ | |||
* | | Clean up computeProductBlockingSizes a bit (use Index type, remove CEIL macro) | Gael Guennebaud | 2015-02-18 |
| * | bug #958 - Allow testing specific blocking sizes. This is only a debugging/testing patch. It allows testing specific product blocking sizes, typically to study the impact on performance. Example usage (one statement per line): `int testk, testm, testn;` `#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZES` `#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K testk` `#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M testm` `#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N testn` `#include <Eigen/Core>` | Benoit Jacob | 2015-02-18 |
|/ | |||
* | Fix a regression when using OpenMP, and fix bug #714: the number of threads might be lower than the number of requested ones | Gael Guennebaud | 2015-02-18 |
* | Fix bug #945: work around MSVC warning | Gael Guennebaud | 2015-02-18 |
* | Add missing install directives for arch/CUDA | Gael Guennebaud | 2015-02-18 |
* | Remove some dead stores. | Gael Guennebaud | 2015-02-18 |
* | Packet must be passed by const reference and not by value to avoid alignment issues. | Gael Guennebaud | 2015-02-17 |
* | Disable __m128* wrappers when compiling with AVX and -fabi-version=4 | Gael Guennebaud | 2015-02-17 |
* | Fix compilation with GCC/AVX (work around __m128 and __m256 being the same type with the default ABI) | Gael Guennebaud | 2015-02-17 |
* | Add PermutationMatrix::determinant method. | Gael Guennebaud | 2015-02-16 |
* | bug #956: Fixed bug in move constructors of DenseStorage which caused "moved-from" objects to be in an invalid state. | Martin Drozdik | 2015-02-16 |
* | Merged in chtz/eigen-indexconversion (pull request PR-92). bug #877, bug #572: Get rid of Index conversion warnings. Summary of changes: (1) Introduce a global typedef Eigen::Index, making Eigen::DenseIndex and AnyExpr<>::Index deprecated (default is std::ptrdiff_t). (2) Eigen::Index is used throughout the API to represent indices, offsets, and sizes. (3) Classes storing an array of indices use the type StorageIndex to store them. This is a template parameter of the class; the default is int. (4) Methods that *explicitly* set or return an element of such an array take or return a StorageIndex type. In all other cases, the Index type is used. | Gael Guennebaud | 2015-02-16 |
|\ | |||
| * | The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index | Gael Guennebaud | 2015-02-16 |
| * | Remove deprecated usage of expr::Index. | Gael Guennebaud | 2015-02-16 |
| * | Fix many long-to-int conversion warnings: (1) fix usage of Index (API) versus StorageIndex (when multiple indexes are stored); (2) use StorageIndex(val) when the input has already been checked; (3) use internal::convert_index<StorageIndex>(val) when val is potentially unsafe (directly comes from user input). | Gael Guennebaud | 2015-02-16 |
* | | Pulled latest updates from trunk | Benoit Steiner | 2015-02-13 |
|\ \ | |||
* | | | Optimized version of the sin(), exp(), log() and sqrt() functions for AVX | Benoit Steiner | 2015-02-13 |
| * | | bug #953 - Fix prefetches in 3px4 product kernel. This gives a 10% speedup on Nexus 4 and on Nexus 5. | Benoit Jacob | 2015-02-13 |
| | * | Index refactoring: StorageIndex must be used for storage only (and locally when it makes sense). In all other cases use the global Index type. | Gael Guennebaud | 2015-02-13 |
| | * | Merge Index-refactoring branch with default, fix PastixSupport, remove some useless typedefs | Gael Guennebaud | 2015-02-13 |
| | |\ | | |/ | |/| |||
| * | | merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper | Gael Guennebaud | 2015-02-12 |
| |\ \ |