| Commit message (Collapse) | Author | Age |
|
|
|
|
| |
Also safely assert in the non-implemented path that should never be taken in practice,
and would return wrong results.
|
| |
|
| |
|
|
|
|
|
|
|
| |
inlines.
It performs extremely well on Haswell. The main issue is to reliably and quickly find the
actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes).
On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400.
I could not see any significant impact of this offset.
On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0.
So let's just go with 0!
Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!
|
|
|
|
| |
RValue-references.
|
|
|
|
|
|
|
|
|
|
|
| |
- the first prefetch is actually harmful on Haswell with FMA,
but it is the most beneficial on ARM.
- the second prefetch... I was very stupid and multiplied by sizeof(scalar)
and offset of a scalar* pointer. The old offset was 64; pk = 8, so 64=pk*8.
So this effectively restores the older offset. Actually, there were
two prefetches here, one with offset 48 and one with offset 64. I could not
confirm any benefit from this strange 48 offset on either the haswell or
my ARM device.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|\ |
|
| |
| |
| |
| | |
being executed on the GPU device.
|
| |
| |
| |
| | |
back to std::result_of in C++11.
|
| | |
|
|/ |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
This is substantially faster on ARM, where it's important to minimize the number of loads.
This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome.
Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).
|
| |
|
|\ |
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is only a debugging/testing patch. It allows testing specific
product blocking sizes, typically to study the impact on performance.
Example usage:
int testk, testm, testn;
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZES
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K testk
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M testm
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N testn
#include <Eigen/Core>
|
|
|
|
| |
might be lower than the number of requested ones
|
| |
|
| |
|
|
|
|
| |
null-pointer access.
|
| |
|
| |
|
| |
|
|
|
|
| |
issue.
|
| |
|
| |
|
|
|
|
| |
type with default ABI)
|
| |
|
| |
|
|
|
|
| |
method.
|
| |
|
|
|
|
| |
"moved-from" objects to be in an invalid state.
|
| |
|
|
|
|
| |
Add respective regression unit test.
|
| |
|
| |
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
bug #877, bug #572: Get rid of Index conversion warnings, summary of changes:
- Introduce a global typedef Eigen::Index making Eigen::DenseIndex and AnyExpr<>::Index deprecated (default is std::ptrdiff_t).
- Eigen::Index is used throughout the API to represent indices, offsets, and sizes.
- Classes storing an array of indices uses the type StorageIndex to store them. This is a template parameter of the class. Default is int.
- Methods that *explicitly* set or return an element of such an array take or return a StorageIndex type. In all other cases, the Index type is used.
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| | |
- fix usage of Index (API) versus StorageIndex (when multiple indexes are stored)
- use StorageIndex(val) when the input has already been check
- use internal::convert_index<StorageIndex>(val) when val is potentially unsafe (directly comes from user input)
|