Commit message (Collapse) | Author | Age | |
---|---|---|---|
* | Silenced a few more compilation warnings generated by nvcc | 2015-02-25 | |
| | |||
* | Added more tests to validate support for tensors laid out in RowMajor order. | 2015-02-25 | |
| | |||
* | Added support for RowMajor layout to the tensor patch extraction code. | 2015-02-25 | |
| | |||
* | Pulled latest changes from trunk | 2015-02-25 | |
|\ | |||
* | | Added support for RowMajor layout to the image patch extraction code | 2015-02-25 | |
| | | | | | | | | Speeded up the unsupported_cxx11_tensor_image_patch test and reduced its memory footprint | ||
| * | So I extensively measured the impact of the offset in this prefetch. I tried ↵ | 2015-02-25 | |
|/ | | | | | | | | | | | | | | offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes). On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400. I could not see any significant impact of this offset. On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0. So let's just go with 0! Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether! | ||
* | bug #970: Add EIGEN_DEVICE_FUNC to RValue functions, in case CUDA supports ↵ | 2015-02-24 | |
| | | | | RValue-references. | ||
* | Fix my recent prefetch changes: | 2015-02-23 | |
| | | | | | | | | | | | - the first prefetch is actually harmful on Haswell with FMA, but it is the most beneficial on ARM. - the second prefetch... I was very stupid and multiplied by sizeof(scalar) an offset of a scalar* pointer. The old offset was 64; pk = 8, so 64=pk*8. So this effectively restores the older offset. Actually, there were two prefetches here, one with offset 48 and one with offset 64. I could not confirm any benefit from this strange 48 offset on either the Haswell or my ARM device. | ||
* | Add analyze-blocking-sizes program under bench/ to analyze multiple logs | 2015-02-23 | |
| | | | | generated by benchmark-blocking-sizes. | ||
* | Fix two trivial warnings | 2015-02-22 | |
| | |||
* | log1p is defined only for real Scalars in C++11 | 2015-02-21 | |
| | |||
* | I cannot reproduce any problems that justified this hack. However it makes ↵ | 2015-02-21 | |
| | | | | builds fail in C++11 mode. | ||
* | Fix compilation of unit tests disabling assertion checking | 2015-02-21 | |
| | |||
* | Add benchmark-blocking-sizes.cpp to bench/ per mailing list discussion. | 2015-02-20 | |
| | |||
* | Initial version of a small script to help tracking performance regressions | 2015-02-20 | |
| | |||
* | update bench_gemm | 2015-02-20 | |
| | |||
* | Fix doc of Ref<> | 2015-02-20 | |
| | |||
* | With C++11 Matrix<float> + Matrix<complex<float>> does not even compile | 2015-02-20 | |
| | |||
* | Remove EIGEN_TEST_C++0x option and let EIGEN_TEST_CXX11 add the -std=c++11 flag | 2015-02-20 | |
| | |||
* | In C++11 destructors do not throw by default (fix CommaInitializer unit test) | 2015-02-20 | |
| | |||
* | Pulled latest changes from trunk | 2015-02-19 | |
|\ | |||
* | | Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up ↵ | 2015-02-19 | |
| | | | | | | | | being executed on the GPU device. | ||
| * | Fix regression with C++11 support of lambda: now internal::result_of falls ↵ | 2015-02-19 | |
| | | | | | | | | back to std::result_of in C++11. | ||
| * | Fix a C++11 compilation issue in unit test | 2015-02-19 | |
| | | |||
| * | Fix some calls to result_of on binary functors as unary ones. | 2015-02-19 | |
| | | |||
| * | Declare some constant variables as const | 2015-02-19 | |
|/ | |||
* | Pulled latest updates from trunk | 2015-02-19 | |
|\ | |||
* | | Improved the documentation | 2015-02-19 | |
| | | |||
| * | Add support for C++11 result_of/lambdas | 2015-02-19 | |
| | | |||
| * | rotating kernel: avoid compiling anything outside of ARM | 2015-02-18 | |
| | | |||
| * | remove a newly introduced redundant typedef - sorry. | 2015-02-18 | |
| | | |||
| * | bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path | 2015-02-18 | |
| | | | | | | | | | | | | | | | | This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (even about 1% slower). | ||
| * | Fixed template parameter. | 2015-02-18 | |
| | | |||
| * | merge | 2015-02-18 | |
| |\ | |||
| * | | Clean a bit computeProductBlockingSizes (use Index type, remove CEIL macro) | 2015-02-18 | |
| | | | |||
| * | | Fix bug #961: eigen-doc.tgz included part of itself. | 2015-02-18 | |
| | | | |||
| | * | bug #958 - Allow testing specific blocking sizes | 2015-02-18 | |
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | This is only a debugging/testing patch. It allows testing specific product blocking sizes, typically to study the impact on performance. Example usage: `int testk, testm, testn;` `#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZES` `#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K testk` `#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M testm` `#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N testn` `#include <Eigen/Core>` | ||
| * | Fix a regression when using OpenMP, and fix bug #714: the number of threads ↵ | 2015-02-18 | |
| | | | | | | | | might be lower than the number of requested ones | ||
| * | Fix bug #945: workaround MSVC warning | 2015-02-18 | |
| | | |||
| * | Add missing install directives for arch/CUDA | 2015-02-18 | |
| | | |||
| * | Workaround dead store warnings in unit tests. | 2015-02-18 | |
| | | |||
| * | Add an internal assertion in makeCompressed to catch a possible risk of ↵ | 2015-02-18 | |
| | | | | | | | | null-pointer access. | ||
| * | Remove some dead stores. | 2015-02-18 | |
| | | |||
| * | Fix possible usage of a null pointer in CholmodSupport | 2015-02-18 | |
| | | |||
| * | Bug 957, workaround MSVC/ICC compilation issue | 2015-02-18 | |
| | | |||
| * | Removed redundant typedef which confused old gcc versions. | 2015-02-18 | |
| | | |||
| * | Packet must be passed by const reference and not by value to avoid alignment ↵ | 2015-02-17 | |
|/ | | | | issue. | ||
* | Pulled latest updates from trunk | 2015-02-17 | |
|\ | |||
* | | Silenced compilation warning | 2015-02-17 | |
| | | |||
* | | Added support for tensor concatenation as lvalue | 2015-02-17 | |
| | |