Commit message | Author | Age | |
---|---|---|---|
* | bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path | Benoit Jacob | 2015-02-18 |
| | | | | | | | | This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (it is even about 1% slower). | ||
* | Fixed template parameter. | Hauke Heibel | 2015-02-18 |
| | |||
* | merge | Gael Guennebaud | 2015-02-18 |
|\ | |||
* | | Clean a bit computeProductBlockingSizes (use Index type, remove CEIL macro) | Gael Guennebaud | 2015-02-18 |
| | | |||
| * | bug #958 - Allow testing specific blocking sizes | Benoit Jacob | 2015-02-18 |
|/ | | | | | | | | | | | | | | This is only a debugging/testing patch. It allows testing specific product blocking sizes, typically to study the impact on performance. Example usage: int testk, testm, testn; #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZES #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K testk #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M testm #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N testn #include <Eigen/Core> | ||
* | Fix a regression when using OpenMP, and fix bug #714: the number of threads ↵ | Gael Guennebaud | 2015-02-18 |
| | | | | might be lower than the number of requested ones | ||
* | Fix bug #945: workaround MSVC warning | Gael Guennebaud | 2015-02-18 |
| | |||
* | Add missing install directives for arch/CUDA | Gael Guennebaud | 2015-02-18 |
| | |||
* | Remove some dead stores. | Gael Guennebaud | 2015-02-18 |
| | |||
* | Packet must be passed by const reference and not by value to avoid alignment ↵ | Gael Guennebaud | 2015-02-17 |
| | | | | issue. | ||
* | Disable __m128* wrappers when compiling with AVX and -fabi-version=4 | Gael Guennebaud | 2015-02-17 |
| | |||
* | Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same ↵ | Gael Guennebaud | 2015-02-17 |
| | | | | type with default ABI) | ||
* | Add PermutationMatrix::determinant method. | Gael Guennebaud | 2015-02-16 |
| | |||
* | bug #956: Fixed bug in move constructors of DenseStorage which caused ↵ | Martin Drozdik | 2015-02-16 |
| | | | | "moved-from" objects to be in an invalid state. | ||
* | Merged in chtz/eigen-indexconversion (pull request PR-92) | Gael Guennebaud | 2015-02-16 |
|\ | | | | | | | | | | | | | | | | | | | | | | | bug #877, bug #572: Get rid of Index conversion warnings, summary of changes: - Introduce a global typedef Eigen::Index making Eigen::DenseIndex and AnyExpr<>::Index deprecated (default is std::ptrdiff_t). - Eigen::Index is used throughout the API to represent indices, offsets, and sizes. - Classes storing an array of indices use the type StorageIndex to store them. This is a template parameter of the class. Default is int. - Methods that *explicitly* set or return an element of such an array take or return a StorageIndex type. In all other cases, the Index type is used. | ||
| * | The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index | Gael Guennebaud | 2015-02-16 |
| | | |||
| * | Remove deprecated usage of expr::Index. | Gael Guennebaud | 2015-02-16 |
| | | |||
| * | Fix many long to int conversion warnings: | Gael Guennebaud | 2015-02-16 |
| | | | | | | | | | | | | - fix usage of Index (API) versus StorageIndex (when multiple indexes are stored) - use StorageIndex(val) when the input has already been checked - use internal::convert_index<StorageIndex>(val) when val is potentially unsafe (directly comes from user input) | ||
* | | Pulled latest updates from trunk | Benoit Steiner | 2015-02-13 |
|\ \ | |||
* | | | Optimized version of the sin(), exp(), log() and sqrt() function for AVX | Benoit Steiner | 2015-02-13 |
| | | | |||
| * | | bug #953 - Fix prefetches in 3px4 product kernel | Benoit Jacob | 2015-02-13 |
| | | | | | | | | | | | | This gives a 10% speedup on nexus 4 and on nexus 5. | ||
| | * | Index refactoring: StorageIndex must be used for storage only (and locally ↵ | Gael Guennebaud | 2015-02-13 |
| | | | | | | | | | | | | when it makes sense). In all other cases use the global Index type. | ||
| | * | Merge Index-refactoring branch with default, fix PastixSupport, remove some ↵ | Gael Guennebaud | 2015-02-13 |
| | |\ | | |/ | |/| | | | | useless typedefs | ||
| * | | merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper | Gael Guennebaud | 2015-02-12 |
| |\ \ | |||
| * | | | update EIGEN_FAST_MATH documentation | Gael Guennebaud | 2015-02-12 |
|/ / / | |||
| * | | Marked a few functions as EIGEN_DEVICE_FUNC to enable the use of tensors in ↵ | Benoit Steiner | 2015-02-10 |
| | | | | | | | | | | | | cuda kernels. | ||
* | | | merge | Gael Guennebaud | 2015-02-10 |
|\ \ \ | |||
* | | | | FMA has been wrongly disabled | Gael Guennebaud | 2015-02-10 |
| | | | | |||
| * | | | Added vectorized implementation of the exponential function for ARM/NEON | Benoit Steiner | 2015-02-10 |
|/ / / | |||
* | | | Make Block<SparseMatrix> inherit SparseCompressedBase in the case of an ↵ | Gael Guennebaud | 2015-02-09 |
| | | | | | | | | | | | | inner panel, and fix valuePtr() and innerIndexPtr() | ||
* | | | Add a SparseCompressedBase class providing (un)compressed accessors (like ↵ | Gael Guennebaud | 2015-02-07 |
| | | | | | | | | | | | | | | | | | | data()/*Stride() for dense matrices), and a CompressedAccessBit flag (similar to DirectAccessBit for dense matrices). | ||
| * | | Pulled latest fixes | Benoit Steiner | 2015-02-06 |
| |\ \ | |||
| | * \ | merge | Gael Guennebaud | 2015-02-06 |
| | |\ \ | |||
| | * | | | Fix symmetric product | Gael Guennebaud | 2015-02-06 |
| | | | | | |||
| * | | | | Pulled the latest changes from the trunk | Benoit Steiner | 2015-02-06 |
| |\ \ \ \ | |/ / / / |/| | / / | | |/ / | |/| | | |||
| * | | | Added the EIGEN_HAS_CONSTEXPR define | Benoit Steiner | 2015-02-06 |
| |/ / | | | | | | | | | | Gate the tensor index list code based on the value of EIGEN_HAS_CONSTEXPR | ||
* | | | bug #936, patch 3/3: Properly detect FMA support on ARM (requires VFPv4) | Benoit Jacob | 2015-01-30 |
| | | | | | | | | | | | | | | | and use it instead of MLA when available, because it's both more accurate, and faster. | ||
* | | | bug #936, patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with ↵ | Benoit Jacob | 2015-01-30 |
| | | | | | | | | | | | | EIGEN_HAS_SINGLE_INSTRUCTION_MADD | ||
* | | | bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, | Benoit Jacob | 2015-01-31 |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | because this is what they are about. "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions. Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA. | ||
* | | | bug #936, patch 1/3: some cleanup and renaming for consistency. | Benoit Jacob | 2015-01-30 |
| | | | |||
* | | | bug #935: Add asm comments in GEBP kernels to work around a bug | Benoit Jacob | 2015-01-30 |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | in both GCC and Clang on ARM/NEON, whereby they spill registers, severely harming performance. The reason why the asm comments make a difference is that they prevent the compiler from reordering code across these boundaries, which has the effect of extending the lifetime of local variables and increasing register pressure on this register-tight code. | ||
* | | | Enable vectorization of transposeInPlace for PacketSize x PacketSize matrices | Gael Guennebaud | 2015-01-26 |
| | | | |||
* | | | Add support for dense ?= diagonal | Gael Guennebaud | 2015-01-24 |
| | | | |||
* | | | Fix missing evaluator in outer-product | Gael Guennebaud | 2015-01-13 |
| | | | |||
* | | | bug #907, ARM64: workaround ICE in xcode/clang | Gael Guennebaud | 2015-01-13 |
| | | | |||
* | | | bug #907, ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang | Gael Guennebaud | 2015-01-13 |
| | | | |||
* | | | bug #907: workaround some missing intrinsics in current NDK's gcc version (ARM64) | Gael Guennebaud | 2015-01-07 |
| | | | |||
* | | | bug #907: fix compilation with ARM64 | Gael Guennebaud | 2015-01-07 |
| | | | |||
| * | | Ensured that contractions that can be reduced to a matrix vector product ↵ | Benoit Steiner | 2015-01-06 |
| | | | | | | | | | | | | work correctly even when the input coefficients aren't aligned. | ||
* | | | bug #921: fix utilization of bitwise operation on enums in first_aligned | Gael Guennebaud | 2014-12-19 |
| | | |
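
The message for bug #936, patch 1.5/3 distinguishes a "fused" multiply-add (no intermediate rounding between the mul and the add) from a merely single-instruction one. The following is a minimal sketch of that numerical difference in portable C++, not code from the Eigen tree. The names `mul_then_add`, `fused_mul_add`, `Demo` and `demo_inputs` are hypothetical and chosen for illustration; `std::fma` and `std::ldexp` are standard `<cmath>` functions. The sketch assumes IEEE 754 double precision with the default round-to-nearest mode.

```cpp
#include <cmath>

// Multiply, round the product to double, then add: two roundings in total.
// The volatile temporary forces the product to be rounded and stored, so the
// compiler cannot contract the two operations into one fused instruction.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c is evaluated exactly and rounded only once.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Hypothetical demo inputs. With eps = 2^-27, the exact product
// (1 + eps) * (1 - eps) is 1 - 2^-54, which lies exactly halfway between
// two adjacent doubles and rounds to 1.0 under round-to-nearest-even.
struct Demo {
    double a;
    double b;
    double c;
};

Demo demo_inputs() {
    const double eps = std::ldexp(1.0, -27);  // 2^-27
    return Demo{1.0 + eps, 1.0 - eps, -1.0};
}
```

With these inputs, `mul_then_add` returns 0 because the product has already been rounded to 1.0 before the add, while `fused_mul_add` returns -2^-54, the exact result. This is the accuracy gap that, per the commit message, separates VFMA from VMLA on ARM.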