path: root/Eigen/src/Core
Commit message [Author, Age]
* bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path [Benoit Jacob, 2015-02-18]
|     This is substantially faster on ARM, where it's important to minimize the number of loads.
|     This is specific to the case where all packet types are of size 4.
|     I made my best attempt to minimize how dirty this is... opinions welcome.
|     Eventually one could have a generic rotated kernel, but it would take some work to get there.
|     Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).
* Fixed template parameter. [Hauke Heibel, 2015-02-18]
|
* merge [Gael Guennebaud, 2015-02-18]
|\
* | Clean a bit computeProductBlockingSizes (use Index type, remove CEIL macro) [Gael Guennebaud, 2015-02-18]
| |
| * bug #958 - Allow testing specific blocking sizes [Benoit Jacob, 2015-02-18]
|/
|     This is only a debugging/testing patch. It allows testing specific product blocking sizes,
|     typically to study the impact on performance.
|
|     Example usage:
|
|       int testk, testm, testn;
|       #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZES
|       #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K testk
|       #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M testm
|       #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N testn
|       #include <Eigen/Core>
* Fix a regression when using OpenMP, and fix bug #714: the number of threads might be lower than the number of requested ones [Gael Guennebaud, 2015-02-18]
|
* Fix bug #945: workaround MSVC warning [Gael Guennebaud, 2015-02-18]
|
* Add missing install directives for arch/CUDA [Gael Guennebaud, 2015-02-18]
|
* Remove some dead stores. [Gael Guennebaud, 2015-02-18]
|
* Packet must be passed by const reference and not by value to avoid alignment issue. [Gael Guennebaud, 2015-02-17]
|
* Disable __m128* wrappers when compiling with AVX and -fabi-version=4 [Gael Guennebaud, 2015-02-17]
|
* Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI) [Gael Guennebaud, 2015-02-17]
|
* Add PermutationMatrix::determinant method. [Gael Guennebaud, 2015-02-16]
|
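The determinant of a permutation matrix is always +1 or -1 and equals the sign of the permutation. The sketch below shows one standard way to compute that sign by counting cycles. It is not taken from the commit: the function name `permutation_determinant` and the `std::vector<int>` index representation are assumptions made for illustration, not Eigen's API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not Eigen's API): returns +1 or -1 for the permutation
// that sends position i to indices[i]. Assumes 'indices' holds each value in
// 0..n-1 exactly once.
inline int permutation_determinant(const std::vector<int>& indices) {
  const std::size_t n = indices.size();
  std::vector<bool> visited(n, false);
  int sign = 1;
  for (std::size_t start = 0; start < n; ++start) {
    if (visited[start]) continue;
    // Walk the cycle that contains 'start' and measure its length.
    std::size_t length = 0;
    for (std::size_t k = start; !visited[k];
         k = static_cast<std::size_t>(indices[k])) {
      visited[k] = true;
      ++length;
    }
    // A cycle of length L is a product of L-1 transpositions,
    // so an even-length cycle flips the sign.
    if (length % 2 == 0) sign = -sign;
  }
  return sign;
}
```

Each position is visited once, so the cost is linear in the size of the permutation and no matrix is ever formed.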
* bug #956: Fixed bug in move constructors of DenseStorage which caused "moved-from" objects to be in an invalid state. [Martin Drozdik, 2015-02-16]
|
* Merged in chtz/eigen-indexconversion (pull request PR-92) [Gael Guennebaud, 2015-02-16]
|\
| |     bug #877, bug #572: Get rid of Index conversion warnings, summary of changes:
| |     - Introduce a global typedef Eigen::Index making Eigen::DenseIndex and AnyExpr<>::Index deprecated (default is std::ptrdiff_t).
| |     - Eigen::Index is used throughout the API to represent indices, offsets, and sizes.
| |     - Classes storing an array of indices use the type StorageIndex to store them. This is a template parameter of the class. Default is int.
| |     - Methods that *explicitly* set or return an element of such an array take or return a StorageIndex type. In all other cases, the Index type is used.
| * The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index [Gael Guennebaud, 2015-02-16]
| |
| * Remove deprecated usage of expr::Index. [Gael Guennebaud, 2015-02-16]
| |
| * Fix many long to int conversion warnings: [Gael Guennebaud, 2015-02-16]
| |     - fix usage of Index (API) versus StorageIndex (when multiple indexes are stored)
| |     - use StorageIndex(val) when the input has already been checked
| |     - use internal::convert_index<StorageIndex>(val) when val is potentially unsafe (directly comes from user input)
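The split between a wide API index type and a compact storage index type, with a checked conversion where user input enters, can be sketched without Eigen as follows. Everything here is a stand-in: the aliases `Index` and `StorageIndex` only mirror the defaults named above, and `to_storage_index` and `IndexArray` are hypothetical names, not Eigen's `internal::convert_index` or any Eigen class. Throwing on overflow is also a choice made for this sketch only.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Stand-ins for the two roles described in the commits above.
using Index = std::ptrdiff_t;  // API-facing type: sizes, offsets, indices
using StorageIndex = int;      // compact type used only for stored index arrays

// Hypothetical checked narrowing for values that may come from user input.
inline StorageIndex to_storage_index(Index value) {
  const Index lo = static_cast<Index>(std::numeric_limits<StorageIndex>::min());
  const Index hi = static_cast<Index>(std::numeric_limits<StorageIndex>::max());
  if (value < lo || value > hi) {
    throw std::out_of_range("index does not fit in StorageIndex");
  }
  return static_cast<StorageIndex>(value);
}

// Hypothetical container: the API speaks Index, the storage holds StorageIndex.
class IndexArray {
 public:
  void append(Index value) { data_.push_back(to_storage_index(value)); }
  Index size() const { return static_cast<Index>(data_.size()); }
  // Accessors that explicitly return a stored element use StorageIndex.
  StorageIndex stored(Index position) const {
    return data_.at(static_cast<std::size_t>(position));
  }

 private:
  std::vector<StorageIndex> data_;
};
```

Values that have already been validated could be narrowed with a plain `StorageIndex(val)` cast, which is the distinction the last commit message draws.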
* | Pulled latest updates from trunk [Benoit Steiner, 2015-02-13]
|\ \
* | | Optimized version of the sin(), exp(), log() and sqrt() function for AVX [Benoit Steiner, 2015-02-13]
| | |
| * | bug #953 - Fix prefetches in 3px4 product kernel [Benoit Jacob, 2015-02-13]
| | |     This gives a 10% speedup on nexus 4 and on nexus 5.
| | * Index refactoring: StorageIndex must be used for storage only (and locally when it makes sense). In all other cases use the global Index type. [Gael Guennebaud, 2015-02-13]
| | |
| | * Merge Index-refactoring branch with default, fix PastixSupport, remove some useless typedefs [Gael Guennebaud, 2015-02-13]
| | |\
| | |/
| |/|
| | |
| * | merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper [Gael Guennebaud, 2015-02-12]
| |\ \
| * | | update EIGEN_FAST_MATH documentation [Gael Guennebaud, 2015-02-12]
|/ / /
| * | Marked a few functions as EIGEN_DEVICE_FUNC to enable the use of tensors in cuda kernels. [Benoit Steiner, 2015-02-10]
| | |
* | | merge [Gael Guennebaud, 2015-02-10]
|\ \ \
* | | | FMA has been wrongly disabled [Gael Guennebaud, 2015-02-10]
| | | |
| * | | Added vectorized implementation of the exponential function for ARM/NEON [Benoit Steiner, 2015-02-10]
|/ / /
* | | Make Block<SparseMatrix> inherit SparseCompressedBase in the case of an inner-panels and fix valuePtr() innerIndexPtr() [Gael Guennebaud, 2015-02-09]
| | |
* | | Add a SparseCompressedBase class providing (un)compressed accessors (like data()/*Stride() for dense matrices), and a CompressedAccessBit flag (similar to DirectAccessBit for dense matrices). [Gael Guennebaud, 2015-02-07]
| | |
| * | Pulled latest fixes [Benoit Steiner, 2015-02-06]
| |\ \
| | * \ merge [Gael Guennebaud, 2015-02-06]
| | |\ \
| | * | | Fix symmetric product [Gael Guennebaud, 2015-02-06]
| | | | |
| * | | | Pulled the latest changes from the trunk [Benoit Steiner, 2015-02-06]
| |\ \ \ \
| |/ / / /
|/| | / /
| | |/ /
| |/| |
| * | | Added the EIGEN_HAS_CONSTEXPR define [Benoit Steiner, 2015-02-06]
| |/ /
| | |     Gate the tensor index list code based on the value of EIGEN_HAS_CONSTEXPR
* | | bug #936, patch 3/3: Properly detect FMA support on ARM (requires VFPv4) [Benoit Jacob, 2015-01-30]
| | |     and use it instead of MLA when available, because it's both more accurate, and faster.
* | | bug #936, patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD [Benoit Jacob, 2015-01-30]
| | |
* | | bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, [Benoit Jacob, 2015-01-31]
| | |     because this is what they are about.
| | |     "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end".
| | |     Instead, what we are concerned about here is whether a temporary register is needed,
| | |     i.e. whether the MUL and ADD are separate instructions.
| | |     Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA.
| | |     But a true fused mul-add is only available on VFPv4: VFMA.
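The rounding distinction drawn in this message can be made concrete in portable C++. The sketch below is illustrative only and does not use Eigen or NEON intrinsics: the two function names are hypothetical, and `volatile` is used here just to stop the compiler from contracting the two-step version into a fused operation.

```cpp
#include <cmath>

// Two roundings: the product is rounded to double, then the sum is rounded.
// This is the numerical behavior of a non-fused multiply-add, whether it is
// issued as one instruction or as two.
inline double mul_add_two_roundings(double a, double b, double c) {
  volatile double product = a * b;  // first rounding happens here
  return product + c;               // second rounding happens here
}

// One rounding: a*b + c is evaluated exactly and rounded once at the end,
// which is what "fused" means in the message above.
inline double mul_add_one_rounding(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

For example, with a = 1 + 2^-30, b = 1 - 2^-30 and c = -1, the exact product is 1 - 2^-60. Rounding it to double gives 1, so the two-rounding version returns 0, while the fused version returns -2^-60.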
* | | bug #936, patch 1/3: some cleanup and renaming for consistency. [Benoit Jacob, 2015-01-30]
| | |
* | | bug #935: Add asm comments in GEBP kernels to work around a bug [Benoit Jacob, 2015-01-30]
| | |     in both GCC and Clang on ARM/NEON, whereby they spill registers, severely harming performance.
| | |     The reason why the asm comments make a difference is that they prevent the compiler from
| | |     reordering code across these boundaries, which has the effect of extending the lifetime of
| | |     local variables and increasing register pressure on this register-tight code.
* | | Enable vectorization of transposeInPlace for PacketSize x PacketSize matrices [Gael Guennebaud, 2015-01-26]
| | |
* | | Add support for dense ?= diagonal [Gael Guennebaud, 2015-01-24]
| | |
* | | Fix missing evaluator in outer-product [Gael Guennebaud, 2015-01-13]
| | |
* | | bug #907, ARM64: workaround ICE in xcode/clang [Gael Guennebaud, 2015-01-13]
| | |
* | | bug #907, ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang [Gael Guennebaud, 2015-01-13]
| | |
* | | bug #907: workaround some missing intrinsics in current NDK's gcc version (ARM64) [Gael Guennebaud, 2015-01-07]
| | |
* | | bug #907: fix compilation with ARM64 [Gael Guennebaud, 2015-01-07]
| | |
| * | Ensured that contractions that can be reduced to a matrix vector product work correctly even when the input coefficients aren't aligned. [Benoit Steiner, 2015-01-06]
| | |
* | | bug #921: fix utilization of bitwise operation on enums in first_aligned [Gael Guennebaud, 2014-12-19]
| | |