Commit message | Author | Age | ||
---|---|---|---|---|
... | ||||
* | HalfPacket also needed to be disabled for double, on ARMv8. | Benoit Jacob | 2015-03-02 | |
| | ||||
* | remove trailing comma | Benoit Jacob | 2015-02-27 | |
| | ||||
* | Disable Packet2f/2i halfpacket support in NEON. | Benoit Jacob | 2015-02-27 | |
| | | | | | | I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match. | |||
* | bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path | Benoit Jacob | 2015-02-18 | |
| | | | | | | | | This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (it is even about 1% slower). | |||
* | The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index | Gael Guennebaud | 2015-02-16 | |
| | ||||
* | Added vectorized implementation of the exponential function for ARM/NEON | Benoit Steiner | 2015-02-10 | |
| | ||||
* | bug #936, patch 3/3: Properly detect FMA support on ARM (requires VFPv4) | Benoit Jacob | 2015-01-30 | |
| | | | | | and use it instead of MLA when available, because it's both more accurate, and faster. | |||
* | bug #936, patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD | Benoit Jacob | 2015-01-30 | |
| | ||||
* | bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, | Benoit Jacob | 2015-01-31 | |
| | | | | | | | | | because this is what they are about. "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions. Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA. | |||
* | bug #936, patch 1/3: some cleanup and renaming for consistency. | Benoit Jacob | 2015-01-30 | |
| | ||||
* | bug #907, ARM64: workaround ICE in xcode/clang | Gael Guennebaud | 2015-01-13 | |
| | ||||
* | bug #907, ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang | Gael Guennebaud | 2015-01-13 | |
| | ||||
* | bug #907: workaround some missing intrinsics in current NDK's gcc version (ARM64) | Gael Guennebaud | 2015-01-07 | |
| | ||||
* | bug #907: fix compilation with ARM64 | Gael Guennebaud | 2015-01-07 | |
| | ||||
* | Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively. | Gael Guennebaud | 2014-11-04 | |
| | ||||
* | Added ARMv8 support | Konstantinos Margaritis | 2014-10-22 | |
| | ||||
* | working 64-bit support in PacketMath.h, Complex.h needed | Konstantinos Margaritis | 2014-10-21 | |
| | ||||
* | Replace asm by __asm__ (bug #873) | Jitse Niesen | 2014-09-06 | |
| | ||||
* | bug #871: fix compilation on ARM/Neon regarding __has_builtin usage | Gael Guennebaud | 2014-09-01 | |
| | ||||
* | Fix many long to int implicit conversions | Gael Guennebaud | 2014-07-08 | |
| | ||||
* | Fix ptranspose overload prototypes for NEON | Gael Guennebaud | 2014-04-25 | |
| | ||||
* | Enable vectorization of pack_rhs with a column-major RHS. | Gael Guennebaud | 2014-04-25 | |
| | | | | Rename and generalize Kernel<*> to PacketBlock<*,N>. | |||
* | Fixed the NEON implementation of predux_max<Packet4i>. | Benoit Steiner | 2014-04-23 | |
| | ||||
* | Created a NEON version of the ptranspose packet primitives | Benoit Steiner | 2014-04-23 | |
| | ||||
* | merge with default branch | Gael Guennebaud | 2014-04-22 | |
|\ | ||||
* | | Implemented the pgather/pscatter packet primitives for the arm/NEON architecture | Benoit Steiner | 2014-04-17 | |
| | | ||||
* | | New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4. | Gael Guennebaud | 2014-04-16 | |
| | | ||||
| * | bug #782: Workaround for gcc <= 4.4 compilation error on the NEON PacketMath code. | Benoit Steiner | 2014-04-03 | |
| | | ||||
* | | Add a mechanism to recursively access half-size packet types | Gael Guennebaud | 2014-03-28 | |
|/ | ||||
* | Fix bug #677: compilation issue on arm64 which does not have the PLD instruction | Gael Guennebaud | 2013-10-31 | |
| | ||||
* | Fix bug #590: NEON Duplicate lane load | Simon Pilgrim | 2013-06-23 | |
| | ||||
* | Add missing pconj specializations | Gael Guennebaud | 2013-05-17 | |
| | ||||
* | Automatic relicensing to MPL2 using Keir's script. Manual fixup follows. | Benoit Jacob | 2012-07-13 | |
| | ||||
* | fix typo | Konstantinos Margaritis | 2012-07-04 | |
| | ||||
* | fix NEON port, use vget_lane_*() instead of temporary variables (saves extra load/store), following advice by Josh Bleecher Snyder <josharian@gmail.com>. Also implement pmadd() using vmla instead of nested padd/pmul. | Konstantinos Margaritis | 2012-07-04 | |
| | ||||
* | ARM NEON supports multiply-accumulate instruction vmla, use that in pmadd(). | kmargar | 2012-05-28 | |
| | ||||
* | Get rid of include directives inside namespace blocks (bug #339). | Jitse Niesen | 2012-04-15 | |
| | ||||
* | Patches to support ARM NEON with Clang 3.0 and LLVM-GCC | Marton Danoczy | 2011-11-04 | |
| | ||||
* | NEON: fix plset | Gael Guennebaud | 2011-05-18 | |
| | ||||
* | NEON: fix ploaddup | Gael Guennebaud | 2011-05-18 | |
| | ||||
* | gcc 4.4 also defines float32_t as a special type | Gael Guennebaud | 2011-02-22 | |
| | ||||
* | workaround gcc 4.2 and 4.3 compilation issue with NEON | Gael Guennebaud | 2011-02-07 | |
| | ||||
* | Remove all references to EIGEN_TUNE_CPU_CACHE_SIZE. | Jitse Niesen | 2011-02-04 | |
| | | | | | This macro is no longer used as of revision 0212eec23f4cb64e8426bf32568156df302f8fcf . | |||
* | Fixed NEON compilation errors, changed float-abi back to softfp (which is the most used right now). Some complex tests appear to segfault, needs a more careful look. | Konstantinos Margaritis | 2010-12-10 | |
| | ||||
* | bug #86 : use internal:: namespace instead of ei_ prefix | Benoit Jacob | 2010-10-25 | |
| | ||||
* | add NEON ploaddup and pcplxflip functions | Gael Guennebaud | 2010-07-20 | |
| | ||||
* | mixing types in product step 2: | Gael Guennebaud | 2010-07-11 | |
| | | | | | | | | pload* and pset1 are now templated on the packet type; gemv routines are now embedded into a structure with an API consistent with gemm; some configurations of vector * matrix and matrix * matrix work fine, some need more work... | |||
* | sync | Gael Guennebaud | 2010-07-10 | |
|\ | ||||
| * | Added NEON/Complex.h, ~3.5x faster than scalar std::complex<float> | Konstantinos Margaritis | 2010-07-10 | |
| | | | | | | | | minor fix in AltiVec Complex.h | |||
* | | scalars fitting in a single packet requires more work, step 1 | Gael Guennebaud | 2010-07-08 | |
|/ | | | | | add an Alignable trait; update LinearVectorization assignment |