Commit message | Author | Age | |
---|---|---|---|
* | bug #1103: fix neon vectorization of pmul(Packet1cd,Packet1cd) | 2015-12-10 | |
| | |||
* | Fix prototype of plset and generalize linspace functor. | 2015-08-07 | |
| | |||
* | Let unpacket_traits<> expose the required alignment and make use of it everywhere | 2015-08-07 | |
* | Abandon blocking size lookup table approach. Not performing as well in real world as in microbenchmark. | 2015-05-19 | |
* | also uninitialized here, see previous cset | 2015-05-15 | |
| | |||
* | Fix uninitialized var warning. The compiler was clearing the register anyway, so this does not change resulting code | 2015-05-15 | |
* | use unsigned short instead of uint16_t which doesn't exist in c++98 | 2015-03-17 | |
| | |||
* | Update Nexus 5 lookup table by combining 2 runs of the benchmark, using the analyze-blocking-sizes partition tool. Gives better worst-case performance. | 2015-03-16 | |
* | Provide an empirical lookup table for blocking sizes measured on a Nexus 5. Only for float, only for Android on ARM 32bit for now. | 2015-03-15 | |
* | must also disable complex<double> when disabling double vectorization | 2015-03-03 | |
| | |||
* | Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics. | 2015-03-03 | |
* | HalfPacket also needed to be disabled for double, on ARMv8. | 2015-03-02 | |
| | |||
* | remove trailing comma | 2015-02-27 | |
| | |||
* | Disable Packet2f/2i halfpacket support in NEON. | 2015-02-27 | |
| | | | | | | I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match. | ||
* | bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path | 2015-02-18 | |
| | | | | | | | | This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower). | ||
* | The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index | 2015-02-16 | |
| | |||
* | Added vectorized implementation of the exponential function for ARM/NEON | 2015-02-10 | |
| | |||
* | bug #936, patch 3/3: Properly detect FMA support on ARM (requires VFPv4) | 2015-01-30 | |
| | | | | | and use it instead of MLA when available, because it's both more accurate, and faster. | ||
* | bug #936, patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD | 2015-01-30 | |
* | bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, | 2015-01-31 | |
| | | | | | | | | | because this is what they are about. "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions. Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA. | ||
* | bug #936, patch 1/3: some cleanup and renaming for consistency. | 2015-01-30 | |
| | |||
* | bug #907, ARM64: workaround ICE in xcode/clang | 2015-01-13 | |
| | |||
* | bug #907, ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang | 2015-01-13 | |
| | |||
* | bug #907: workaround some missing intrinsics in current NDK's gcc version (ARM64) | 2015-01-07 | |
| | |||
* | bug #907: fix compilation with ARM64 | 2015-01-07 | |
| | |||
* | Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively. | 2014-11-04 | |
* | Added ARMv8 support | 2014-10-22 | |
| | |||
* | working 64-bit support in PacketMath.h, Complex.h needed | 2014-10-21 | |
| | |||
* | Replace asm by __asm__ (bug #873) | 2014-09-06 | |
| | |||
* | bug #871: fix compilation on ARM/Neon regarding __has_builtin usage | 2014-09-01 | |
| | |||
* | Fix many long to int implicit conversions | 2014-07-08 | |
| | |||
* | Fix ptranspose overload prototypes for NEON | 2014-04-25 | |
| | |||
* | Enable vectorization of pack_rhs with a column-major RHS. | 2014-04-25 | |
| | | | | Rename and generalize Kernel<*> to PacketBlock<*,N>. | ||
* | Fixed the NEON implementation of predux_max<Packet4i>. | 2014-04-23 | |
| | |||
* | Created a NEON version of the ptranspose packet primitives | 2014-04-23 | |
| | |||
* | merge with default branch | 2014-04-22 | |
|\ | |||
* | | Implemented the pgather/pscatter packet primitives for the arm/NEON architecture | 2014-04-17 | |
| | | |||
* | | New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4. | 2014-04-16 | |
| * | bug #782: Workaround for gcc <= 4.4 compilation error on the NEON PacketMath code. | 2014-04-03 | |
* | Add a mechanism to recursively access half-size packet types | 2014-03-28 | |
|/ | |||
* | bug #677: fix usage of pld intrinsics for complexes | 2013-11-02 | |
| | |||
* | Fix bug #677: compilation issue on arm64 which does not have the PLD instruction | 2013-10-31 | |
| | |||
* | Fix bug #590: NEON Duplicate lane load | 2013-06-23 | |
| | |||
* | Fix bug #591: minor optimization in NEON vectorization support | 2013-06-10 | |
| | |||
* | Add missing pconj specializations | 2013-05-17 | |
| | |||
* | Automatic relicensing to MPL2 using Keir's script. Manual fixup follows. | 2012-07-13 | |
| | |||
* | fix typo | 2012-07-04 | |
| | |||
* | fix NEON port, use vget_lane_*() instead of temporary variables (saves extra load/store), following advice by Josh Bleecher Snyder <josharian@gmail.com>. Also implement pmadd() using vmla instead of nested padd/pmul. | 2012-07-04 | |
* | ARM NEON supports multiply-accumulate instruction vmla, use that in pmadd(). | 2012-05-28 | |
| | |||
* | Get rid of include directives inside namespace blocks (bug #339). | 2012-04-15 | |
| |
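
The bug #936 entries in the log distinguish a "single-instruction" multiply-add (one instruction, but the product may still be rounded before the add) from a true "fused" multiply-add (a single rounding at the end). The sketch below illustrates that difference in portable C++ using `std::fma` from `<cmath>`. It is not Eigen's `pmadd` and it uses no NEON intrinsics; the function names `mul_then_add` and `fused_mul_add` are hypothetical and exist only for this illustration.

```cpp
#include <cmath>

// Multiply, then add, as two separately rounded operations.
// The volatile temporary forces the product to be rounded to double and
// keeps the compiler from contracting the pair into one fused operation.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;  // first rounding happens here
    return product + c;               // second rounding happens here
}

// Fused multiply-add: a * b + c is evaluated as if to infinite precision
// and rounded once at the end (this is what std::fma specifies).
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With `a = b = 1 + 2^-27` and `c = -(1 + 2^-26)`, the exact product is `1 + 2^-26 + 2^-54`. Rounding the product to double drops the `2^-54` term, so `mul_then_add` returns `0`, while `fused_mul_add` returns `2^-54`. This is the accuracy difference the log attributes to VFMA over VMLA.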