Commit message | Author | Age | |
---|---|---|---|
* | Optimized SSE unaligned loads and stores when compiling a 64-bit target with a recent version of gcc (i.e. gcc 4.8). | 2014-04-14 | |
* | Work around alignment warnings | 2014-03-30 | |
| | |||
* | Add a mechanism to recursively access half-size packet types | 2014-03-28 | |
| | |||
* | Merged latest changes from parent. | 2014-03-27 | |
|\ | |||
* | | Implemented the SSE version of the gather and scatter packet primitives. | 2014-03-27 | |
| | | |||
* | | Implemented the AVX version of the gather and scatter packet primitives. | 2014-03-27 | |
| | | |||
| * | enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...) | 2014-03-27 | |
|/ | |||
* | Implemented the AVX version of the ptranspose packet primitive. | 2014-03-27 | |
| | |||
* | Implement pcplxflip, palign, predux and the like from AVX/complexes | 2014-03-27 | |
| | |||
* | Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions. | 2014-03-26 | |
* | Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible. | 2014-03-26 | |
* | Merged latest updates from the parent branch | 2014-03-26 | |
|\ | |||
* | | Vectorized the multiplication and division of complex numbers using AVX instructions. | 2014-03-26 | |
* | | Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives. Silenced a few compilation warnings. | 2014-03-26 | |
| * | Implement new 1 packet x 8 gebp kernel | 2014-03-26 | |
| | | |||
| * | add pbroadcast2/4 generic intrinsics | 2014-03-26 | |
| | | |||
* | | Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf> | 2014-03-25 | |
* | | Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives. | 2014-03-24 | |
* | | Added support for FMA instructions | 2014-02-24 | |
| | | |||
* | | Added support for AVX to Eigen. | 2014-01-29 | |
| | | |||
| * | Revert previous change and introduce a new workaround for gcc generating a shufps instruction instead of the more efficient pshufd instruction. The trick is to introduce a new pload1 function to be used in low-level product kernels for which bug #203 does not apply. Indeed, it turned out that using inline assembly prevents gcc from doing a good job at instruction reordering. | 2014-03-20 | |
| * | Make gcc generate a pshufd instruction for pset1 | 2014-03-20 | |
|/ | |||
* | Remove useless register keyword, and optimize predux_min/max for SSE4 | 2014-01-25 | |
| | |||
* | bug #677: fix usage of pld intrinsics for complexes | 2013-11-02 | |
| | |||
* | Fix bug #677: compilation issue on arm64 which does not have the PLD instruction | 2013-10-31 | |
| | |||
* | fix a few "dead stores" warnings | 2013-10-26 | |
| | |||
* | Fix ploaddup and lin-spaced with AltiVec. | 2013-09-10 | |
| | |||
* | typo | 2013-08-19 | |
| | |||
* | Fix bug #642: add vectorization of sqrt for doubles, and make sqrt really safe if EIGEN_FAST_MATH is disabled | 2013-08-19 | |
* | Fix bug #590: NEON Duplicate lane load | 2013-06-23 | |
| | |||
* | Make psqrt work with numeric_limits<float>::min | 2013-06-14 | |
| | |||
* | Fix bug #613: psqrt was incorrect for small numbers | 2013-06-13 | |
| | |||
* | Fix bug #314: move remaining math functions from internal to numext namespace | 2013-06-10 | |
| | |||
* | Fix bug #591: minor optimization in NEON vectorization support | 2013-06-10 | |
| | |||
* | Add missing pconj specializations | 2013-05-17 | |
| | |||
* | Add SSE4 min/max for integers | 2013-03-20 | |
| | |||
* | Fix SSE plog<float> to return -INF on 0 | 2013-02-14 | |
| | |||
* | Suppress annoying "may be used uninitialized in this function" warning with gcc >= 4.6 | 2013-01-24 | |
* | fix warning | 2012-08-01 | |
| | |||
* | fix lower acceptable bound of SSE pexp for double | 2012-07-31 | |
| | |||
* | add SSE pexp function for double, make use of _mm_floor_p* for pexp with SSE4.1 | 2012-07-27 | |
| | |||
* | Automatic relicensing to MPL2 using Keir's script. Manual fixup follows. | 2012-07-13 | |
| | |||
* | fix typo | 2012-07-04 | |
| | |||
* | fix NEON port, use vget_lane_*() instead of temporary variables (saves extra load/store), following advice by Josh Bleecher Snyder <josharian@gmail.com>. Also implement pmadd() using vmla instead of nested padd/pmul. | 2012-07-04 | |
* | fix bug #475: .exp() now returns +inf when overflow occurs (SSE) | 2012-06-14 | |
| | |||
* | ARM NEON supports multiply-accumulate instruction vmla, use that in pmadd(). | 2012-05-28 | |
| | |||
* | Get rid of include directives inside namespace blocks (bug #339). | 2012-04-15 | |
| | |||
* | proper C++ casting | 2012-01-31 | |
| | |||
* | fix static inline versus inline static issues (the former is the correct order) | 2012-01-31 | |
| | |||
* | Patches to support ARM NEON with Clang 3.0 and LLVM-GCC | 2011-11-04 | |
| |
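The ptranspose entry in this log describes a primitive that transposes an array of N packets, where N is the number of words in each packet. As an illustration only, the portable C++ sketch below gives scalar reference semantics for that operation. `Packet`, `PacketBlockSketch`, `ptranspose_sketch`, and `transpose_example_4x4` are hypothetical names, not Eigen's API, and the committed primitive uses SSE/AVX shuffles rather than a scalar loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a SIMD packet: N lanes ("words") of type T.
// This is NOT Eigen's packet type; it only models the lane layout.
template <typename T, std::size_t N>
using Packet = std::array<T, N>;

// Hypothetical block of N packets with N lanes each, i.e. an N x N matrix
// whose i-th row is kernel[i].
template <typename T, std::size_t N>
using PacketBlockSketch = std::array<Packet<T, N>, N>;

// Scalar reference semantics for a ptranspose-like primitive: after the call,
// lane j of packet i holds the value that was in lane i of packet j.
// A real SIMD implementation would use shuffle/unpack instructions instead.
template <typename T, std::size_t N>
void ptranspose_sketch(PacketBlockSketch<T, N>& kernel) {
  for (std::size_t i = 0; i < N; ++i) {
    for (std::size_t j = i + 1; j < N; ++j) {
      std::swap(kernel[i][j], kernel[j][i]);  // swap across the diagonal
    }
  }
}

// Usage: transpose 4 packets of 4 float lanes each.
inline PacketBlockSketch<float, 4> transpose_example_4x4() {
  PacketBlockSketch<float, 4> k{{{0, 1, 2, 3},
                                 {4, 5, 6, 7},
                                 {8, 9, 10, 11},
                                 {12, 13, 14, 15}}};
  ptranspose_sketch(k);
  // Packet 0 now holds the former lane 0 of every packet.
  assert(k[0][0] == 0.0f && k[0][1] == 4.0f && k[0][2] == 8.0f && k[0][3] == 12.0f);
  return k;
}
```

A SIMD version reaches the same result with register shuffles (for SSE floats, the `_MM_TRANSPOSE4_PS` macro from `<xmmintrin.h>` performs a 4x4 transpose), which avoids round-tripping through memory when packing GEMM operands.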