path: root/Eigen/src/Core/arch
Commit message (author, date)
* Optimized SSE unaligned loads and stores when compiling a 64-bit target with a recent version of gcc (i.e. gcc 4.8). (Benoit Steiner, 2014-04-14)
|
* Work around alignment warnings (Gael Guennebaud, 2014-03-30)
|
* Add a mechanism to recursively access half-size packet types (Gael Guennebaud, 2014-03-28)
|
* Merged latest changes from parent. (Benoit Steiner, 2014-03-27)
|\
* | Implemented the SSE version of the gather and scatter packet primitives. (Benoit Steiner, 2014-03-27)
| |
* | Implemented the AVX version of the gather and scatter packet primitives. (Benoit Steiner, 2014-03-27)
| |
| * enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...) (Gael Guennebaud, 2014-03-27)
|/
* Implemented the AVX version of the ptranspose packet primitive. (Benoit Steiner, 2014-03-27)
|
* Implement pcplflip, palign, predux and the like from AVX/complexes (Gael Guennebaud, 2014-03-27)
|
* Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions. (Benoit Steiner, 2014-03-26)
|
* Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible. (Benoit Steiner, 2014-03-26)
|
* Merged latest updates from the parent branch (Benoit Steiner, 2014-03-26)
|\
* | Vectorized the multiplication and division of complex numbers using AVX instructions. (Benoit Steiner, 2014-03-26)
| |
* | Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives. Silenced a few compilation warnings. (Benoit Steiner, 2014-03-26)
| |
| * Implement new 1 packet x 8 gebp kernel (Gael Guennebaud, 2014-03-26)
| |
| * add pbroadcast2/4 generic intrinsics (Gael Guennebaud, 2014-03-26)
| |
* | Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf> (Benoit Steiner, 2014-03-25)
| |
* | Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives. (Benoit Steiner, 2014-03-24)
| |
* | Added support for FMA instructions (Benoit Steiner, 2014-02-24)
| |
* | Added support for AVX to Eigen. (Benoit Steiner, 2014-01-29)
| |
| * Revert previous change and introduce a new workaround regarding gcc generating a shufps instruction instead of the more efficient pshufd instruction. The trick consists in introducing a new pload1 function to be used in low level product kernels for which bug #203 does not apply. Indeed, it turned out that using inline assembly prevents gcc from doing a good job at instruction reordering. (Gael Guennebaud, 2014-03-20)
| |
| * Make gcc generate a pshufd instruction for pset1 (Gael Guennebaud, 2014-03-20)
|/
* Remove useless register keyword, and optimize predux_min/max for SSE4 (Gael Guennebaud, 2014-01-25)
|
* bug #677: fix usage of pld intrinsics for complexes (Gael Guennebaud, 2013-11-02)
|
* Fix bug #677: compilation issue on arm64 which does not have the PLD instruction (Gael Guennebaud, 2013-10-31)
|
* fix a few "dead stores" warnings (Gael Guennebaud, 2013-10-26)
|
* Fix ploaddup and lin-spaced with AltiVec. (Gael Guennebaud, 2013-09-10)
|
* typo (Gael Guennebaud, 2013-08-19)
|
* Fix bug #642: add vectorization of sqrt for doubles, and make sqrt really safe if EIGEN_FAST_MATH is disabled (Gael Guennebaud, 2013-08-19)
|
* Fix bug #590: NEON Duplicate lane load (Simon Pilgrim, 2013-06-23)
|
* Make psqrt work with numeric_limits<float>::min (Gael Guennebaud, 2013-06-14)
|
* Fix bug #613: psqrt was incorrect for small numbers (Jeff Dean, 2013-06-13)
|
* Fix bug #314: move remaining math functions from internal to numext namespace (Gael Guennebaud, 2013-06-10)
|
* Fix bug #591: minor optimization in NEON vectorization support (Simon Pilgrim, 2013-06-10)
|
* Add missing pconj specializations (Gael Guennebaud, 2013-05-17)
|
* Add SSE4 min/max for integers (Gael Guennebaud, 2013-03-20)
|
* Fix SSE plog<float> to return -INF on 0 (Gael Guennebaud, 2013-02-14)
|
* Suppress annoying "may be used uninitialized in this function" warning with gcc >= 4.6 (Gael Guennebaud, 2013-01-24)
|
* fix warning (Gael Guennebaud, 2012-08-01)
|
* fix lower acceptable bound of SSE pexp for double (Gael Guennebaud, 2012-07-31)
|
* add SSE pexp function for double, make use of _mm_floor_p* for pexp with SSE4.1 (Gael Guennebaud, 2012-07-27)
|
* Automatic relicensing to MPL2 using Keir's script. Manual fixup follows. (Benoit Jacob, 2012-07-13)
|
* fix typo (Konstantinos Margaritis, 2012-07-04)
|
* fix NEON port, use vget_lane_*() instead of temporary variables (saves extra load/store), following advice by Josh Bleecher Snyder <josharian@gmail.com>. Also implement pmadd() using vmla instead of nested padd/pmul. (Konstantinos Margaritis, 2012-07-04)
|
* fix bug #475: .exp() now returns +inf when overflow occurs (SSE) (Gael Guennebaud, 2012-06-14)
|
* ARM NEON supports multiply-accumulate instruction vmla, use that in pmadd(). (kmargar, 2012-05-28)
|
* Get rid of include directives inside namespace blocks (bug #339). (Jitse Niesen, 2012-04-15)
|
* proper C++ casting (Gael Guennebaud, 2012-01-31)
|
* fix static inline versus inline static issues (the former is the correct order) (Gael Guennebaud, 2012-01-31)
|
* Patches to support ARM NEON with Clang 3.0 and LLVM-GCC (Marton Danoczy, 2011-11-04)
|