path: root/Eigen/src/Core/arch
Commit message [Author, Age]
* bug #1085: workaround gcc default ABI issue [Gael Guennebaud, 2015-10-10]
|
* _mm_hadd_epi32 is for SSSE3 only (and not SSE3) [Gael Guennebaud, 2015-10-07]
|
* Handle various TODOs in SSE vectorization (remove splitted storeu, enable SSE3 integer vectorization, plus minor tweaks) [Gael Guennebaud, 2015-10-06]
|
* bug #1069: fix AVX support on MSVC (use of non portable C-style cast) [Gael Guennebaud, 2015-09-28]
|
* Added support for predux_mul for CUDA devices [Benoit Steiner, 2015-09-08]
|
* Implement plog and pexp for AltiVec. [Doug Kwan, 2015-07-30]
|
* Fix prototype of plset and generalize linspace functor. [Gael Guennebaud, 2015-08-07]
|
* Include SSE packetmath when AVX is enabled, and enable AVX's sine function only in fast-math mode (as SSE) [Gael Guennebaud, 2015-08-07]
|
* Let unpacket_traits<> exposes the required alignment and make use of it everywhere [Gael Guennebaud, 2015-08-07]
|
* Fix shadow warnings triggered by clang [Gael Guennebaud, 2015-06-09]
|
* Abandon blocking size lookup table approach. Not performing as well in real world as in microbenchmark. [Benoit Jacob, 2015-05-19]
|
* also uninitialized here, see previous cset [Benoit Jacob, 2015-05-15]
|
* Fix uninitialized var warning. The compiler was clearing the register anyway, so this does not change resulting code [Benoit Jacob, 2015-05-15]
|
* Merged in doug_kwan/eigen (pull request PR-103) [Konstantinos Margaritis, 2015-05-05]
|\    Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of
* | Added a double-precision implementation of the exp() function for AVX. [Benoit Steiner, 2015-05-04]
| |
* | Pulled latest update from the eigen main codebase [Benoit Steiner, 2015-03-24]
|\ \
| * | Fixed the CUDA packet primitives [Benoit Steiner, 2015-03-24]
| | |
| * | use unsigned short instead of uint16_t which doesn't exist in c++98 [Benoit Jacob, 2015-03-17]
| | |
| * | Update Nexus 5 lookup table from combining now 2 runs of the benchmark, using the analyze-blocking-sizes partition tool. Gives better worst-case performance. [Benoit Jacob, 2015-03-16]
| | |
| * | Provide a empirical lookup table for blocking sizes measured on a Nexus 5. Only for float, only for Android on ARM 32bit for now. [Benoit Jacob, 2015-03-15]
| | |
| | * Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of doubles instead of swapping the doubles. [Doug Kwan, 2015-03-11]
| |/
* | Fixed the optimized AVX implementation of the fast rsqrt function [Benoit Steiner, 2015-03-02]
| |
* | Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined. [Benoit Steiner, 2015-03-02]
| |
* | Pulled latest updates from trunk [Benoit Steiner, 2015-02-27]
|\ \
* | | Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts [Benoit Steiner, 2015-02-27]
| | |
* | | Added support for vectorized type casting of tensors [Benoit Steiner, 2015-02-27]
| | |
* | | Added support for fast reciprocal square root computation. [Benoit Steiner, 2015-02-26]
| | |
| | * must also disable complex<double> when disabling double vectorization [Benoit Jacob, 2015-03-03]
| | |
| | * Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics. [Benoit Jacob, 2015-03-03]
| | |
| | * HalfPacket also needed to be disabled for double, on ARMv8. [Benoit Jacob, 2015-03-02]
| |/
| * remove trailing comma [Benoit Jacob, 2015-02-27]
| |
| * Disable Packet2f/2i halfpacket support in NEON. [Benoit Jacob, 2015-02-27]
|/    I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match.
* Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up being executed on the GPU device. [Benoit Steiner, 2015-02-19]
|
* bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path [Benoit Jacob, 2015-02-18]
|     This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).
* Add missing install directives for arch/CUDA [Gael Guennebaud, 2015-02-18]
|
* Remove some dead stores. [Gael Guennebaud, 2015-02-18]
|
* Disable __m128* wrappers when compiling with AVX and -fabi-version=4 [Gael Guennebaud, 2015-02-17]
|
* Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI) [Gael Guennebaud, 2015-02-17]
|
* Merged in chtz/eigen-indexconversion (pull request PR-92) [Gael Guennebaud, 2015-02-16]
|\    bug #877, bug #572: Get rid of Index conversion warnings, summary of changes:
| |   - Introduce a global typedef Eigen::Index making Eigen::DenseIndex and AnyExpr<>::Index deprecated (default is std::ptrdiff_t).
| |   - Eigen::Index is used throughout the API to represent indices, offsets, and sizes.
| |   - Classes storing an array of indices uses the type StorageIndex to store them. This is a template parameter of the class. Default is int.
| |   - Methods that *explicitly* set or return an element of such an array take or return a StorageIndex type. In all other cases, the Index type is used.
| * The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index [Gael Guennebaud, 2015-02-16]
| |
* | Pulled latest updates from trunk [Benoit Steiner, 2015-02-13]
|\|
* | Optimized version of the sin(), exp(), log() and sqrt() function for AVX [Benoit Steiner, 2015-02-13]
| |
| * merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper [Gael Guennebaud, 2015-02-12]
|/|
* | merge [Gael Guennebaud, 2015-02-10]
|\ \
* | | FMA has been wrongly disabled [Gael Guennebaud, 2015-02-10]
| | |
| * | Added vectorized implementation of the exponential function for ARM/NEON [Benoit Steiner, 2015-02-10]
|/ /
| * Pulled the latest changes from the trunk [Benoit Steiner, 2015-02-06]
| |\
| |/
|/|
* | bug #936, patch 3/3: Properly detect FMA support on ARM (requires VFPv4) and use it instead of MLA when available, because it's both more accurate, and faster. [Benoit Jacob, 2015-01-30]
| |
* | bug #936, patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD [Benoit Jacob, 2015-01-30]
| |
* | bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, because this is what they are about. "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions. Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA. [Benoit Jacob, 2015-01-31]
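
The distinction drawn in the bug #936 messages, between a multiply-add that rounds the product before adding and a fused one that rounds only once, can be illustrated in portable scalar C++. This is a minimal sketch and not Eigen code: the function names `mul_then_add` and `fused_mul_add` are hypothetical, and it assumes IEEE-754 single precision with round-to-nearest. `std::fma` from `<cmath>` stands in for a true fused instruction such as VFMA; the `volatile` temporary is there only to stop a compiler from contracting the unfused version into a fused one.

```cpp
#include <cmath>

// Unfused multiply-add: the product a*b is rounded to float first, then the
// sum is rounded again. This is the numerical behavior of a non-fused
// multiply-accumulate (two roundings), regardless of whether the hardware
// issues it as one instruction or two.
// NOTE: hypothetical helper for illustration, not part of Eigen.
inline float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;  // force the intermediate rounding
    return product + c;              // second rounding happens here
}

// Fused multiply-add: std::fma evaluates a*b + c as if to infinite
// precision and rounds exactly once at the end.
// NOTE: hypothetical helper for illustration, not part of Eigen.
inline float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-23, b = 1 - 2^-23 and c = -1, the exact product is 1 - 2^-46. The unfused version rounds that product to 1 and returns 0, while the fused version keeps the low-order term and returns -2^-46. That difference is what "more accurate" refers to in patch 3/3.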