Graph | Commit message | Author | Age |
---|---|---|---|
* | bug #1085: workaround gcc default ABI issue | Gael Guennebaud | 2015-10-10 |
| | |||
* | _mm_hadd_epi32 is for SSSE3 only (and not SSE3) | Gael Guennebaud | 2015-10-07 |
| | |||
* | Handle various TODOs in SSE vectorization (remove splitted storeu, enable SSE3 integer vectorization, plus minor tweaks) | Gael Guennebaud | 2015-10-06 |
| | |||
* | bug #1069: fix AVX support on MSVC (use of non portable C-style cast) | Gael Guennebaud | 2015-09-28 |
| | |||
* | Added support for predux_mul for CUDA devices | Benoit Steiner | 2015-09-08 |
| | |||
* | Implement plog and pexp for AltiVec. | Doug Kwan | 2015-07-30 |
| | |||
* | Fix prototype of plset and generalize linspace functor. | Gael Guennebaud | 2015-08-07 |
| | |||
* | Include SSE packetmath when AVX is enabled, and enable AVX's sine function only in fast-math mode (as SSE) | Gael Guennebaud | 2015-08-07 |
| | |||
* | Let unpacket_traits<> exposes the required alignment and make use of it everywhere | Gael Guennebaud | 2015-08-07 |
| | |||
* | Fix shadow warnings triggered by clang | Gael Guennebaud | 2015-06-09 |
| | |||
* | Abandon blocking size lookup table approach. Not performing as well in real world as in microbenchmark. | Benoit Jacob | 2015-05-19 |
| | |||
* | also uninitialized here, see previous cset | Benoit Jacob | 2015-05-15 |
| | |||
* | Fix uninitialized var warning. The compiler was clearing the register anyway, so this does not change resulting code | Benoit Jacob | 2015-05-15 |
| | |||
* | Merged in doug_kwan/eigen (pull request PR-103) | Konstantinos Margaritis | 2015-05-05 |
|\ | Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of | ||
* | | Added a double-precision implementation of the exp() function for AVX. | Benoit Steiner | 2015-05-04 |
| | | |||
* | | Pulled latest update from the eigen main codebase | Benoit Steiner | 2015-03-24 |
|\ \ | |||
| * | | Fixed the CUDA packet primitives | Benoit Steiner | 2015-03-24 |
| | | | |||
| * | | use unsigned short instead of uint16_t which doesn't exist in c++98 | Benoit Jacob | 2015-03-17 |
| | | | |||
| * | | Update Nexus 5 lookup table from combining now 2 runs of the benchmark, using the analyze-blocking-sizes partition tool. Gives better worst-case performance. | Benoit Jacob | 2015-03-16 |
| | | | |||
| * | | Provide a empirical lookup table for blocking sizes measured on a Nexus 5. Only for float, only for Android on ARM 32bit for now. | Benoit Jacob | 2015-03-15 |
| | | | |||
| | * | Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of doubles instead of swapping the doubles. | Doug Kwan | 2015-03-11 |
| |/ | |||
* | | Fixed the optimized AVX implementation of the fast rsqrt function | Benoit Steiner | 2015-03-02 |
| | | |||
* | | Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined. | Benoit Steiner | 2015-03-02 |
| | | |||
* | | Pulled latest updates from trunk | Benoit Steiner | 2015-02-27 |
|\ \ | |||
* | | | Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts | Benoit Steiner | 2015-02-27 |
| | | | |||
* | | | Added support for vectorized type casting of tensors | Benoit Steiner | 2015-02-27 |
| | | | |||
* | | | Added support for fast reciprocal square root computation. | Benoit Steiner | 2015-02-26 |
| | | | |||
| | * | must also disable complex<double> when disabling double vectorization | Benoit Jacob | 2015-03-03 |
| | | | |||
| | * | Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics. | Benoit Jacob | 2015-03-03 |
| | | | |||
| | * | HalfPacket also needed to be disabled for double, on ARMv8. | Benoit Jacob | 2015-03-02 |
| |/ | |||
| * | remove trailing comma | Benoit Jacob | 2015-02-27 |
| | | |||
| * | Disable Packet2f/2i halfpacket support in NEON. | Benoit Jacob | 2015-02-27 |
|/ | I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match. | ||
* | Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up being executed on the GPU device. | Benoit Steiner | 2015-02-19 |
| | |||
* | bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path | Benoit Jacob | 2015-02-18 |
| | This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower). | ||
* | Add missing install directives for arch/CUDA | Gael Guennebaud | 2015-02-18 |
| | |||
* | Remove some dead stores. | Gael Guennebaud | 2015-02-18 |
| | |||
* | Disable __m128* wrappers when compiling with AVX and -fabi-version=4 | Gael Guennebaud | 2015-02-17 |
| | |||
* | Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI) | Gael Guennebaud | 2015-02-17 |
| | |||
* | Merged in chtz/eigen-indexconversion (pull request PR-92) | Gael Guennebaud | 2015-02-16 |
|\ | bug #877, bug #572: Get rid of Index conversion warnings, summary of changes: - Introduce a global typedef Eigen::Index making Eigen::DenseIndex and AnyExpr<>::Index deprecated (default is std::ptrdiff_t). - Eigen::Index is used throughout the API to represent indices, offsets, and sizes. - Classes storing an array of indices uses the type StorageIndex to store them. This is a template parameter of the class. Default is int. - Methods that *explicitly* set or return an element of such an array take or return a StorageIndex type. In all other cases, the Index type is used. | ||
| * | The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index | Gael Guennebaud | 2015-02-16 |
| | | |||
* | | Pulled latest updates from trunk | Benoit Steiner | 2015-02-13 |
|\| | |||
* | | Optimized version of the sin(), exp(), log() and sqrt() function for AVX | Benoit Steiner | 2015-02-13 |
| | | |||
| * | merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper | Gael Guennebaud | 2015-02-12 |
|/| | |||
* | | merge | Gael Guennebaud | 2015-02-10 |
|\ \ | |||
* | | | FMA has been wrongly disabled | Gael Guennebaud | 2015-02-10 |
| | | | |||
| * | | Added vectorized implementation of the exponential function for ARM/NEON | Benoit Steiner | 2015-02-10 |
|/ / | |||
| * | Pulled the latest changes from the trunk | Benoit Steiner | 2015-02-06 |
| |\ | |||
| |/ | |||
|/| | |||
* | | bug #936, patch 3/3: Properly detect FMA support on ARM (requires VFPv4) and use it instead of MLA when available, because it's both more accurate, and faster. | Benoit Jacob | 2015-01-30 |
| | | |||
* | | bug #936, patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD | Benoit Jacob | 2015-01-30 |
| | | |||
* | | bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, because this is what they are about. "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions. Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA. | Benoit Jacob | 2015-01-31 |
| | | |||
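
The bug #936 messages distinguish a fused multiply-add (one rounding at the end) from a multiply-add that rounds the product before the addition. The difference can be observed in plain C++. The following is a minimal sketch, not code from Eigen: the function names `mul_then_add` and `fused_mul_add` are hypothetical, and `std::fma` from `<cmath>` stands in for a hardware instruction such as VFMA. The `volatile` temporary is an illustrative device that forces the product to be rounded to `double`, so the compiler cannot contract the two operations into a fused one.

```cpp
#include <cmath>

// Hypothetical helper: multiply, round the product to double, then add.
// The volatile temporary forces the intermediate rounding step.
double mul_then_add(double a, double b, double c) {
  volatile double product = a * b;  // rounded to double here
  return product + c;               // rounded a second time here
}

// Hypothetical helper: fused multiply-add, a*b + c with a single rounding.
double fused_mul_add(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

With `a = 1 + 2^-27` and `c = -(1 + 2^-26)`, the exact product `a*a` is `1 + 2^-26 + 2^-54`. Rounding it to `double` drops the `2^-54` term, so `mul_then_add(a, a, c)` returns `0`, while `fused_mul_add(a, a, c)` returns `2^-54`.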