Commit message | Author | Age | |
---|---|---|---|
* | Implement a generic vectorized version of Smith's algorithms for complex division. | Rasmus Munk Larsen | 2021-07-01 |
* | Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow | Chip Kerchner | 2021-06-30 |
* | Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code. | Rasmus Munk Larsen | 2021-06-24 |
* | Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations. | Rasmus Munk Larsen | 2021-06-24 |
* | EIGEN_STRONG_INLINE was NOT inlining in some critical needed areas (6.6X slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate. | Chip-Kerchner | 2021-06-16 |
* | Add missing ppc pcmp_lt_or_nan<Packet8bf> | Antonio Sanchez | 2021-06-15 |
| | |||
* | Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with -ffast-math enabled. | Rasmus Munk Larsen | 2021-06-11 |
* | Fix taking address of rvalue compiler issue with TensorFlow (plus other warnings). | Chip-Kerchner | 2021-04-21 |
* | Fix address of temporary object errors in clang11. | Chip Kerchner | 2021-04-02 |
| | | | | This fixes the problem with taking the address of temporary objects which clang11 treats as errors. | ||
* | Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3). | Chip Kerchner | 2021-03-25 |
| | |||
* | Fix pround and add print | Chip Kerchner | 2021-03-15 |
| | |||
* | Make half/bfloat16 constructor take inputs by value, fix powerpc test. | Antonio Sanchez | 2021-02-27 |
| | | | | | | | | | | | | Since `numeric_limits<half>::max_exponent` is a static inline constant, it cannot be directly passed by reference. This triggers a linker error in recent versions of `g++-powerpc64le`. Changing `half` to take inputs by value fixes this. Wrapping `max_exponent` with `int(...)` to make an addressable integer also fixes this and may help with other custom `Scalar` types down-the-road. Also eliminated some compile warnings for powerpc. | ||
* | Fix clang compile when no MMA flags are set. Simplify MMA compiler detection. | Chip-Kerchner | 2021-02-24 |
| | |||
* | Having forward template function declarations in a P10 file causes bad code in certain situations. | Chip-Kerchner | 2021-02-24 |
* | Fixes to support old and new versions of the compilers for built-ins. Cast to non-const when using vector_pair with certain built-ins. | Chip-Kerchner | 2021-02-24 |
* | Fix compilation errors with later versions of GCC and use of MMA. | Chip-Kerchner | 2021-02-22 |
| | |||
* | Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product | Chip Kerchner | 2021-02-17 |
| | |||
* | Updated pfrexp implementation. | Antonio Sanchez | 2021-02-17 |
| | | | | | | The original implementation fails for 0, denormals, inf, and NaN. See #2150 | ||
* | Fix ldexp implementations. | Antonio Sanchez | 2021-02-10 |
| | | | | | | | | | | | | | | | | | The previous implementations produced garbage values if the exponent did not fit within the exponent bits. See #2131 for a complete discussion, and !375 for other possible implementations. Here we implement the 4-factor version. See `pldexp_impl` in `GenericPacketMathFunctions.h` for a full description. The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>` requires `por`. Left as a "TODO" is to delegate to a faster version if we know the exponent does fit within the exponent bits. Fixes #2131. | ||
* | Eliminate implicit conversions from float to double. | Antonio Sanchez | 2021-02-01 |
| | |||
* | Fix altivec packetmath. | Antonio Sanchez | 2021-01-28 |
| | | | | | | | | | | | | | | | Allows the altivec packetmath tests to pass. There were a few issues: - `pstoreu` was missing MSQ on `_BIG_ENDIAN` systems - `cmp_*` didn't properly handle conversion of bool flags (0x7FC instead of 0xFFFF) - `pfrexp` needed to set the `exponent` argument. Related to !370, #2128 cc: @ChipKerchner @pdrocaldeira Tested on `_BIG_ENDIAN` running on QEMU with VSX. Couldn't figure out build flags to get it to work for little endian. | ||
* | Fix clang compilation for AltiVec from previous check-in | Chip Kerchner | 2021-01-28 |
| | |||
* | Fix sqrt, ldexp and frexp compilation errors. | Chip Kerchner | 2021-01-25 |
| | |||
* | Add support for dynamic dispatch of MMA instructions for POWER 10 | Pedro Caldeira | 2020-11-12 |
| | |||
* | Add missing functions for Packet8bf in Altivec architecture. | Pedro Caldeira | 2020-09-08 |
| | | | | | Including new tests for bfloat16 Packets. Fix prsqrt on GenericPacketMath. | ||
* | MatrixProduct enhancements: | Everton Constantino | 2020-09-02 |
| | | | | | | | | | | | | | - Changes to Altivec/MatrixProduct Adapting code to gcc 10. Generic code style and performance enhancements. Adding PanelMode support. Adding stride/offset support. Enabling float64, std::complex and std::complex. Fixing lack of symm_pack. Enabling mixedtypes. - Adding std::complex tests to blasutil. - Adding an implementation of storePacketBlock when Incr!= 1. | ||
* | Changing u/int8_t to un/signed char because clang does not understand it. Implementing pcmp_eq to Packet8 and Packet16. | Everton Constantino | 2020-09-02 |
* | Change Packet8s and Packet8us to use vector commands on Power for pmadd, pmul and psub. | Chip Kerchner | 2020-08-28 |
* | Add support for Bfloat16 to use vector instructions on Altivec architecture | Pedro Caldeira | 2020-08-10 |
* | Fix pscatter and pgather for Altivec Complex double | Pedro Caldeira | 2020-06-16 |
| | |||
* | Add pscatter for Packet16{u}c (int8) | Pedro Caldeira | 2020-05-20 |
| | |||
* | - Vectorizing MMA packing. | Everton Constantino | 2020-05-19 |
| | | | | | - Optimizing MMA kernel. - Adding PacketBlock store to blas_data_mapper. | ||
* | Altivec template functions to better code reusability | Pedro Caldeira | 2020-05-11 |
| | |||
* | Remove unused packet op "palign". | Rasmus Munk Larsen | 2020-05-07 |
| | | | | Clean up a compiler warning in c++03 mode in AVX512/Complex.h. | ||
* | Add support to vector instructions to Packet16uc and Packet16c | Pedro Caldeira | 2020-04-27 |
| | |||
* | Remove unused packet op "preduxp". | Rasmus Munk Larsen | 2020-04-23 |
| | |||
* | Add Packet8s and Packet8us to support signed/unsigned int16/short Altivec vector operations | Pedro Caldeira | 2020-04-21 |
* | Adhere to recommended load/store intrinsics for ppc64le | Everton Constantino | 2020-03-23 |
| | |||
* | Fixing float32's pround halfway criteria to match STL's criteria. | Everton Constantino | 2020-03-21 |
| | |||
* | Add shift_left<N> and shift_right<N> coefficient-wise unary Array functions | Joel Holdsworth | 2020-03-19 |
| | |||
* | Switching unpacket_traits<Packet4i> to vectorizable=true. | Everton Constantino | 2020-01-13 |
| | |||
* | Move implementation of vectorized error function erf() to SpecialFunctionsImpl.h. | Rasmus Munk Larsen | 2019-09-27 |
* | Add generic PacketMath implementation of the Error Function (erf). | Rasmus Munk Larsen | 2019-09-19 |
| | |||
* | Fix compilation without vector engine available (e.g., x86 with SSE disabled): | Gael Guennebaud | 2019-09-05 |
| | | | | -> ppolevl is required by ndtri even for the scalar path | ||
* | Fix debug macros in p{load,store}u | João P. L. de Carvalho | 2019-08-14 |
| | |||
* | Add missing pcmp_XX methods for double/Packet2d | João P. L. de Carvalho | 2019-08-14 |
| | | | | This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector. | ||
* | Fix packed load/store for PowerPC's VSX | João P. L. de Carvalho | 2019-08-09 |
| | | | | | | | | The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts. For double/Packet2d vec_xl/vec_xst should be preferred over vec_ld/vec_st, although the latter works when cast to float/Packet4f. Silencing some weird warnings thrown by some GCC versions. Such warnings are not thrown by Clang. | ||
* | Fix offset argument of ploadu/pstoreu for Altivec | João P. L. de Carvalho | 2019-08-09 |
| | | | | | | | | | | If no offset is given, then it should be zero. Also passes full address to vec_vsx_ld/st builtins. Removes useless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT. Removes unnecessary casts. | ||
* | bug #1718: Add cast to successfully compile with clang on PowerPC | João P. L. de Carvalho | 2019-08-09 |
| | | | | Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h | ||
* | Add masked_store_available to unpacket_traits | Eugene Zhulenev | 2019-05-02 |
| |