| Commit message (Collapse) | Author | Age |
|
|
|
| |
division.
|
|
|
|
| |
PPC) for TensorFlow
|
|
|
|
| |
CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.
|
|
|
|
| |
LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.
|
|
|
|
| |
slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate.
|
| |
|
|
|
|
| |
optimization changing sign with -ffast-math enabled.
|
|
|
|
| |
warnings).
|
|
|
|
| |
This fixes the problem of taking the address of temporary objects, which clang 11 treats as an error.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since `numeric_limits<half>::max_exponent` is a static inline constant,
it cannot be directly passed by reference. This triggers a linker error
in recent versions of `g++-powerpc64le`.
Changing `half` to take inputs by value fixes this. Wrapping
`max_exponent` with `int(...)` to make an addressable integer also fixes this
and may help with other custom `Scalar` types down the road.
Also eliminated some compile warnings for powerpc.
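The following sketch illustrates the `int(...)` workaround under stated assumptions. `half_limits` is a hypothetical stand-in for `numeric_limits<half>`, and its value of 16 is illustrative. Binding an in-class static constant to a `const int&` parameter (as `std::min` does) odr-uses it and can require an out-of-class definition on some toolchains and language standards; copying it into a temporary first avoids that.

```cpp
#include <algorithm>

// Hypothetical stand-in for numeric_limits<half>: the constant is declared
// in-class and deliberately has no out-of-class definition.
struct half_limits {
  static const int max_exponent = 16;
};

inline int clamp_exponent(int e) {
  // std::min takes its arguments by const reference. Passing
  // half_limits::max_exponent directly would bind a reference to the member
  // and may fail at link time. int(...) creates an addressable temporary.
  return std::min(e, int(half_limits::max_exponent));
}
```

Taking the argument by value in the callee, the other fix mentioned above, has the same effect because no reference to the constant is ever formed.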
|
| |
|
|
|
|
| |
in certain situations.
|
|
|
|
| |
to non-const when using vector_pair with certain built-ins.
|
| |
|
| |
|
|
|
|
|
|
| |
The original implementation fails for 0, denormals, inf, and NaN.
See #2150
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous implementations produced garbage values if the exponent did
not fit within the exponent bits. See #2131 for a complete discussion,
and !375 for other possible implementations.
Here we implement the 4-factor version. See `pldexp_impl` in
`GenericPacketMathFunctions.h` for a full description.
The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>`
requires `por`.
Left as a "TODO" is to delegate to a faster version if we know the
exponent does fit within the exponent bits.
Fixes #2131.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allows the altivec packetmath tests to pass. There were a few issues:
- `pstoreu` was missing MSQ on `_BIG_ENDIAN` systems
- `cmp_*` didn't properly handle conversion of bool flags (0x7FC instead
of 0xFFFF)
- `pfrexp` needed to set the `exponent` argument.
Related to !370, #2128
cc: @ChipKerchner @pdrocaldeira
Tested on `_BIG_ENDIAN` running on QEMU with VSX. Couldn't figure out build
flags to get it to work for little endian.
|
| |
|
| |
|
| |
|
|
|
|
|
| |
Including new tests for bfloat16 Packets.
Fix prsqrt on GenericPacketMath.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Changes to Altivec/MatrixProduct
Adapting code to gcc 10.
Generic code style and performance enhancements.
Adding PanelMode support.
Adding stride/offset support.
Enabling float64, std::complex<float> and std::complex<double>.
Fixing lack of symm_pack.
Enabling mixedtypes.
- Adding std::complex tests to blasutil.
- Adding an implementation of storePacketBlock when Incr != 1.
|
|
|
|
|
|
| |
it.
Implementing pcmp_eq for Packet8 and Packet16.
|
|
|
|
| |
pmul and psub.
|
|
|
|
| |
architecture
|
| |
|
| |
|
|
|
|
|
| |
- Optimizing MMA kernel.
- Adding PacketBlock store to blas_data_mapper.
|
| |
|
|
|
|
| |
Clean up a compiler warning in c++03 mode in AVX512/Complex.h.
|
| |
|
| |
|
|
|
|
| |
vector operations
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
SpecialFunctionsImpl.h.
|
| |
|
|
|
|
| |
-> ppolevl is required by ndtri even for the scalar path
|
| |
|
|
|
|
| |
This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector.
|
|
|
|
|
|
|
|
| |
The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts.
For double/Packet2d, vec_xl/vec_xst should be preferred over vec_ld/vec_st, although the latter works when cast to float/Packet4f.
Silencing some odd warnings about throw emitted by some GCC versions. Such warnings are not emitted by Clang.
|
|
|
|
|
|
|
|
|
|
| |
If no offset is given, then it should be zero.
Also passes the full address to the vec_vsx_ld/st builtins.
Removes useless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT.
Removes unnecessary casts.
|
|
|
|
| |
Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h
|
| |
|