| Commit message | Author | Age |
| |
|
| |
|
|
|
|
| |
for destination with non-trivial inner stride
|
| |
|
|
|
|
|
|
|
| |
GenericPacketMathFunctions.
Another solution would have been to make pshift* fully generic template functions with
partial specialization, which is always a mess in C++03.
|
|
|
|
| |
-> ppolevl is required by ndtri even for the scalar path
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
- Move colamd implementation into its own namespace to avoid polluting the internal namespace with Ok, Status, etc.
- Fix signed/unsigned warning
- Turn some ugly free functions into member functions
|
| |
|
| |
|
| |
|
|
|
|
| |
const definitions
|
|
|
|
|
|
| |
COLAMD_DEAD
to prevent conflicts with other libraries / code.
|
|
|
|
|
|
| |
casting, which broke
build with -march=native on Haswell/Skylake.
|
|
|
|
| |
arguments to log1p such that log1p(inf) = inf.
|
|
|
|
| |
than -1. Fix packet op accordingly.
|
|
|
|
| |
half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.
|
|\
| |
| |
| | |
Fixes for Altivec/VSX and compilation with clang on PowerPC
|
| | |
|
| |
| |
| |
| | |
This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
formulas, and change the scalar implementations to properly handle infinite arguments.
Depending on instruction set, significant speedups are observed for the vectorized path:
log1p wall time is reduced 60-93% (2.5x - 15x speedup)
expm1 wall time is reduced 0-85% (1x - 7x speedup)
The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.
Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
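The extra scalar branch mentioned above can be illustrated with a minimal sketch. It assumes the well-known formulation log1p(x) = x * log(u) / (u - 1) with u = 1 + x. The function name `log1p_sketch` is hypothetical and this is not Eigen's actual code. Without the infinity check, x = +inf would evaluate inf / inf and return NaN.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper, not Eigen's implementation: scalar log1p using
// log1p(x) = x * log(u) / (u - 1), where u = 1 + x.
inline double log1p_sketch(double x) {
  const double u = 1.0 + x;
  // x is so small that 1 + x rounds to 1: log1p(x) is x to working precision.
  if (u == 1.0) return x;
  // Extra branch for +infinity: log(u) / (u - 1) would be inf / inf = NaN.
  if (u == std::numeric_limits<double>::infinity()) return x;
  // General case; yields -inf for x == -1 and NaN for x < -1.
  return x * (std::log(u) / (u - 1.0));
}
```

With this sketch, `log1p_sketch(std::numeric_limits<double>::infinity())` returns +infinity, at the cost of one additional comparison on every scalar call.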
|
| |
| |
| |
| |
| |
| |
| |
| | |
The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts.
For double/Packet2d, vec_xl/vec_xst should be preferred over vec_ld/vec_st, although the latter works when cast to float/Packet4f.
Silencing some weird warnings related to throw with some GCC versions. Such warnings are not thrown by Clang.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If no offset is given, then it should be zero.
Also passes full address to vec_vsx_ld/st builtins.
Removes useless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT.
Removes unnecessary casts.
|
|/
|
|
| |
Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h
|
|
|
|
|
|
| |
each other.
Add specializations for complex types since std::log1p and std::expm1 do not support complex.
|
| |
|
| |
|
|
|
|
| |
to make it actually appear in the generated documentation.
|
|
|
|
| |
Also, document LinSpaced only where it is implemented
|
| |
|
| |
|
|
|
|
|
| |
* an interface for SYCL buffers to behave as a non-dereferenceable pointer
* an interface for placeholder accessor to behave like a pointer on both host and device
|
| |
|
|
|
|
|
|
|
|
| |
Eigen unsupported modules on devices supporting SYCL.
* Adding SYCL memory model
* Enabling/Disabling SYCL backend in Core
* Supporting Vectorization
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
clause.
|
| |
|
|
|
|
|
|
| |
1. Fix buggy pcmp_eq and unit test for half types.
2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types.
3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.
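The sign-bit trick in item 3 can be sketched on a single value. This is an illustration on the raw IEEE 754 binary16 bit pattern, assuming the half is stored as a 16-bit unsigned integer. The name `negate_half_bits` is hypothetical and this is not Eigen's packet code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: negate a half-precision value given as raw bits.
// In binary16 the sign is the most significant bit (mask 0x8000), so
// flipping it negates the value without touching exponent or mantissa.
inline std::uint16_t negate_half_bits(std::uint16_t bits) {
  return static_cast<std::uint16_t>(bits ^ 0x8000u);
}
```

For example, 0x3C00 encodes 1.0 and `negate_half_bits(0x3C00)` gives 0xBC00, the encoding of -1.0. A vectorized version would apply the same XOR to every lane with a broadcast mask.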
|
|
|
|
|
| |
(grafted from 427f2f66d69ae9b124c2f8bcd927fb6e19e07e91)
|
|
|
|
| |
EIGEN_HAS_TYPE_TRAITS is off.
|
|
|
|
| |
clang.
|
|
|
|
| |
CUDA build failures.
|