| Commit message | Author | Age |
|
|
|
|
|
| |
casting, which broke the
build with -march=native on Haswell/Skylake.
|
|
|
|
| |
arguments to log1p such that log1p(inf) = inf.
|
|
|
|
| |
than -1. Fix packet op accordingly.
|
|
|
|
| |
half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.
|
|\
| |
| |
| | |
Fixes for Altivec/VSX and compilation with clang on PowerPC
|
| | |
|
| |
| |
| |
| | |
This actually fixes an issue with pcmp_eq in unit test packetmath_2 when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit test fails because of NaNs in the reference vector.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
formulas, and change the scalar implementations to properly handle infinite arguments.
Depending on instruction set, significant speedups are observed for the vectorized path:
log1p wall time is reduced 60-93% (2.5x - 15x speedup)
expm1 wall time is reduced 0-85% (1x - 7x speedup)
The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.
Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
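
The extra scalar branch for +infinity mentioned above can be illustrated with a short sketch. It assumes the commonly used formulation log1p(x) = x * log(1 + x) / ((1 + x) - 1). The name `log1p_sketch` is hypothetical, and this is not necessarily the code from the commit.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scalar sketch, not necessarily Eigen's actual implementation.
template <typename T>
T log1p_sketch(T x) {
  const T x1p = T(1) + x;
  const T log_1p = std::log(x1p);
  // x is so small that 1 + x rounds to 1, so log1p(x) is x to working precision.
  const bool is_small = (x1p == T(1));
  // Extra branch: u == log(u) holds only for u == +inf. Without it the
  // general formula would evaluate inf * (inf / inf), which is NaN.
  const bool is_inf = (x1p == log_1p);
  return (is_small || is_inf) ? x : x * (log_1p / (x1p - T(1)));
}
```

With this sketch, `log1p_sketch(x)` returns +infinity for an infinite argument and falls back to `x` for tiny arguments; the second comparison is the kind of extra branch that the scalar slowdown above refers to.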
|
| |
| |
| |
| |
| |
| |
| |
| | |
The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts.
For double/Packet2d, vec_xl/vec_xst should be preferred over vec_ld/vec_st, although the latter work when cast to float/Packet4f.
Silence some odd warnings about throw that only some GCC versions emit. Clang does not emit such warnings.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If no offset is given, then it should be zero.
Also passes the full address to the vec_vsx_ld/st builtins.
Removes useless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT.
Removes unnecessary casts.
|
|/
|
|
| |
Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h
|
|
|
|
|
|
| |
each other.
Add specializations for complex types since std::log1p and std::expm1 do not support complex.
|
| |
|
|
|
|
| |
to make it actually appear in the generated documentation.
|
|
|
|
| |
Also, document LinSpaced only where it is implemented
|
| |
|
| |
|
|
|
|
|
| |
* an interface for SYCL buffers to behave as a non-dereferenceable pointer
* an interface for placeholder accessor to behave like a pointer on both host and device
|
| |
|
|
|
|
|
|
|
|
| |
Eigen unsupported modules on devices supporting SYCL.
* Adding SYCL memory model
* Enabling/Disabling SYCL backend in Core
* Supporting Vectorization
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
clause.
|
| |
|
|
|
|
|
|
| |
1. Fix buggy pcmp_eq and unit test for half types.
2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types.
3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.
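
Item 3 above can be illustrated on raw bit patterns. The sketch below assumes IEEE 754 binary16 storage in a `std::uint16_t`, where bit 15 is the sign bit. The helper names are hypothetical, and the real packet op would use SIMD XOR instructions rather than a scalar loop.

```cpp
#include <cstdint>

// Hypothetical helper: negate a half-precision value given as its raw
// IEEE 754 binary16 bit pattern by flipping the sign bit (bit 15).
inline std::uint16_t negate_half_bits(std::uint16_t bits) {
  return static_cast<std::uint16_t>(bits ^ 0x8000u);
}

// Packet-style variant over 8 lanes; a vectorized version would apply a
// single XOR with a register holding 0x8000 in every lane.
inline void negate_half_bits8(const std::uint16_t* in, std::uint16_t* out) {
  for (int i = 0; i < 8; ++i) {
    out[i] = negate_half_bits(in[i]);
  }
}
```

Because only the sign bit changes, the operation needs no conversion to float and back, and it also flips the sign of zeros, infinities and NaNs.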
|
|
|
|
|
| |
(grafted from 427f2f66d69ae9b124c2f8bcd927fb6e19e07e91)
|
|
|
|
| |
EIGEN_HAS_TYPE_TRAITS is off.
|
|
|
|
| |
clang.
|
|
|
|
| |
CUDA build failures.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
That was hurting users whose compilers refuse to proceed with
that:
"""
./Eigen/src/Core/products/GeneralMatrixVector.h:356:10: error: declaration shadows a static data member of 'general_matrix_vector_product<type-parameter-0-0, type-parameter-0-1, type-parameter-0-2, 1, ConjugateLhs, type-parameter-0-4, type-parameter-0-5, ConjugateRhs, Version>' [-Werror,-Wshadow]
LhsPacketSize = Traits::LhsPacketSize,
^
./Eigen/src/Core/products/GeneralMatrixVector.h:307:22: note: previous declaration is here
static const Index LhsPacketSize = Traits::LhsPacketSize;
"""
|
|
|
|
|
|
| |
https://reviews.llvm.org/D16177
and are part of LLVM 3.8.0.
|
|\
| |
| |
| |
| |
| | |
Make Eigen build with cuda 10 and clang.
Approved-by: Justin Lebar <justin.lebar@gmail.com>
|
|\ \
| | |
| | |
| | | |
Eigen: Fix MSVC C++17 language standard detection logic
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | | |
Speed up GEMV on AVX-512 builds, just as done for GEBP previously.
Approved-by: Rasmus Larsen <rmlarsen@google.com>
|
| |_|/
|/| | |
|
| | |
| | |
| | |
| | | |
AVX512VL, AVX512BW usage
|
| | |
| | |
| | |
| | | |
deprecated functions
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| |/
|/|
| |
| |
| |
| |
| | |
To detect C++17 support, use _MSVC_LANG macro instead of _MSC_VER. _MSC_VER can indicate whether the current compiler version could support the C++17 language standard, but not whether that standard is actually selected (i.e. via /std:c++17).
See these web pages for more details:
https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/
https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros
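
A minimal sketch of the detection described above, assuming hypothetical macro names (`SKETCH_CPLUSPLUS`, `SKETCH_HAS_CXX17`); Eigen's own macros may be named and structured differently. MSVC defines `_MSVC_LANG` to reflect the selected standard, whereas `__cplusplus` stays at 199711L there unless /Zc:__cplusplus is passed.

```cpp
// Hypothetical macro names, for illustration only.
// Prefer _MSVC_LANG when it exists (MSVC), otherwise use __cplusplus.
#if defined(_MSVC_LANG)
#define SKETCH_CPLUSPLUS _MSVC_LANG
#else
#define SKETCH_CPLUSPLUS __cplusplus
#endif

// C++17 is considered selected only if the reported standard says so,
// regardless of how recent the compiler version (_MSC_VER) is.
#if SKETCH_CPLUSPLUS >= 201703L
#define SKETCH_HAS_CXX17 1
#else
#define SKETCH_HAS_CXX17 0
#endif

constexpr bool has_cxx17() { return SKETCH_HAS_CXX17 != 0; }
```

Checking the language value rather than the compiler version means that a recent MSVC invoked with /std:c++14 is correctly treated as not providing C++17.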
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| | |
We take advantage of smaller SIMD registers as well, in that case.
Gains up to 3x for select input sizes.
|