Add generic PacketMath implementation of the Error Function (erf).
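For illustration, a minimal usage sketch (assuming the unsupported SpecialFunctions module, which exposes a coefficient-wise erf() on arrays):

    #include <unsupported/Eigen/SpecialFunctions>
    #include <iostream>

    int main() {
      Eigen::ArrayXf x = Eigen::ArrayXf::LinSpaced(8, -2.f, 2.f);
      // erf() is applied coefficient-wise; the generic PacketMath
      // implementation lets this loop use SIMD packets on any backend.
      Eigen::ArrayXf y = x.erf();
      std::cout << y.transpose() << '\n';
    }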
copy-assign operator from PermutationMatrix and Transpositions to allow malloc-less std::move. Added a unit test to rvalue_types.
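A minimal sketch of what this enables (hypothetical example, not the added unit test):

    #include <Eigen/Core>
    #include <utility>

    int main() {
      Eigen::PermutationMatrix<Eigen::Dynamic> p(1000);
      p.setIdentity();
      Eigen::PermutationMatrix<Eigen::Dynamic> q;
      // With the move-enabled assignment, this transfers p's indices
      // buffer instead of allocating and copying.
      q = std::move(p);
    }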
The errors were introduced by this commit: https://bitbucket.org/eigen/eigen/commits/6e215cf109073da9ffb5b491171613b8db24fd9d
The fix is to switch to ::<math_func> instead of std::<math_func> when compiling for the GPU.
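A minimal sketch of the pattern (my_exp is a hypothetical helper, not an Eigen function; the guards are the standard CUDA/HIP device-compilation macros):

    #include <cmath>

    template <typename T>
    #if defined(__CUDACC__) || defined(__HIPCC__)
    __host__ __device__
    #endif
    T my_exp(T x) {
    #if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
      return ::exp(x);     // toolchain-provided device overload
    #else
      return std::exp(x);  // normal host path
    #endif
    }

    int main() { return my_exp(1.0) > 2.0 ? 0 : 1; }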
- Split the SpecialFunctions files into a separate BesselFunctions file.
In particular add (usage sketch below):
- Modified Bessel functions of the second kind: k0, k1, k0e, k1e
- Bessel functions of the first kind: j0, j1
- Bessel functions of the second kind: y0, y1
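A usage sketch, assuming the unsupported SpecialFunctions module exposes these as coefficient-wise free functions on arrays:

    #include <unsupported/Eigen/SpecialFunctions>
    #include <iostream>

    int main() {
      Eigen::ArrayXd x = Eigen::ArrayXd::LinSpaced(5, 0.5, 4.5);
      std::cout << Eigen::bessel_j0(x).transpose() << '\n';   // J0(x)
      std::cout << Eigen::bessel_y0(x).transpose() << '\n';   // Y0(x)
      std::cout << Eigen::bessel_k0e(x).transpose() << '\n';  // exp(x)*K0(x)
    }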
- In particular refactor the i0e and i1e code so the scalar and vectorized paths share code.
- Move chebevl to GenericPacketMathFunctions (a sketch of the recurrence follows the benchmark numbers below).
A brief benchmark, building Eigen with the FMA, AVX and AVX2 flags:
Before:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1 57.3 57.3 10000000
BM_eigen_i0e_double/8 398 398 1748554
BM_eigen_i0e_double/64 3184 3184 218961
BM_eigen_i0e_double/512 25579 25579 27330
BM_eigen_i0e_double/4k 205043 205042 3418
BM_eigen_i0e_double/32k 1646038 1646176 422
BM_eigen_i0e_double/256k 13180959 13182613 53
BM_eigen_i0e_double/1M 52684617 52706132 10
BM_eigen_i0e_float/1 28.4 28.4 24636711
BM_eigen_i0e_float/8 75.7 75.7 9207634
BM_eigen_i0e_float/64 512 512 1000000
BM_eigen_i0e_float/512 4194 4194 166359
BM_eigen_i0e_float/4k 32756 32761 21373
BM_eigen_i0e_float/32k 261133 261153 2678
BM_eigen_i0e_float/256k 2087938 2088231 333
BM_eigen_i0e_float/1M 8380409 8381234 84
BM_eigen_i1e_double/1 56.3 56.3 10000000
BM_eigen_i1e_double/8 397 397 1772376
BM_eigen_i1e_double/64 3114 3115 223881
BM_eigen_i1e_double/512 25358 25361 27761
BM_eigen_i1e_double/4k 203543 203593 3462
BM_eigen_i1e_double/32k 1613649 1613803 428
BM_eigen_i1e_double/256k 12910625 12910374 54
BM_eigen_i1e_double/1M 51723824 51723991 10
BM_eigen_i1e_float/1 28.3 28.3 24683049
BM_eigen_i1e_float/8 74.8 74.9 9366216
BM_eigen_i1e_float/64 505 505 1000000
BM_eigen_i1e_float/512 4068 4068 171690
BM_eigen_i1e_float/4k 31803 31806 21948
BM_eigen_i1e_float/32k 253637 253692 2763
BM_eigen_i1e_float/256k 2019711 2019918 346
BM_eigen_i1e_float/1M 8238681 8238713 86
After:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1 15.8 15.8 44097476
BM_eigen_i0e_double/8 99.3 99.3 7014884
BM_eigen_i0e_double/64 777 777 886612
BM_eigen_i0e_double/512 6180 6181 100000
BM_eigen_i0e_double/4k 48136 48140 14678
BM_eigen_i0e_double/32k 385936 385943 1801
BM_eigen_i0e_double/256k 3293324 3293551 228
BM_eigen_i0e_double/1M 12423600 12424458 57
BM_eigen_i0e_float/1 16.3 16.3 43038042
BM_eigen_i0e_float/8 30.1 30.1 23456931
BM_eigen_i0e_float/64 169 169 4132875
BM_eigen_i0e_float/512 1338 1339 516860
BM_eigen_i0e_float/4k 10191 10191 68513
BM_eigen_i0e_float/32k 81338 81337 8531
BM_eigen_i0e_float/256k 651807 651984 1000
BM_eigen_i0e_float/1M 2633821 2634187 268
BM_eigen_i1e_double/1 16.2 16.2 42352499
BM_eigen_i1e_double/8 110 110 6316524
BM_eigen_i1e_double/64 822 822 851065
BM_eigen_i1e_double/512 6480 6481 100000
BM_eigen_i1e_double/4k 51843 51843 10000
BM_eigen_i1e_double/32k 414854 414852 1680
BM_eigen_i1e_double/256k 3320001 3320568 212
BM_eigen_i1e_double/1M 13442795 13442391 53
BM_eigen_i1e_float/1 17.6 17.6 41025735
BM_eigen_i1e_float/8 35.5 35.5 19597891
BM_eigen_i1e_float/64 240 240 2924237
BM_eigen_i1e_float/512 1424 1424 485953
BM_eigen_i1e_float/4k 10722 10723 65162
BM_eigen_i1e_float/32k 86286 86297 8048
BM_eigen_i1e_float/256k 691821 691868 1000
BM_eigen_i1e_float/1M 2777336 2777747 256
This shows anywhere from a 50% to 75% improvement on these operations.
I've also benchmarked without any of these flags turned on, and got similar
performance to before (if not better).
Also tested packetmath.cpp + special_functions to ensure no regressions.
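For context, chebevl is the Cephes-style Chebyshev series evaluator that i0e/i1e build on; a generic sketch of the recurrence (illustrative, not Eigen's exact code):

    // Evaluates the Chebyshev series sum c[i]*T_i(x) with the Clenshaw
    // recurrence. Instantiating T as a scalar or as a SIMD packet type
    // with overloaded operators is what lets both paths share this code.
    template <typename T, int N>
    T chebevl(T x, const T (&coef)[N]) {
      T b0 = coef[0];
      T b1 = T(0);
      T b2 = T(0);
      for (int i = 1; i < N; ++i) {
        b2 = b1;
        b1 = b0;
        b0 = x * b1 - b2 + coef[i];
      }
      return T(0.5) * (b0 - b2);
    }

    int main() {
      const double c[3] = {1.0, 0.5, 0.25};
      return chebevl(0.3, c) > 0 ? 0 : 1;
    }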
The fixes needed are:
* adding the EIGEN_DEVICE_FUNC attribute to a couple of functions (else HIPCC will error out when non-device functions are called from global/device functions); see the sketch below
* switching to ::<math_func> instead of std::<math_func> (only for HIPCC) in cases where std::<math_func> is not recognized as a device function by HIPCC
* removing an errant "j" from a test case (don't know how that made it in to begin with!)
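For illustration, the attribute from the first bullet (squared_norm3 is a hypothetical helper, not an Eigen function):

    #include <Eigen/Core>

    // EIGEN_DEVICE_FUNC expands to __host__ __device__ under CUDA/HIP
    // compilers and to nothing otherwise, so the helper can be called
    // from both host code and GPU kernels.
    template <typename Scalar>
    EIGEN_DEVICE_FUNC Scalar squared_norm3(const Eigen::Matrix<Scalar, 3, 1>& v) {
      return v.squaredNorm();
    }

    int main() {
      return squared_norm3(Eigen::Vector3f(1.f, 0.f, 0.f)) == 1.f ? 0 : 1;
    }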
true compile-time "if" for block_evaluator<>::coeff(i)/coeffRef(i)
triangular^1*matrix with a destination having a non-trivial inner-stride
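To illustrate what such a destination looks like (a hypothetical sketch, not the actual failing case):

    #include <Eigen/Dense>

    int main() {
      Eigen::VectorXd buf = Eigen::VectorXd::Zero(8);
      // Destination with inner stride 2: every other entry of buf.
      Eigen::Map<Eigen::VectorXd, 0, Eigen::InnerStride<2> > dst(buf.data(), 4);
      Eigen::Matrix4d L = Eigen::Matrix4d::Identity();
      Eigen::Vector4d b = Eigen::Vector4d::Ones();
      // A triangular expression evaluated into the strided destination.
      dst = L.triangularView<Eigen::Lower>().solve(b);
      return 0;
    }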
for destination with non-trivial inner stride
GenericPacketMathFunctions.
Another solution would have been to make the pshift* functions fully generic templates with partial specialization, which is always a mess in C++03.
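To see why that alternative is messy: function templates cannot be partially specialized in C++, so each pshift-style entry point would have to forward to a class template. A sketch with illustrative names:

    #include <cstdint>

    // Generic fallback; SIMD packet types would each add their own
    // partial specialization of this class template.
    template <int N, typename Packet>
    struct pshiftleft_impl {
      static Packet run(const Packet& p) { return p << N; }
    };

    // The user-facing function template just forwards; it cannot itself
    // be partially specialized on N.
    template <int N, typename Packet>
    Packet pshiftleft(const Packet& p) {
      return pshiftleft_impl<N, Packet>::run(p);
    }

    int main() { return pshiftleft<2>(std::uint32_t(3)) == 12 ? 0 : 1; }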
-> ppolevl is required by ndtri even for the scalar path
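For context, polevl is the Cephes-style Horner evaluator that ppolevl generalizes to packets; a scalar sketch (illustrative, not Eigen's exact code):

    // Evaluates coef[0]*x^(N-1) + ... + coef[N-1] by Horner's rule.
    template <typename T, int N>
    T polevl(T x, const T (&coef)[N]) {
      T r = coef[0];
      for (int i = 1; i < N; ++i) r = r * x + coef[i];
      return r;
    }

    int main() {
      const double c[3] = {2.0, -3.0, 1.0};  // 2x^2 - 3x + 1
      return polevl(2.0, c) == 3.0 ? 0 : 1;
    }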
- Move the colamd implementation into its own namespace to avoid polluting the internal namespace with Ok, Status, etc.
- Fix a signed/unsigned warning
- Move some ugly free functions to member functions
const definitions
COLAMD_DEAD, to prevent conflicts with other libraries/code.
casting, which broke the build with -march=native on Haswell/Skylake.
arguments to log1p such that log1p(inf) = inf.
| | |
than -1. Fix packet op accordingly.
half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.
|
| |\
| | |
| | |
| | | |
Fixes for Altivec/VSX and compilation with clang on PowerPC
This actually fixes an issue in the packetmath_2 unit test with pcmp_eq when it is compiled with clang: when pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit test fails due to NaNs in the ref vector.
|
|/ /
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
formulas, and change the scalar implementations to properly handle infinite arguments.
Depending on instruction set, significant speedups are observed for the vectorized path:
log1p wall time is reduced 60-93% (2.5x - 15x speedup)
expm1 wall time is reduced 0-85% (1x - 7x speedup)
The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.
Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
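A scalar sketch of the kind of accuracy-preserving formula described (illustrative; my_log1p is not Eigen's code):

    #include <cmath>

    double my_log1p(double x) {
      double u = 1.0 + x;
      if (u == 1.0) return x;       // x tiny: log(1+x) ~= x to full precision
      if (std::isinf(u)) return u;  // the extra branch: log1p(+inf) = +inf
      // Kahan's correction factor compensates for the rounding in 1.0 + x.
      return std::log(u) * (x / (u - 1.0));
    }

    int main() { return my_log1p(1e-300) == 1e-300 ? 0 : 1; }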
The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory accesses and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts.
For double/Packet2d, vec_xl/vec_xst should be preferred over vec_ld/vec_st, although the latter work when cast to float/Packet4f.
Also silences some weird warnings about throw emitted by some GCC versions; Clang does not produce them.
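A sketch of the distinction (PowerPC only, compile with VSX enabled; assumes the GCC/Clang Altivec builtins):

    #include <altivec.h>

    // vec_ld requires a 16-byte aligned address; vec_xl performs a VSX
    // load that also works for double and for unaligned addresses.
    __vector float load4f_aligned(const float* p) {
      return vec_ld(0, p);                       // p must be 16-byte aligned
    }
    __vector double load2d(const double* p) {
      return vec_xl(0, const_cast<double*>(p));  // preferred for Packet2d
    }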
If no offset is given, then it should be zero.
Also passes the full address to the vec_vsx_ld/st builtins.
Removes the useless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT.
Removes unnecessary casts.
Ignoring -Wc11-extensions warnings emitted by clang in Altivec/PacketMath.h
each other.
Add specializations for complex types since std::log1p and std::expm1 do not support complex arguments.
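A sketch of the idea behind the complex fallback (generic_log1p/generic_expm1 are illustrative names, not Eigen's API; this naive form loses the extra accuracy near zero that the real-valued versions provide):

    #include <complex>
    #include <cmath>

    // std::log1p/std::expm1 have no complex overloads, so complex inputs
    // fall back to plain log/exp.
    template <typename T>
    std::complex<T> generic_log1p(const std::complex<T>& z) {
      return std::log(std::complex<T>(T(1) + z.real(), z.imag()));
    }

    template <typename T>
    std::complex<T> generic_expm1(const std::complex<T>& z) {
      return std::exp(z) - std::complex<T>(T(1), T(0));
    }

    int main() {
      return std::abs(generic_expm1(std::complex<double>(0, 0))) == 0 ? 0 : 1;
    }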
to make it actually appear in the generated documentation.
Also, document LinSpaced only where it is implemented