Commit message  (Author, Age)
* Add Bessel functions to SpecialFunctions.  (Srinivas Vasudevan, 2019-09-14)
|
|     - Split SpecialFunctions files into a separate BesselFunctions file.
|
|     In particular add:
|       - Modified Bessel functions of the second kind k0, k1, k0e, k1e
|       - Bessel functions of the first kind j0, j1
|       - Bessel functions of the second kind y0, y1
|
* Add packetized versions of i0e and i1e special functions.  (Srinivas Vasudevan, 2019-09-11)
|
|     - In particular refactor the i0e and i1e code so scalar and vectorized path share code.
|     - Move chebevl to GenericPacketMathFunctions.
|
|     A brief benchmark with building Eigen with FMA, AVX and AVX2 flags
|
|     Before:
|
|     CPU: Intel Haswell with HyperThreading (6 cores)
|     Benchmark                  Time(ns)   CPU(ns) Iterations
|     --------------------------------------------------------
|     BM_eigen_i0e_double/1          57.3      57.3   10000000
|     BM_eigen_i0e_double/8           398       398    1748554
|     BM_eigen_i0e_double/64         3184      3184     218961
|     BM_eigen_i0e_double/512       25579     25579      27330
|     BM_eigen_i0e_double/4k       205043    205042       3418
|     BM_eigen_i0e_double/32k     1646038   1646176        422
|     BM_eigen_i0e_double/256k   13180959  13182613         53
|     BM_eigen_i0e_double/1M     52684617  52706132         10
|     BM_eigen_i0e_float/1           28.4      28.4   24636711
|     BM_eigen_i0e_float/8           75.7      75.7    9207634
|     BM_eigen_i0e_float/64           512       512    1000000
|     BM_eigen_i0e_float/512         4194      4194     166359
|     BM_eigen_i0e_float/4k         32756     32761      21373
|     BM_eigen_i0e_float/32k       261133    261153       2678
|     BM_eigen_i0e_float/256k     2087938   2088231        333
|     BM_eigen_i0e_float/1M       8380409   8381234         84
|     BM_eigen_i1e_double/1          56.3      56.3   10000000
|     BM_eigen_i1e_double/8           397       397    1772376
|     BM_eigen_i1e_double/64         3114      3115     223881
|     BM_eigen_i1e_double/512       25358     25361      27761
|     BM_eigen_i1e_double/4k       203543    203593       3462
|     BM_eigen_i1e_double/32k     1613649   1613803        428
|     BM_eigen_i1e_double/256k   12910625  12910374         54
|     BM_eigen_i1e_double/1M     51723824  51723991         10
|     BM_eigen_i1e_float/1           28.3      28.3   24683049
|     BM_eigen_i1e_float/8           74.8      74.9    9366216
|     BM_eigen_i1e_float/64           505       505    1000000
|     BM_eigen_i1e_float/512         4068      4068     171690
|     BM_eigen_i1e_float/4k         31803     31806      21948
|     BM_eigen_i1e_float/32k       253637    253692       2763
|     BM_eigen_i1e_float/256k     2019711   2019918        346
|     BM_eigen_i1e_float/1M       8238681   8238713         86
|
|     After:
|
|     CPU: Intel Haswell with HyperThreading (6 cores)
|     Benchmark                  Time(ns)   CPU(ns) Iterations
|     --------------------------------------------------------
|     BM_eigen_i0e_double/1          15.8      15.8   44097476
|     BM_eigen_i0e_double/8          99.3      99.3    7014884
|     BM_eigen_i0e_double/64          777       777     886612
|     BM_eigen_i0e_double/512        6180      6181     100000
|     BM_eigen_i0e_double/4k        48136     48140      14678
|     BM_eigen_i0e_double/32k      385936    385943       1801
|     BM_eigen_i0e_double/256k    3293324   3293551        228
|     BM_eigen_i0e_double/1M     12423600  12424458         57
|     BM_eigen_i0e_float/1           16.3      16.3   43038042
|     BM_eigen_i0e_float/8           30.1      30.1   23456931
|     BM_eigen_i0e_float/64           169       169    4132875
|     BM_eigen_i0e_float/512         1338      1339     516860
|     BM_eigen_i0e_float/4k         10191     10191      68513
|     BM_eigen_i0e_float/32k        81338     81337       8531
|     BM_eigen_i0e_float/256k      651807    651984       1000
|     BM_eigen_i0e_float/1M       2633821   2634187        268
|     BM_eigen_i1e_double/1          16.2      16.2   42352499
|     BM_eigen_i1e_double/8           110       110    6316524
|     BM_eigen_i1e_double/64          822       822     851065
|     BM_eigen_i1e_double/512        6480      6481     100000
|     BM_eigen_i1e_double/4k        51843     51843      10000
|     BM_eigen_i1e_double/32k      414854    414852       1680
|     BM_eigen_i1e_double/256k    3320001   3320568        212
|     BM_eigen_i1e_double/1M     13442795  13442391         53
|     BM_eigen_i1e_float/1           17.6      17.6   41025735
|     BM_eigen_i1e_float/8           35.5      35.5   19597891
|     BM_eigen_i1e_float/64           240       240    2924237
|     BM_eigen_i1e_float/512         1424      1424     485953
|     BM_eigen_i1e_float/4k         10722     10723      65162
|     BM_eigen_i1e_float/32k        86286     86297       8048
|     BM_eigen_i1e_float/256k      691821    691868       1000
|     BM_eigen_i1e_float/1M       2777336   2777747        256
|
|     This shows anywhere from a 50% to 75% improvement on these operations.
|
|     I've also benchmarked without any of these flags turned on, and got similar
|     performance to before (if not better).
|
|     Also tested packetmath.cpp + special_functions to ensure no regressions.
|
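The commit above mentions sharing one code path between scalar and vectorized evaluation and moving chebevl. As a rough illustration only, here is a minimal sketch of a Chebyshev-series evaluator written once as a template. It assumes the Cephes convention (coefficients stored highest order first, zero-order term last and weighted by 1/2, series evaluated at x/2); the name `chebevl_sketch` and its signature are hypothetical and are not Eigen's actual code.

```cpp
#include <cstddef>

// Hypothetical sketch, not Eigen's implementation.
// Evaluates  coef[n-1]/2 + sum_{k>=1} coef[n-1-k] * T_k(x/2)
// with the Clenshaw-style recurrence used by Cephes' chebevl.
//
// T is generic so the same body could serve a scalar (double, float) or a
// SIMD-like value type that supports +, -, * and construction from double.
template <typename T>
T chebevl_sketch(const T& x, const double* coef, std::size_t n) {
  T b0 = T(coef[0]);
  T b1 = T(0.0);
  T b2 = T(0.0);
  for (std::size_t i = 1; i < n; ++i) {
    b2 = b1;
    b1 = b0;
    b0 = x * b1 - b2 + T(coef[i]);  // three-term recurrence step
  }
  return T(0.5) * (b0 - b2);
}
```

Calling `chebevl_sketch(2.0, coef, 3)` with `coef = {1.0, 2.0, 3.0}` instantiates the scalar path; a packet type with the same operators would reuse the identical body, which is the kind of sharing the commit message describes.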
* Merged eigen/eigen into default  (Srinivas Vasudevan, 2019-09-11)
|\
| * Fix for the HIP build+test errors introduced by the ndtri support.  (Deven Desai, 2019-09-06)
| |
| |     The fixes needed are
| |      * adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs)
| |      * switching to using ::<math_func> instead of std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC
| |      * removing an errant "j" from a testcase (don't know how that made it in to begin with!)
| |
| * bug #1736: fix compilation issue with A(all,{1,2}).col(j) by implementing true compile-time "if" for block_evaluator<>::coeff(i)/coeffRef(i)  (Gael Guennebaud, 2019-09-11)
| |
| * bug #1741: fix self-adjoint*matrix, triangular*matrix, and triangular^1*matrix with a destination having a non-trivial inner-stride  (Gael Guennebaud, 2019-09-11)
| |
| * Fix compilation of BLAS backend and frontend  (Gael Guennebaud, 2019-09-11)
| |
| * Merged in ezhulenev/eigen-01 (pull request PR-698)  (Rasmus Larsen, 2019-09-10)
| |\
| | |
| | |     ThreadLocal container that does not rely on thread local storage
| | |
| | |     Approved-by: Rasmus Larsen <rmlarsen@google.com>
| | |
| | * Update ThreadLocal to use separate Initialize/Release callables  (Eugene Zhulenev, 2019-09-10)
| | |
| * | Fix some implicit literal to Scalar conversions in SparseCore  (Gael Guennebaud, 2019-09-11)
| | |
| * | bug #1741: fix SelfAdjointView::rankUpdate and product to triangular part for destination with non-trivial inner stride  (Gael Guennebaud, 2019-09-10)
| | |
| * | bug #1741: fix C.noalias() = A*C; with C.innerStride()!=1  (Gael Guennebaud, 2019-09-10)
| | |
| | * ThreadLocal container that does not rely on thread local storage  (Eugene Zhulenev, 2019-09-09)
| |/
| * Fix a circular dependency regarding pshift* functions and GenericPacketMathFunctions.  (Gael Guennebaud, 2019-09-06)
| |
| |     Another solution would have been to make pshift* fully generic template functions
| |     with partial specialization which is always a mess in c++03.
| |
| * Fix compilation without vector engine available (e.g., x86 with SSE disabled):  (Gael Guennebaud, 2019-09-05)
| |
| |     -> ppolevl is required by ndtri even for the scalar path
| |
* | Merged eigen/eigen  (Srinivas Vasudevan, 2019-09-04)
|\ \
* \ \ Merging from eigen/eigen.  (Srinivas Vasudevan, 2019-09-03)
|\ \ \
* | | | Add ndtri function, the inverse of the normal distribution function.  (Srinivas Vasudevan, 2019-08-12)
| | | |
| | | * PR 621: Fix documentation of EIGEN_COMP_EMSCRIPTEN  (David Tellenbach, 2019-03-21)
| | |/
| | * Fix doc issues regarding ndtri  (Gael Guennebaud, 2019-09-04)
| | |
| | * Fix possible warning regarding strict equality comparisons  (Gael Guennebaud, 2019-09-04)
| | |
| | * PR 681: Add ndtri function, the inverse of the normal distribution function.  (Srinivas Vasudevan, 2019-08-12)
| | |
| | * Change typedefs from private to protected to fix MSVC compilation  (Eugene Zhulenev, 2019-09-03)
| | |
| | * Allow move-only done callback in TensorAsyncDevice  (Eugene Zhulenev, 2019-09-03)
| |/
| * Add test for const TensorMap underlying data mutation  (Eugene Zhulenev, 2019-09-03)
| |
| * TensorMap constness should not change underlying storage constness  (Eugene Zhulenev, 2019-09-03)
| |
| * Makes Scalar/RealScalar typedefs public in Pardiso's wrappers (see PR 688)  (Gael Guennebaud, 2019-09-03)
| |
| * Fixed Tensor documentation formatting.  (Alberto Luaces, 2019-07-23)
| |
| * More colamd cleanup:  (Gael Guennebaud, 2019-09-03)
| |
| |     - Move colamd implementation in its own namespace to avoid polluting the internal namespace with Ok, Status, etc.
| |     - Fix signed/unsigned warning
| |     - move some ugly free functions as member functions
| |
| * Eigen_Colamd.h updated to replace constexpr with consts and enums.  (Anshul Jaiswal, 2019-08-17)
| |
| * Ordering.h edited to fix dependencies on Eigen_Colamd.h  (Anshul Jaiswal, 2019-08-15)
| |
| * Eigen_Colamd.h edited replacing macros with constexprs and functions.  (Anshul Jaiswal, 2019-08-15)
| |
| * Eigen_Colamd.h edited online with Bitbucket replacing constant #defines with const definitions  (Anshul Jaiswal, 2019-07-21)
| |
| * Updated Eigen_Colamd.h, namespacing macros ALIVE & DEAD as COLAMD_ALIVE & COLAMD_DEAD to prevent conflicts with other libraries / code.  (Anshul Jaiswal, 2019-06-08)
| |
| * Fix shadow warnings in TensorContractionThreadPool  (Eugene Zhulenev, 2019-08-30)
| |
| * Fix block mapper type name in TensorExecutor  (Eugene Zhulenev, 2019-08-30)
| |
| * evalSubExprsIfNeededAsync + async TensorContractionThreadPool  (Eugene Zhulenev, 2019-08-30)
| |
| * Revert accidentally removed <memory> header from ThreadPool  (Eugene Zhulenev, 2019-08-30)
| |
| * Asynchronous expression evaluation with TensorAsyncDevice  (Eugene Zhulenev, 2019-08-30)
| |
| * Fix missing header inclusion and colliding definitions for half type casting, which broke build with -march=native on Haswell/Skylake.  (Rasmus Munk Larsen, 2019-08-30)
| |
| * Const correctness in TensorMap<const Tensor<T, ...>> expressions  (Eugene Zhulenev, 2019-08-28)
| |
| * Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf.  (Rasmus Munk Larsen, 2019-08-28)
| |
| * Remove shadow warnings in TensorDeviceThreadPool  (Eugene Zhulenev, 2019-08-28)
| |
| * Revert changes to std_fallback::log1p that broke handling of arguments less than -1. Fix packet op accordingly.  (Rasmus Munk Larsen, 2019-08-27)
| |
| * Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.  (Rasmus Munk Larsen, 2019-08-27)
| |
| * Merged in ezhulenev/eigen-01 (pull request PR-683)  (Rasmus Larsen, 2019-08-26)
| |\
| | |
| | |     Asynchronous parallelFor in Eigen ThreadPoolDevice
| | |
| * | Fix get_random_seed on Native Client  (maratek, 2019-08-23)
| | |
| | |     Newlib in Native Client SDK does not provide ::random function.
| | |     Implement get_random_seed for NaCl using ::rand, similarly to Windows version.
| | |
| | * Asynchronous parallelFor in Eigen ThreadPoolDevice  (Eugene Zhulenev, 2019-08-22)
| |/
| * Merged in jaopaulolc/eigen (pull request PR-679)  (Christoph Hertzberg, 2019-08-22)
| |\
| | |
| | |     Fixes for Altivec/VSX and compilation with clang on PowerPC
| | |
| * \ Merged in rmlarsen/eigen (pull request PR-680)  (Rasmus Larsen, 2019-08-22)
| |\ \
| | | |
| | | |     Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
| | | |
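
Several commits above refer to Kahan's formulas for log1p and expm1 and to the corner cases log1p(inf) = inf and arguments less than -1. The following is a minimal scalar sketch of those formulas, assuming IEEE-754 doubles and default (non fast-math) compilation. The function names are hypothetical and this is not Eigen's code; it only shows where the infinite-argument guards are needed.

```cpp
#include <cmath>
#include <limits>

// Hypothetical sketch of Kahan's log1p formula:
//   log1p(x) ~= log(u) * x / (u - 1),  with u = 1 + x rounded.
double log1p_kahan_sketch(double x) {
  const double inf = std::numeric_limits<double>::infinity();
  const double u = 1.0 + x;
  if (u == 1.0) return x;  // x is below rounding granularity: log1p(x) ~= x
  if (u == inf) return u;  // otherwise inf * (inf / inf) would give NaN
  // For x < -1, log(u) is NaN and propagates; for x == -1 the result is -inf.
  return std::log(u) * (x / (u - 1.0));
}

// Hypothetical sketch of Kahan's expm1 formula:
//   expm1(x) ~= (u - 1) * x / log(u),  with u = exp(x) rounded.
double expm1_kahan_sketch(double x) {
  const double inf = std::numeric_limits<double>::infinity();
  const double u = std::exp(x);
  if (u == 1.0) return x;          // exp(x) rounded to 1: expm1(x) ~= x
  const double um1 = u - 1.0;
  if (um1 == -1.0) return -1.0;    // exp(x) underflowed relative to 1
  if (u == inf) return u;          // otherwise inf * (x / inf) would give NaN
  return um1 * (x / std::log(u));
}
```

The division by `u - 1` (or `log(u)`) cancels the rounding error committed when forming `u`, which is why the result stays accurate for tiny `x`; the explicit infinity checks exist because that same quotient becomes inf/inf once `u` overflows.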