eigen - C++ library for linear algebra

	Commit message	Author	Age
*	Prevent infinite loop in the nvcc compiler while unrolling the recurrent ↵	Rasmus Munk Larsen	2019-10-01
\| \| \| \|	templates for Chebyshev polynomial evaluation.
*	Fix perf issue in SimplicialLDLT::solve for complexes (again, m_diag is real)	Gael Guennebaud	2019-10-01
\|
*	Fix speed issue with SimplicialLDLT for complexes: the diagonal is real!	Gael Guennebaud	2019-09-30
\|
*	Move implementation of vectorized error function erf() to ↵	Rasmus Munk Larsen	2019-09-27
\| \| \| \|	SpecialFunctionsImpl.h.
*	Fix erf in c++03	Eugene Zhulenev	2019-09-25
\|
*	Fix for the HIP build+test errors.	Deven Desai	2019-09-25
\|	The errors were introduced by this commit:
\|	https://bitbucket.org/eigen/eigen/commits/d38e6fbc27abe0c354ffe90928f6741c378e76e1
\|
\|	After the above mentioned commit, some of the tests started failing with the following error
\|
\|	```
\|	Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
\|	In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
\|	In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
\|	In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:70:
\|	/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsHalf.h:28:22: error: call to 'erf' is ambiguous
\|	  return Eigen::half(Eigen::numext::erf(static_cast<float>(a)));
\|	                     ^~~~~~~~~~~~~~~~~~
\|	/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1600:7: note: candidate function [with T = float]
\|	float erf(const float &x) { return ::erff(x); }
\|	      ^
\|	/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = float]
\|	    erf(const Scalar& x) {
\|	    ^
\|	In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
\|	In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
\|	In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
\|	/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:23: error: call to 'erf' is ambiguous
\|	  return make_double2(erf(a.x), erf(a.y));
\|	                      ^~~
\|	/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
\|	double erf(const double &x) { return ::erf(x); }
\|	       ^
\|	/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
\|	    erf(const Scalar& x) {
\|	    ^
\|	In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
\|	In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
\|	In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
\|	/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:33: error: call to 'erf' is ambiguous
\|	  return make_double2(erf(a.x), erf(a.y));
\|	                                ^~~
\|	/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
\|	double erf(const double &x) { return ::erf(x); }
\|	       ^
\|	/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
\|	    erf(const Scalar& x) {
\|	    ^
\|	3 errors generated.
\|	```
\|
\|	This PR fixes the compile error by removing the "old" implementation for "erf" (assuming that the "new" implementation is what we want going forward. From a GPU point-of-view both implementations are the same).
\|
\|	This PR also fixes what seems like a cut-n-paste error in the aforementioned commit
*	Merged in rmlarsen/eigen (pull request PR-704)	Rasmus Larsen	2019-09-24
\|\ \| \| \| \| \| \|	Add generic PacketMath implementation of the Error Function (erf).
* \|	Tensor block evaluation V2 support for unary/binary/broadcasting	Eugene Zhulenev	2019-09-24
\| \|
* \|	bug #1746: Removed implementation of standard copy-constructor and standard ↵	Christoph Hertzberg	2019-09-24
\| \| \| \| \| \| \| \|	copy-assign-operator from PermutationMatrix and Transpositions to allow malloc-less std::move. Added unit-test to rvalue_types
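A minimal sketch of why removing hand-written copy members can enable a malloc-less `std::move`, assuming a class that keeps its indices in a heap-allocated container. `WithUserCopy`, `ImplicitCopy`, and `move_reuses_buffer` are hypothetical names for illustration, not Eigen types. In C++11 a user-declared copy constructor suppresses the implicit move constructor, so `std::move` quietly falls back to an allocating copy.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a class with hand-written copy members.
// Declaring them suppresses the implicit move constructor/assignment.
struct WithUserCopy {
  std::vector<int> indices;
  explicit WithUserCopy(std::size_t n) : indices(n) {}
  WithUserCopy(const WithUserCopy& other) : indices(other.indices) {}
  WithUserCopy& operator=(const WithUserCopy& other) {
    indices = other.indices;
    return *this;
  }
};

// Same data, but copy/move members are left to the compiler,
// so an implicit move constructor is generated.
struct ImplicitCopy {
  std::vector<int> indices;
  explicit ImplicitCopy(std::size_t n) : indices(n) {}
};

// Returns true when constructing from std::move(source) takes over the
// source's heap buffer instead of allocating a new one.
template <typename T>
bool move_reuses_buffer() {
  T source(16);
  const int* before = source.indices.data();
  T target(std::move(source));
  return target.indices.data() == before;
}
```

Under these assumptions, `move_reuses_buffer<ImplicitCopy>()` reports that the buffer was taken over, while `move_reuses_buffer<WithUserCopy>()` reports a fresh allocation.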
\| *	Add generic PacketMath implementation of the Error Function (erf).	Rasmus Munk Larsen	2019-09-19
\| \|
* \|	Fix build on setups without AVX512DQ.	Rasmus Munk Larsen	2019-09-19
\|/
*	Fix for the HIP build+test errors.	Deven Desai	2019-09-18
\|	The errors were introduced by this commit:
\|	https://bitbucket.org/eigen/eigen/commits/6e215cf109073da9ffb5b491171613b8db24fd9d
\|
\|	The fix is switching to using ::<math_func> instead of std::<math_func> when compiling for GPU
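A rough sketch of the kind of switch the message describes, not the actual Eigen change. The macro `SKETCH_MATH` and the wrappers `sketch_exp` and `sketch_sqrt` are hypothetical; `__CUDACC__` and `__HIPCC__` are the macros the CUDA and HIP compilers define. The idea is to call the global-namespace math function when a GPU compiler is active and the `std::` one otherwise.

```cpp
#include <cmath>

// Hypothetical helper: pick the global-namespace function for GPU
// compilers and the std:: function on the host.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define SKETCH_MATH(func) ::func
#else
#define SKETCH_MATH(func) std::func
#endif

// Thin wrappers so that call sites do not repeat the macro.
inline float sketch_exp(float x) { return SKETCH_MATH(exp)(x); }
inline float sketch_sqrt(float x) { return SKETCH_MATH(sqrt)(x); }
```

With a plain host compiler both wrappers resolve to the `std::` overloads.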
*	Add Bessel functions to SpecialFunctions.	Srinivas Vasudevan	2019-09-14
\|	- Split SpecialFunctions files into a separate BesselFunctions file.
\|
\|	In particular add:
\|	- Modified Bessel functions of the second kind k0, k1, k0e, k1e
\|	- Bessel functions of the first kind j0, j1
\|	- Bessel functions of the second kind y0, y1
*	Add packetized versions of i0e and i1e special functions.	Srinivas Vasudevan	2019-09-11
\|	- In particular refactor the i0e and i1e code so scalar and vectorized path share code.
\|	- Move chebevl to GenericPacketMathFunctions.
\|
\|	A brief benchmark with building Eigen with FMA, AVX and AVX2 flags
\|
\|	Before:
\|
\|	CPU: Intel Haswell with HyperThreading (6 cores)
\|	Benchmark                  Time(ns)    CPU(ns) Iterations
\|	---------------------------------------------------------
\|	BM_eigen_i0e_double/1          57.3       57.3   10000000
\|	BM_eigen_i0e_double/8           398        398    1748554
\|	BM_eigen_i0e_double/64         3184       3184     218961
\|	BM_eigen_i0e_double/512       25579      25579      27330
\|	BM_eigen_i0e_double/4k       205043     205042       3418
\|	BM_eigen_i0e_double/32k     1646038    1646176        422
\|	BM_eigen_i0e_double/256k   13180959   13182613         53
\|	BM_eigen_i0e_double/1M     52684617   52706132         10
\|	BM_eigen_i0e_float/1           28.4       28.4   24636711
\|	BM_eigen_i0e_float/8           75.7       75.7    9207634
\|	BM_eigen_i0e_float/64           512        512    1000000
\|	BM_eigen_i0e_float/512         4194       4194     166359
\|	BM_eigen_i0e_float/4k         32756      32761      21373
\|	BM_eigen_i0e_float/32k       261133     261153       2678
\|	BM_eigen_i0e_float/256k     2087938    2088231        333
\|	BM_eigen_i0e_float/1M       8380409    8381234         84
\|	BM_eigen_i1e_double/1          56.3       56.3   10000000
\|	BM_eigen_i1e_double/8           397        397    1772376
\|	BM_eigen_i1e_double/64         3114       3115     223881
\|	BM_eigen_i1e_double/512       25358      25361      27761
\|	BM_eigen_i1e_double/4k       203543     203593       3462
\|	BM_eigen_i1e_double/32k     1613649    1613803        428
\|	BM_eigen_i1e_double/256k   12910625   12910374         54
\|	BM_eigen_i1e_double/1M     51723824   51723991         10
\|	BM_eigen_i1e_float/1           28.3       28.3   24683049
\|	BM_eigen_i1e_float/8           74.8       74.9    9366216
\|	BM_eigen_i1e_float/64           505        505    1000000
\|	BM_eigen_i1e_float/512         4068       4068     171690
\|	BM_eigen_i1e_float/4k         31803      31806      21948
\|	BM_eigen_i1e_float/32k       253637     253692       2763
\|	BM_eigen_i1e_float/256k     2019711    2019918        346
\|	BM_eigen_i1e_float/1M       8238681    8238713         86
\|
\|	After:
\|
\|	CPU: Intel Haswell with HyperThreading (6 cores)
\|	Benchmark                  Time(ns)    CPU(ns) Iterations
\|	---------------------------------------------------------
\|	BM_eigen_i0e_double/1          15.8       15.8   44097476
\|	BM_eigen_i0e_double/8          99.3       99.3    7014884
\|	BM_eigen_i0e_double/64          777        777     886612
\|	BM_eigen_i0e_double/512        6180       6181     100000
\|	BM_eigen_i0e_double/4k        48136      48140      14678
\|	BM_eigen_i0e_double/32k      385936     385943       1801
\|	BM_eigen_i0e_double/256k    3293324    3293551        228
\|	BM_eigen_i0e_double/1M     12423600   12424458         57
\|	BM_eigen_i0e_float/1           16.3       16.3   43038042
\|	BM_eigen_i0e_float/8           30.1       30.1   23456931
\|	BM_eigen_i0e_float/64           169        169    4132875
\|	BM_eigen_i0e_float/512         1338       1339     516860
\|	BM_eigen_i0e_float/4k         10191      10191      68513
\|	BM_eigen_i0e_float/32k        81338      81337       8531
\|	BM_eigen_i0e_float/256k      651807     651984       1000
\|	BM_eigen_i0e_float/1M       2633821    2634187        268
\|	BM_eigen_i1e_double/1          16.2       16.2   42352499
\|	BM_eigen_i1e_double/8           110        110    6316524
\|	BM_eigen_i1e_double/64          822        822     851065
\|	BM_eigen_i1e_double/512        6480       6481     100000
\|	BM_eigen_i1e_double/4k        51843      51843      10000
\|	BM_eigen_i1e_double/32k      414854     414852       1680
\|	BM_eigen_i1e_double/256k    3320001    3320568        212
\|	BM_eigen_i1e_double/1M     13442795   13442391         53
\|	BM_eigen_i1e_float/1           17.6       17.6   41025735
\|	BM_eigen_i1e_float/8           35.5       35.5   19597891
\|	BM_eigen_i1e_float/64           240        240    2924237
\|	BM_eigen_i1e_float/512         1424       1424     485953
\|	BM_eigen_i1e_float/4k         10722      10723      65162
\|	BM_eigen_i1e_float/32k        86286      86297       8048
\|	BM_eigen_i1e_float/256k      691821     691868       1000
\|	BM_eigen_i1e_float/1M       2777336    2777747        256
\|
\|	This shows anywhere from a 50% to 75% improvement on these operations.
\|
\|	I've also benchmarked without any of these flags turned on, and got similar performance to before (if not better).
\|
\|	Also tested packetmath.cpp + special_functions to ensure no regressions.
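For readers unfamiliar with what a routine like chebevl computes, here is a minimal sketch of Chebyshev series evaluation using the textbook Clenshaw recurrence. The name `chebyshev_series` is hypothetical, and the coefficient order and argument scaling used by Eigen's own chebevl may differ from this form. Because the function is templated on the scalar type, one body could in principle serve scalar and packet-like types alike, which is the kind of code sharing the commit describes.

```cpp
#include <cstddef>

// Evaluates sum_{k=0}^{n-1} c[k] * T_k(x), where T_k is the Chebyshev
// polynomial of the first kind, using the Clenshaw recurrence
//   b_k = c[k] + 2*x*b_{k+1} - b_{k+2},  result = c[0] + x*b_1 - b_2.
// Hypothetical helper for illustration; not Eigen's chebevl.
template <typename Scalar>
Scalar chebyshev_series(Scalar x, const Scalar* c, std::size_t n) {
  if (n == 0) return Scalar(0);
  Scalar b1 = Scalar(0);  // holds b_{k+1}
  Scalar b2 = Scalar(0);  // holds b_{k+2}
  for (std::size_t k = n - 1; k >= 1; --k) {
    const Scalar b0 = c[k] + Scalar(2) * x * b1 - b2;
    b2 = b1;
    b1 = b0;
  }
  return c[0] + x * b1 - b2;
}
```

For example, with coefficients {0, 0, 1} the series is T_2(x) = 2x^2 - 1, so evaluating at x = 0.5 gives -0.5.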
*	Merged eigen/eigen into default	Srinivas Vasudevan	2019-09-11
\|\
\| *	Fix for the HIP build+test errors introduced by the ndtri support.	Deven Desai	2019-09-06
\| \|	The fixes needed are
\| \|	* adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs)
\| \|	* switching to using ::<math_func> instead of std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC
\| \|	* removing an errant "j" from a testcase (don't know how that made it in to begin with!)
\| *	bug #1736: fix compilation issue with A(all,{1,2}).col(j) by implementing ↵	Gael Guennebaud	2019-09-11
\| \| \| \| \| \| \| \|	true compile-time "if" for block_evaluator<>::coeff(i)/coeffRef(i)
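One common way to get a true compile-time "if" without C++17 is to dispatch through a class template specialized on a `bool`. The sketch below illustrates that general technique under assumed names (`LinearOnly`, `RowColOnly`, `coeff_dispatch`, `first_column_coeff` are all hypothetical) and is not the code from this commit. Only the selected specialization is instantiated, so the branch that would not compile for a given type is never seen by the compiler.

```cpp
// Two hypothetical expression types, each offering only one access style.
struct LinearOnly {
  int coeff(int i) const { return 10 + i; }
};
struct RowColOnly {
  int coeff(int r, int c) const { return 100 * r + c; }
};

// Compile-time branch: the primary template is declared only, and each
// value of the flag gets its own specialization.
template <bool UseLinear>
struct coeff_dispatch;

template <>
struct coeff_dispatch<true> {
  template <typename X>
  static int run(const X& x, int i) { return x.coeff(i); }
};

template <>
struct coeff_dispatch<false> {
  template <typename X>
  static int run(const X& x, int i) { return x.coeff(i, 0); }
};

// A runtime `if (UseLinear)` here would require both calls to compile
// for every X; the dispatch struct avoids instantiating the unused one.
template <typename X, bool UseLinear>
int first_column_coeff(const X& x, int i) {
  return coeff_dispatch<UseLinear>::run(x, i);
}
```

Both `first_column_coeff<LinearOnly, true>` and `first_column_coeff<RowColOnly, false>` compile even though neither type supports the other's access style.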
\| *	bug #1741: fix self-adjoint*matrix, triangular*matrix, and ↵	Gael Guennebaud	2019-09-11
\| \| \| \| \| \| \| \|	triangular^1*matrix with a destination having a non-trivial inner-stride
\| *	Fix compilation of BLAS backend and frontend	Gael Guennebaud	2019-09-11
\| \|
\| *	Fix some implicit literal to Scalar conversions in SparseCore	Gael Guennebaud	2019-09-11
\| \|
\| *	bug #1741: fix SelfAdjointView::rankUpdate and product to triangular part ↵	Gael Guennebaud	2019-09-10
\| \| \| \| \| \| \| \|	for destination with non-trivial inner stride
\| *	bug #1741: fix C.noalias() = A*C; with C.innerStride()!=1	Gael Guennebaud	2019-09-10
\| \|
\| *	Fix a circular dependency regarding pshift* functions and ↵	Gael Guennebaud	2019-09-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	GenericPacketMathFunctions. Another solution would have been to make pshift* fully generic template functions with partial specialization which is always a mess in c++03.
\| *	Fix compilation without vector engine available (e.g., x86 with SSE disabled):	Gael Guennebaud	2019-09-05
\| \| \| \| \| \| \| \|	-> ppolevl is required by ndtri even for the scalar path
* \|	Merged eigen/eigen	Srinivas Vasudevan	2019-09-04
\|\ \
* \ \	Merging from eigen/eigen.	Srinivas Vasudevan	2019-09-03
\|\ \ \
* \| \| \|	Add ndtri function, the inverse of the normal distribution function.	Srinivas Vasudevan	2019-08-12
\| \| \| \|
\| \| \| *	PR 621: Fix documentation of EIGEN_COMP_EMSCRIPTEN	David Tellenbach	2019-03-21
\| \| \|/
\| \| *	Fix doc issues regarding ndtri	Gael Guennebaud	2019-09-04
\| \| \|
\| \| *	Fix possible warning regarding strict equality comparisons	Gael Guennebaud	2019-09-04
\| \| \|
\| \| *	PR 681: Add ndtri function, the inverse of the normal distribution function.	Srinivas Vasudevan	2019-08-12
\| \| \|
\| \| *	Change typedefs from private to protected to fix MSVC compilation	Eugene Zhulenev	2019-09-03
\| \|/
\| *	Makes Scalar/RealScalar typedefs public in Pardiso's wrappers (see PR 688)	Gael Guennebaud	2019-09-03
\| \|
\| *	More colamd cleanup:	Gael Guennebaud	2019-09-03
\| \|	- Move colamd implementation in its own namespace to avoid polluting the internal namespace with Ok, Status, etc.
\| \|	- Fix signed/unsigned warning
\| \|	- move some ugly free functions as member functions
\| *	Eigen_Colamd.h updated to replace constexpr with consts and enums.	Anshul Jaiswal	2019-08-17
\| \|
\| *	Ordering.h edited to fix dependencies on Eigen_Colamd.h	Anshul Jaiswal	2019-08-15
\| \|
\| *	Eigen_Colamd.h edited replacing macros with constexprs and functions.	Anshul Jaiswal	2019-08-15
\| \|
\| *	Eigen_Colamd.h edited online with Bitbucket replacing constant #defines with ↵	Anshul Jaiswal	2019-07-21
\| \| \| \| \| \| \| \|	const definitions
\| *	Updated Eigen_Colamd.h, namespacing macros ALIVE & DEAD as COLAMD_ALIVE & ↵	Anshul Jaiswal	2019-06-08
\| \| \| \| \| \| \| \| \| \| \| \|	COLAMD_DEAD to prevent conflicts with other libraries / code.
\| *	Fix missing header inclusion and colliding definitions for half type ↵	Rasmus Munk Larsen	2019-08-30
\| \| \| \| \| \| \| \| \| \| \| \|	casting, which broke build with -march=native on Haswell/Skylake.
\| *	Add more tests for corner cases of log1p and expm1. Add handling of infinite ↵	Rasmus Munk Larsen	2019-08-28
\| \| \| \| \| \| \| \|	arguments to log1p such that log1p(inf) = inf.
\| *	Revert changes to std_fallback::log1p that broke handling of arguments less ↵	Rasmus Munk Larsen	2019-08-27
\| \| \| \| \| \| \| \|	than -1. Fix packet op accordingly.
\| *	Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of ↵	Rasmus Munk Larsen	2019-08-27
\| \| \| \| \| \| \| \|	half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.
\| *	Merged in jaopaulolc/eigen (pull request PR-679)	Christoph Hertzberg	2019-08-22
\| \|\ \| \| \| \| \| \| \| \| \|	Fixes for Altivec/VSX and compilation with clang on PowerPC
\| \| *	Fix debug macros in p{load,store}u	João P. L. de Carvalho	2019-08-14
\| \| \|
\| \| *	Add missing pcmp_XX methods for double/Packet2d	João P. L. de Carvalho	2019-08-14
\| \| \| \| \| \| \| \| \| \| \| \|	This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector.
\| * \|	Implement vectorized versions of log1p and expm1 in Eigen using Kahan's ↵	Rasmus Munk Larsen	2019-08-12
\|/ /	formulas, and change the scalar implementations to properly handle infinite arguments.
\|
\|	Depending on instruction set, significant speedups are observed for the vectorized path:
\|	- log1p wall time is reduced 60-93% (2.5x - 15x speedup)
\|	- expm1 wall time is reduced 0-85% (1x - 7x speedup)
\|
\|	The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.
\|
\|	Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
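A scalar sketch of Kahan's formulas with an extra branch for +infinity, to make the trade-off mentioned above concrete. The function names `log1p_kahan` and `expm1_kahan` are hypothetical and this is not Eigen's implementation. The correction factor cancels the rounding error committed when forming `1 + x` or `exp(x)`, and the `u == log_u` test can only succeed when `u` is +infinity.

```cpp
#include <cmath>

// Kahan's log1p: log(1+x) computed as x * log(u) / (u - 1) with u = 1 + x,
// so that the rounding error in u largely cancels.
inline double log1p_kahan(double x) {
  const double u = 1.0 + x;
  if (u == 1.0) return x;    // x is too small to change 1.0
  const double log_u = std::log(u);
  if (u == log_u) return x;  // only true for u = +inf; avoids inf/inf
  return x * (log_u / (u - 1.0));
}

// Kahan's expm1: exp(x) - 1 computed as (u - 1) * x / log(u), u = exp(x).
inline double expm1_kahan(double x) {
  const double u = std::exp(x);
  if (u == 1.0) return x;           // x is too small to change exp(x)
  const double um1 = u - 1.0;
  if (um1 == -1.0) return -1.0;     // exp(x) underflowed to zero
  const double log_u = std::log(u);
  if (u == log_u) return u;         // only true for u = +inf
  return um1 * x / log_u;
}
```

The two early-outs for infinity are the extra scalar branches; without them the final expression would evaluate inf/inf and return NaN.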
\| *	Fix packed load/store for PowerPC's VSX	João P. L. de Carvalho	2019-08-09
\| \|	The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts.
\| \|
\| \|	For double/Packet2d, vec_xl/vec_xst should be preferred over vec_ld/vec_st, although the latter works when cast to float/Packet4f.
\| \|
\| \|	Also silences some odd warnings about throw emitted by some GCC versions. Clang does not emit these warnings.
\| *	Fix offset argument of ploadu/pstoreu for Altivec	João P. L. de Carvalho	2019-08-09
\| \|	If no offset is given, then it should be zero.
\| \|
\| \|	Also passes full address to vec_vsx_ld/st builtins.
\| \|
\| \|	Removes useless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT.
\| \|
\| \|	Removes unnecessary casts.
\| *	bug #1718: Add cast to successfully compile with clang on PowerPC	João P. L. de Carvalho	2019-08-09
\|/ \| \| \|	Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h