path: root/unsupported
* Remove legacy block evaluation support (Eugene Zhulenev, 2019-11-12)
* Fix a race in async tensor evaluation (Rasmus Munk Larsen, 2019-11-11)
  Don't run on_done() until after device.deallocate() / evaluator.cleanup() complete, since the device might be destroyed after on_done() runs.
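  For illustration, a minimal sketch of the ordering requirement described above (the AsyncState/finish names are made up here, not Eigen's TensorAsyncExecutor API):

  ```
  #include <functional>
  #include <memory>

  // Hypothetical sketch: resources owned by an async evaluation must be
  // released *before* the completion callback runs, because the callback
  // may destroy the device/allocator they came from.
  struct AsyncState {
    std::function<void()> cleanup;   // e.g. evaluator.cleanup(); device.deallocate(buf);
    std::function<void()> on_done;   // user-supplied completion callback
  };

  void finish(std::unique_ptr<AsyncState> state) {
    state->cleanup();                // release device memory first...
    auto done = std::move(state->on_done);
    state.reset();                   // ...then drop all evaluation state...
    done();                          // ...and only then signal completion
  }

  int main() {
    auto st = std::make_unique<AsyncState>();
    bool released = false;
    st->cleanup = [&] { released = true; };
    st->on_done = [&] { /* safe: buffers already released */ };
    finish(std::move(st));
    return released ? 0 : 1;
  }
  ```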
* Break loop dependence in TensorGenerator block access (Eugene Zhulenev, 2019-11-11)
* Fix data race in cxx11_tensor_notification test. (Rasmus Munk Larsen, 2019-11-08)
* Add EIGEN_HAS_INTRINSIC_INT128 macro (Rasmus Munk Larsen, 2019-11-06)
  Add a new EIGEN_HAS_INTRINSIC_INT128 macro, and use this instead of __SIZEOF_INT128__. This fixes related issues with TensorIntDiv.h when building with Clang for Windows, where support for 128-bit integer arithmetic is advertised but broken in practice.
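  A minimal sketch of the pattern (the macro logic and the mul_hi_u64 helper below are illustrative, not Eigen's actual definition of EIGEN_HAS_INTRINSIC_INT128):

  ```
  #include <cstdint>

  // Illustrative only: gate 128-bit arithmetic on a single feature macro so
  // broken configurations (e.g. Clang targeting Windows) can be excluded in
  // one place instead of testing __SIZEOF_INT128__ at every use site.
  #if defined(__SIZEOF_INT128__) && !defined(_WIN32)
  #define HAS_INTRINSIC_INT128 1
  #endif

  // High 64 bits of a 64x64-bit product.
  inline uint64_t mul_hi_u64(uint64_t a, uint64_t b) {
  #ifdef HAS_INTRINSIC_INT128
    return static_cast<uint64_t>((static_cast<__uint128_t>(a) * b) >> 64);
  #else
    // Portable fallback via 32-bit limbs.
    const uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
    const uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;
    const uint64_t lo_lo = a_lo * b_lo;
    const uint64_t mid1  = a_hi * b_lo + (lo_lo >> 32);
    const uint64_t mid2  = a_lo * b_hi + (mid1 & 0xffffffffu);
    return a_hi * b_hi + (mid1 >> 32) + (mid2 >> 32);
  #endif
  }
  ```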
* Rollback of PR-746 and partial rollback of https://bitbucket.org/eigen/eigen/commits/668ab3fc474e54c7919eda4fbaf11f3a99246494 (Rasmus Munk Larsen, 2019-11-05)
  std::array is still not supported in CUDA device code on Windows.
* Merged in ezhulenev/eigen-01 (pull request PR-746) (Rasmus Larsen, 2019-11-04)
  Remove internal::smart_copy and replace with std::copy
* Cleanup includes in Tensor module after switch to C++11 and above (Eugene Zhulenev, 2019-10-29)
* Remove internal::smart_copy and replace with std::copy (Eugene Zhulenev, 2019-10-29)
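  For context, a small sketch of what this replacement looks like at a call site (the buffers here are illustrative; std::copy already lowers to a memmove-style fast path for trivially copyable element types, which is what internal::smart_copy hand-rolled):

  ```
  #include <algorithm>
  #include <vector>

  int main() {
    std::vector<float> src = {1.f, 2.f, 3.f, 4.f};
    std::vector<float> dst(src.size());

    // Before: internal::smart_copy(src.data(), src.data() + src.size(), dst.data());
    // After:  the standard algorithm does the same job.
    std::copy(src.data(), src.data() + src.size(), dst.data());

    return dst[2] == 3.f ? 0 : 1;
  }
  ```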
* Fix CXX11Meta compilation with MSVC (Eugene Zhulenev, 2019-10-28)
* Prevent potential ODR violations in TensorExecutor (Eugene Zhulenev, 2019-10-28)
* This PR fixes: (Mehdi Goli, 2019-10-23)
  - the specialization of the array class in a different namespace for GCC <= 6.4
  - the implicit call to the `std::array` constructor using an initializer list for GCC <= 6.1
* Merged in deven-amd/eigen-hip-fix-191018 (pull request PR-738) (Rasmus Larsen, 2019-10-22)
  Fix for the HIP build+test errors.
* Add block evaluation V2 to TensorAsyncExecutor. (Rasmus Munk Larsen, 2019-10-22)
  Add async evaluation to a number of ops.
* Fix for the HIP build+test errors. (Deven Desai, 2019-10-22)
  The errors were introduced by this commit:
  After the above mentioned commit, some of the tests started failing with the following error

  ```
  Built target cxx11_tensor_reduction
  Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
  In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
  In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:117:
  /home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:155:5: error: the field type is not amp-compatible
      DestinationBufferKind m_kind;
      ^
  /home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:211:3: error: the field type is not amp-compatible
    DestinationBuffer m_destination;
    ^
  ```

  For some reason HIPCC does not like device code to contain enum types which do not have the base type explicitly declared. The fix is trivial: explicitly state "int" as the base type.
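  A minimal sketch of the fix described above (DestinationBufferKind, m_kind, and DestinationBuffer appear in the error output; the enumerator values and the rest of the struct are made up for illustration):

  ```
  // Before: HIPCC rejected device-side fields whose enum type has no explicit
  // underlying type ("the field type is not amp-compatible"):
  //   enum DestinationBufferKind { kEmpty, kStrided, kMaterialized };

  // After: spelling out the underlying type makes the field acceptable in
  // device code.
  enum DestinationBufferKind : int { kEmpty, kStrided, kMaterialized };

  struct DestinationBuffer {
    DestinationBufferKind m_kind = kEmpty;
  };

  int main() {
    DestinationBuffer buf;
    return buf.m_kind == kEmpty ? 0 : 1;
  }
  ```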
* Drop support for c++03 in Eigen tensor. Get rid of some code used to emulate c++11 functionality with older compilers. (Rasmus Munk Larsen, 2019-10-18)
* Propagate block evaluation preference through rvalue tensor expressions (Eugene Zhulenev, 2019-10-17)
* Cleanup Tensor block destination and materialized block storage allocation (Eugene Zhulenev, 2019-10-16)
* TensorBroadcasting support for random/uniform blocks (Eugene Zhulenev, 2019-10-16)
* Block evaluation for TensorGenerator/TensorReverse/TensorShuffling (Eugene Zhulenev, 2019-10-14)
* bug #1747: fix compilation with MSVC (Gael Guennebaud, 2019-10-14)
* Block evaluation for TensorGenerator + TensorReverse + fixed bug in tensor reverse op (Eugene Zhulenev, 2019-10-10)
* Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing (Eugene Zhulenev, 2019-10-09)
* Implement c++03 compatible fix for changeset 7a43af1a335da2c0489b4119a33ee1cbff0c15d6 (Gael Guennebaud, 2019-10-09)
* Fix compilation of FFTW unit test (Gael Guennebaud, 2019-10-08)
* Add block evaluation to TensorEvalTo and fix a few small bugs (Eugene Zhulenev, 2019-10-07)
* Fixing incorrect size in Tensor documentation. (Brian Zhao, 2019-10-04)
* Use "pdiv" rather than operator/ to support packet types.Gravatar Rasmus Munk Larsen2019-10-04
|
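  A brief sketch of why this matters in generic code (internal::pdiv and internal::pset1 are Eigen's packet primitives; the half_of helper is hypothetical):

  ```
  #include <Eigen/Core>

  // operator/ is not defined for SIMD packet types such as Packet4f, so code
  // that must compile for both scalars and packets uses internal::pdiv.
  template <typename Packet>
  Packet half_of(const Packet& x) {
    const Packet two = Eigen::internal::pset1<Packet>(2);
    return Eigen::internal::pdiv(x, two);  // works for float, double, Packet4f, ...
  }

  int main() {
    // Scalar instantiation; the same template also instantiates for packet types.
    return half_of(8.0f) == 4.0f ? 0 : 1;
  }
  ```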
* Fix compilation warnings and errors with clang in TensorBlockV2 code and tests (Eugene Zhulenev, 2019-10-04)
* Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect (Eugene Zhulenev, 2019-10-02)
* Add beta to TensorContractionKernel and make memset optional (Eugene Zhulenev, 2019-10-02)
* Move implementation of vectorized error function erf() to SpecialFunctionsImpl.h. (Rasmus Munk Larsen, 2019-09-27)
* Fix cxx11_tensor_block_io test (Eugene Zhulenev, 2019-09-25)
* Fix compilation warnings and errors with clang in TensorBlockV2 (Eugene Zhulenev, 2019-09-25)
* Fix for the HIP build+test errors. (Deven Desai, 2019-09-25)
  The errors were introduced by this commit: https://bitbucket.org/eigen/eigen/commits/d38e6fbc27abe0c354ffe90928f6741c378e76e1
  After the above mentioned commit, some of the tests started failing with the following error

  ```
  Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
  In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
  In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
  In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:70:
  /home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsHalf.h:28:22: error: call to 'erf' is ambiguous
    return Eigen::half(Eigen::numext::erf(static_cast<float>(a)));
                       ^~~~~~~~~~~~~~~~~~
  /home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1600:7: note: candidate function [with T = float]
  float erf(const float &x) { return ::erff(x); }
        ^
  /home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = float]
      erf(const Scalar& x) {
      ^
  In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
  In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
  In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
  /home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:23: error: call to 'erf' is ambiguous
    return make_double2(erf(a.x), erf(a.y));
                        ^~~
  /home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
  double erf(const double &x) { return ::erf(x); }
         ^
  /home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
      erf(const Scalar& x) {
      ^
  In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
  In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
  In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
  /home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:33: error: call to 'erf' is ambiguous
    return make_double2(erf(a.x), erf(a.y));
                                  ^~~
  /home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
  double erf(const double &x) { return ::erf(x); }
         ^
  /home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
      erf(const Scalar& x) {
      ^
  3 errors generated.
  ```

  This PR fixes the compile error by removing the "old" implementation of "erf" (assuming that the "new" implementation is what we want going forward; from a GPU point of view both implementations are the same). This PR also fixes what seems like a cut-and-paste error in the aforementioned commit.
* Fix a bug in a packed block type in TensorContractionThreadPool (Eugene Zhulenev, 2019-09-24)
* Merged in rmlarsen/eigen (pull request PR-704) (Rasmus Larsen, 2019-09-24)
  Add generic PacketMath implementation of the Error Function (erf).
* Add TODO to cleanup FMA cost modelling. (Rasmus Munk Larsen, 2019-09-24)
* Choose TensorBlock StridedLinearCopy type statically (Eugene Zhulenev, 2019-09-24)
* Add new TensorBlock api implementation + tests (Eugene Zhulenev, 2019-09-24)
* Tensor block evaluation V2 support for unary/binary/broadcasting (Eugene Zhulenev, 2019-09-24)
* Fix implicit conversion warnings and use pnegate to negate packets (Christoph Hertzberg, 2019-09-23)
* Fix (or mask away) conversion warnings introduced in 553caeb6a3bb545aef895f8fc9f219be44679017. (Christoph Hertzberg, 2019-09-23)
* Add support for asynchronous evaluation of tensor casting expressions. (Rasmus Munk Larsen, 2019-09-19)
* Add generic PacketMath implementation of the Error Function (erf). (Rasmus Munk Larsen, 2019-09-19)
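  For reference, the packetized implementation is what coefficient-wise erf() on arrays ends up exercising; a small usage sketch (assuming Eigen 3.3+ with the SpecialFunctions module, where Array::erf() is available):

  ```
  #include <Eigen/Core>
  #include <unsupported/Eigen/SpecialFunctions>
  #include <iostream>

  int main() {
    Eigen::ArrayXf x = Eigen::ArrayXf::LinSpaced(4, -2.f, 2.f);
    // Coefficient-wise error function; with a packetized implementation the
    // expression below can be evaluated with SIMD packets.
    Eigen::ArrayXf y = x.erf();
    std::cout << y.transpose() << std::endl;
    return 0;
  }
  ```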
* Merging eigen/eigen. (Srinivas Vasudevan, 2019-09-16)
* Add Bessel functions to SpecialFunctions. (Srinivas Vasudevan, 2019-09-14)
  - Split SpecialFunctions files into a separate BesselFunctions file.
  In particular add:
  - Modified Bessel functions of the second kind k0, k1, k0e, k1e
  - Bessel functions of the first kind j0, j1
  - Bessel functions of the second kind y0, y1
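  A small usage sketch, assuming the functions are exposed through the array API of the SpecialFunctions module as Eigen::bessel_j0, Eigen::bessel_y1, Eigen::bessel_k0e, etc. (check the module header for the exact names):

  ```
  #include <Eigen/Core>
  #include <unsupported/Eigen/SpecialFunctions>
  #include <iostream>

  int main() {
    Eigen::ArrayXd x = Eigen::ArrayXd::LinSpaced(5, 0.5, 4.5);
    // Coefficient-wise Bessel functions (assumed array API).
    Eigen::ArrayXd j0  = Eigen::bessel_j0(x);   // first kind, order 0
    Eigen::ArrayXd y1  = Eigen::bessel_y1(x);   // second kind, order 1
    Eigen::ArrayXd k0e = Eigen::bessel_k0e(x);  // exponentially scaled modified, second kind
    std::cout << j0.transpose() << "\n" << y1.transpose() << "\n" << k0e.transpose() << std::endl;
    return 0;
  }
  ```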
* Fix maybe-uninitialized warnings in TensorContractionThreadPool (Eugene Zhulenev, 2019-09-13)
* Use ThreadLocal container in TensorContractionThreadPool (Eugene Zhulenev, 2019-09-13)
* Add packetized versions of i0e and i1e special functions. (Srinivas Vasudevan, 2019-09-11)
  - In particular refactor the i0e and i1e code so scalar and vectorized path share code.
  - Move chebevl to GenericPacketMathFunctions.

  A brief benchmark with building Eigen with FMA, AVX and AVX2 flags

  Before:
  CPU: Intel Haswell with HyperThreading (6 cores)
  Benchmark                  Time(ns)     CPU(ns)  Iterations
  -----------------------------------------------------------------
  BM_eigen_i0e_double/1          57.3        57.3    10000000
  BM_eigen_i0e_double/8           398         398     1748554
  BM_eigen_i0e_double/64         3184        3184      218961
  BM_eigen_i0e_double/512       25579       25579       27330
  BM_eigen_i0e_double/4k       205043      205042        3418
  BM_eigen_i0e_double/32k     1646038     1646176         422
  BM_eigen_i0e_double/256k   13180959    13182613          53
  BM_eigen_i0e_double/1M     52684617    52706132          10
  BM_eigen_i0e_float/1           28.4        28.4    24636711
  BM_eigen_i0e_float/8           75.7        75.7     9207634
  BM_eigen_i0e_float/64           512         512     1000000
  BM_eigen_i0e_float/512         4194        4194      166359
  BM_eigen_i0e_float/4k         32756       32761       21373
  BM_eigen_i0e_float/32k       261133      261153        2678
  BM_eigen_i0e_float/256k     2087938     2088231         333
  BM_eigen_i0e_float/1M       8380409     8381234          84
  BM_eigen_i1e_double/1          56.3        56.3    10000000
  BM_eigen_i1e_double/8           397         397     1772376
  BM_eigen_i1e_double/64         3114        3115      223881
  BM_eigen_i1e_double/512       25358       25361       27761
  BM_eigen_i1e_double/4k       203543      203593        3462
  BM_eigen_i1e_double/32k     1613649     1613803         428
  BM_eigen_i1e_double/256k   12910625    12910374          54
  BM_eigen_i1e_double/1M     51723824    51723991          10
  BM_eigen_i1e_float/1           28.3        28.3    24683049
  BM_eigen_i1e_float/8           74.8        74.9     9366216
  BM_eigen_i1e_float/64           505         505     1000000
  BM_eigen_i1e_float/512         4068        4068      171690
  BM_eigen_i1e_float/4k         31803       31806       21948
  BM_eigen_i1e_float/32k       253637      253692        2763
  BM_eigen_i1e_float/256k     2019711     2019918         346
  BM_eigen_i1e_float/1M       8238681     8238713          86

  After:
  CPU: Intel Haswell with HyperThreading (6 cores)
  Benchmark                  Time(ns)     CPU(ns)  Iterations
  -----------------------------------------------------------------
  BM_eigen_i0e_double/1          15.8        15.8    44097476
  BM_eigen_i0e_double/8          99.3        99.3     7014884
  BM_eigen_i0e_double/64          777         777      886612
  BM_eigen_i0e_double/512        6180        6181      100000
  BM_eigen_i0e_double/4k        48136       48140       14678
  BM_eigen_i0e_double/32k      385936      385943        1801
  BM_eigen_i0e_double/256k    3293324     3293551         228
  BM_eigen_i0e_double/1M     12423600    12424458          57
  BM_eigen_i0e_float/1           16.3        16.3    43038042
  BM_eigen_i0e_float/8           30.1        30.1    23456931
  BM_eigen_i0e_float/64           169         169     4132875
  BM_eigen_i0e_float/512         1338        1339      516860
  BM_eigen_i0e_float/4k         10191       10191       68513
  BM_eigen_i0e_float/32k        81338       81337        8531
  BM_eigen_i0e_float/256k      651807      651984        1000
  BM_eigen_i0e_float/1M       2633821     2634187         268
  BM_eigen_i1e_double/1          16.2        16.2    42352499
  BM_eigen_i1e_double/8           110         110     6316524
  BM_eigen_i1e_double/64          822         822      851065
  BM_eigen_i1e_double/512        6480        6481      100000
  BM_eigen_i1e_double/4k        51843       51843       10000
  BM_eigen_i1e_double/32k      414854      414852        1680
  BM_eigen_i1e_double/256k    3320001     3320568         212
  BM_eigen_i1e_double/1M     13442795    13442391          53
  BM_eigen_i1e_float/1           17.6        17.6    41025735
  BM_eigen_i1e_float/8           35.5        35.5    19597891
  BM_eigen_i1e_float/64           240         240     2924237
  BM_eigen_i1e_float/512         1424        1424      485953
  BM_eigen_i1e_float/4k         10722       10723       65162
  BM_eigen_i1e_float/32k        86286       86297        8048
  BM_eigen_i1e_float/256k      691821      691868        1000
  BM_eigen_i1e_float/1M       2777336     2777747         256

  This shows anywhere from a 50% to 75% improvement on these operations. I've also benchmarked without any of these flags turned on, and got similar performance to before (if not better).
  Also tested packetmath.cpp + special_functions to ensure no regressions.
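  For background, chebevl is the Cephes-style Chebyshev series evaluator that the i0e/i1e code is built on; a scalar sketch of the recurrence (illustrative only; the version moved to GenericPacketMathFunctions applies the same steps with packet operations):

  ```
  #include <cstdio>
  #include <cstddef>

  // Clenshaw-style recurrence as used by Cephes' chebevl: coefficients are
  // stored highest-order term first, and the caller is responsible for
  // mapping the argument onto the interval the series was fitted on.
  template <typename T>
  T chebevl(T x, const T* coef, std::size_t n) {
    T b0 = coef[0];
    T b1 = T(0);
    T b2 = T(0);
    for (std::size_t i = 1; i < n; ++i) {
      b2 = b1;
      b1 = b0;
      b0 = x * b1 - b2 + coef[i];
    }
    return T(0.5) * (b0 - b2);
  }

  int main() {
    const double coef[] = {0.25, 0.5, 1.0};      // arbitrary illustrative coefficients
    std::printf("%f\n", chebevl(0.6, coef, 3));  // argument already mapped by the caller
    return 0;
  }
  ```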