eigen - C++ library for linear algebra

	Commit message	Author	Age
*	Fix tensor casts for large packets and casts to/from std::complex	Antonio Sanchez	2020-06-30
\| \| \| \| \| \| \| \| \| \| \| \| \|	The original tensor casts were only defined for `SrcCoeffRatio`:`TgtCoeffRatio` 1:1, 1:2, 2:1, 4:1. Here we add the missing 1:N and 8:1. We also add casting `Eigen::half` to/from `std::complex<T>`, which was missing, to make it consistent with `Eigen::bfloat16`, and generalize the overload to work for any complex type. Tests were added to `basicstuff`, `packetmath`, and `cxx11_tensor_casts` to test all cast configurations.
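To make the ratio notation concrete, the following is a minimal scalar sketch of an N:1 and a 1:N cast, reading the ratio as "source packets consumed : target packets produced". The `Packet` alias and the `cast_n_to_1` / `cast_1_to_n` helpers are hypothetical names introduced for illustration; they are not Eigen's API and use `std::array` in place of real SIMD registers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a SIMD packet: a fixed-size array of lanes.
template <typename T, std::size_t N>
using Packet = std::array<T, N>;

// N:1 cast: consume Ratio source packets and produce one target packet that
// holds Ratio * SrcLanes lanes. With 2-lane int64_t sources and Ratio == 8
// this is the 8:1 case (eight 2-lane packets -> one 16-lane int8_t packet).
template <typename Tgt, typename Src, std::size_t SrcLanes, std::size_t Ratio>
Packet<Tgt, SrcLanes * Ratio> cast_n_to_1(
    const std::array<Packet<Src, SrcLanes>, Ratio>& src) {
  Packet<Tgt, SrcLanes * Ratio> out{};
  for (std::size_t p = 0; p < Ratio; ++p) {
    for (std::size_t i = 0; i < SrcLanes; ++i) {
      out[p * SrcLanes + i] = static_cast<Tgt>(src[p][i]);
    }
  }
  return out;
}

// 1:N cast: split one source packet into Ratio narrower-count target packets.
template <typename Tgt, std::size_t Ratio, typename Src, std::size_t SrcLanes>
std::array<Packet<Tgt, SrcLanes / Ratio>, Ratio> cast_1_to_n(
    const Packet<Src, SrcLanes>& src) {
  static_assert(SrcLanes % Ratio == 0, "lanes must divide evenly");
  const std::size_t tgt_lanes = SrcLanes / Ratio;
  std::array<Packet<Tgt, SrcLanes / Ratio>, Ratio> out{};
  for (std::size_t i = 0; i < SrcLanes; ++i) {
    out[i / tgt_lanes][i % tgt_lanes] = static_cast<Tgt>(src[i]);
  }
  return out;
}
```

For example, `cast_n_to_1<std::int8_t>(src)` with `src` holding eight `Packet<std::int64_t, 2>` values yields one `Packet<std::int8_t, 16>`, and `cast_1_to_n<std::int64_t, 8>(out)` splits it back into eight packets.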
*	Fix packetmath_1 float tests for arm/aarch64.	Antonio Sanchez	2020-06-24
\|	Added missing `pmadd<Packet2f>` for NEON. This gives a significant improvement in precision over the previous `pmul+padd`, which was causing the `pcos` tests to fail. Also added an approx test with `std::sin`/`std::cos`, since otherwise returning any values with `a^2+b^2=1` would pass.
\|	Modified `log(denorm)` tests. Denorms are not supported by all systems (returns `::min`), are always flushed to zero on 32-bit arm, and configurably flush to zero on sse/avx/aarch64. This leads to inconsistent results across different systems (e.g. `-inf` vs `nan`). Added a check for their existence and excluded ARM.
\|	Removed logistic exactness test, since scalar and vectorized versions follow different code paths due to differences in `pexp` and `pmadd`, which result in slightly different values. For example, exactness always fails on arm, aarch64, and altivec.
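The precision difference between a fused multiply-add and a separate multiply followed by an add can be reproduced in scalar code. The sketch below assumes that the vectorized `pmadd` maps to a fused instruction and uses `std::fma` as its scalar analogue; `mul_then_add` and `fused_madd` are hypothetical helper names, not Eigen functions.

```cpp
#include <cassert>
#include <cmath>

// Unfused a*b + c: the product is rounded to float before the addition.
// The volatile store keeps the compiler from contracting this into an FMA.
inline float mul_then_add(float a, float b, float c) {
  volatile float product = a * b;
  return product + c;
}

// Fused multiply-add: a*b + c is computed with a single rounding at the end.
inline float fused_madd(float a, float b, float c) {
  return std::fma(a, b, c);
}
```

With `a = 1 + 2^-12` and `c = -(1 + 2^-11)`, the exact value of `a*a + c` is `2^-24`. The unfused version rounds `a*a` to `1 + 2^-11` first and returns `0`, while the fused version returns `2^-24`. Errors of this kind accumulate in polynomial evaluations such as those used for trigonometric functions.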
*	Add missing Packet2l/Packet2ul ops for NEON.	Antonio Sanchez	2020-06-22
\|	The current multiply (`pmul`) and comparison operators (`pcmp_lt`, `pcmp_le`, `pcmp_eq`) are missing for packets `Packet2l` and `Packet2ul`. This leads to compile errors for the `packetmath.cpp` tests in clang. Here we add and test the missing ops.
\|	Tested:
\|	```
\|	$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
\|	$ adb push packetmath /data/local/tmp/
\|	$ adb shell "/data/local/tmp/packetmath"
\|	$ arm-linux-gnueabihf-g++ -mfpu=neon -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
\|	$ adb push packetmath /data/local/tmp/
\|	$ adb shell "/data/local/tmp/packetmath"
\|	$ clang++ -target aarch64-linux-android21 -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
\|	$ adb push packetmath /data/local/tmp/
\|	$ adb shell "/data/local/tmp/packetmath"
\|	$ clang++ -target armv7-linux-android21 -static -mfpu=neon -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
\|	$ adb push packetmath /data/local/tmp/
\|	$ adb shell "/data/local/tmp/packetmath"
\|	```
*	Added missing NEON pcasts, update packetmath tests.	Antonio Sanchez	2020-06-21
\|	The NEON `pcast` operators are all implemented and tested for existing packets. This requires adding a `pcast(a,b,c,d,e,f,g,h)` for casting between `int64_t` and `int8_t` in `GenericPacketMath.h`.
\|	Removed incorrect `HasHalfPacket` definition for NEON's `Packet2l`/`Packet2ul`.
\|	Adjustments were also made to the `packetmath` tests. These include
\|	- minor bug fixes for cast tests (i.e. 4:1 casts, only casting for packets that are vectorizable)
\|	- added 8:1 cast tests
\|	- random number generation
\|	  - original had uninteresting 0 to 0 casts for many casts between floating-point and integers, and exhibited signed overflow undefined behavior
\|	Tested:
\|	```
\|	$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_ALL=1' test/packetmath.cpp -o packetmath
\|	$ adb push packetmath /data/local/tmp/
\|	$ adb shell "/data/local/tmp/packetmath"
\|	```
*	Support BFloat16 in Eigen	Teng Lu	2020-06-20
\|
*	Fix pscatter and pgather for Altivec Complex double	Pedro Caldeira	2020-06-16
\|
*	Fix unused variable warning on Arm	David Tellenbach	2020-06-15
\|
*	Fix #1818: SparseLU: add methods nnzL() and nnzU()	Sebastien Boisvert	2020-06-11
\|	Now this compiles without errors:
\|	$ clang++ -I ../../ test_sparseLU.cpp -std=c++03
*	Fix #1911: add benchmark for move semantics with fixed-size matrix	Sebastien Boisvert	2020-06-11
\|	$ clang++ -O3 bench/bench_move_semantics.cpp -I. -std=c++11 \
\|	    -o bench_move_semantics
\|	$ ./bench_move_semantics
\|	float copy semantics: 1755.97 ms
\|	float move semantics: 55.063 ms
\|	double copy semantics: 2457.65 ms
\|	double move semantics: 55.034 ms
*	Remove HasCast and fix packetmath cast tests.	Antonio Sanchez	2020-06-11
\| \| \| \| \| \| \| \| \| \| \|	The use of the `packet_traits<>::HasCast` field is currently inconsistent with `type_casting_traits<>`, and is unused apart from within `test/packetmath.cpp`. In addition, those packetmath cast tests do not currently reflect how casts are performed in practice: they ignore the `SrcCoeffRatio` and `TgtCoeffRatio` fields, assuming a 1:1 ratio. Here we remove the unused `HasCast`, and modify the packet cast tests to better reflect their usage.
*	Implement scalar_cmp_with_cast_op	ShengYang1	2020-06-09
\|
*	Fix static analyzer warning in SelfadjointProduct.h.	Rasmus Munk Larsen	2020-06-08
\| \| \| \|	Fix compiler warnings in GeneralBlockPanelKernel.h.
*	Update FindComputeCpp.cmake to fix build problems on Windows	Thales Sabino	2020-06-05
\|	- Use standard types in SYCL/PacketMath.h to avoid compilation problems on Windows
\|	- Add EIGEN_HAS_CONSTEXPR to cxx11_tensor_argmax_sycl.cpp to fix build problems on Windows
*	Fix broken packetmath test for logistic on Arm.	Rasmus Munk Larsen	2020-06-04
\|
*	Fix typo in previous update to generic predux_any.	Rasmus Munk Larsen	2020-06-04
\|
*	Avoid implicit float equality comparison in generic predux_any, but use ↵	Rasmus Munk Larsen	2020-06-04
\| \| \| \|	numext::not_equal_strict to avoid breaking builds that compile with -Werror=float-equal.
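One way to test floating-point lanes for inequality without writing a literal `!=` at the call site (which is what `-Werror=float-equal` rejects) is to route the comparison through a standard function object. The sketch below is an assumption about the general approach, not a copy of Eigen's code; `not_equal_strict_sketch` and `any_nonzero` are hypothetical names, and the array is a scalar stand-in for a packet.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper: compares through std::not_equal_to so that the call
// site contains no literal floating-point `!=`. The result, including the
// NaN case, matches the built-in operator.
inline bool not_equal_strict_sketch(float x, float y) {
  return std::not_equal_to<float>()(x, y);
}

// Scalar stand-in for a "reduce any" over a packet: true if any lane is nonzero.
template <std::size_t N>
bool any_nonzero(const std::array<float, N>& lanes) {
  for (float v : lanes) {
    if (not_equal_strict_sketch(v, 0.0f)) return true;
  }
  return false;
}
```

Whether the warning is actually silenced depends on the compiler treating the standard library header as a system header, which is the usual default but is not guaranteed by this sketch.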
*	Fix compilation error in logistic packet op.	Rasmus Munk Larsen	2020-06-03
\|
*	Bug #1777: make the scalar and packet path consistent for the logistic ↵	Gael Guennebaud	2020-05-31
\| \| \| \|	function + respective unit test
*	Fix #1833: compilation issue of "array!=scalar" with c++20	Gael Guennebaud	2020-05-30
\|
*	Save one extra temporary when assigning a sparse product to a row-major ↵	Gael Guennebaud	2020-05-30
\| \| \| \|	sparse matrix
*	Add support for PacketBlock<Packet8s,4> and PacketBlock<Packet16uc,4> ↵	Kan Chen	2020-05-29
\| \| \| \|	ptranspose on NEON
*	Fix #1874: it works on both MSVC 2017 and other platforms.	Kan Chen	2020-05-21
\|
*	Add pscatter for Packet16{u}c (int8)	Pedro Caldeira	2020-05-20
\|
*	- Vectorizing MMA packing.	Everton Constantino	2020-05-19
\|	- Optimizing MMA kernel.
\|	- Adding PacketBlock store to blas_data_mapper.
*	Add newline at the end of StlIterators.h.	Rasmus Munk Larsen	2020-05-15
\|
*	Fix #1874: workaround MSVC 2017 compilation issue.	Gael Guennebaud	2020-05-15
\|
*	Add missing packet ops for bool, and make it pass the same packet op unit ↵	Rasmus Munk Larsen	2020-05-14
\|	tests as other arithmetic types.
\|	This change also contains a few minor cleanups:
\|	1. Remove packet op pnot, which is not needed for anything other than pcmp_le_or_nan, which can be done in other ways.
\|	2. Remove the "HasInsert" enum, which is no longer needed since we removed the corresponding packet ops.
\|	3. Add faster pselect op for Packet4i when SSE4.1 is supported.
\|	Among other things, this makes the fast transposeInPlace() method available for Matrix<bool>.
\|	Run on ************** (72 X 2994 MHz CPUs); 2020-05-09T10:51:02.372347913-07:00
\|	CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
\|	Benchmark                       Time(ns)   CPU(ns)  Iterations
\|	-----------------------------------------------------------------------
\|	BM_TransposeInPlace<float>/4        9.77      9.77    71670320
\|	BM_TransposeInPlace<float>/8        21.9      21.9    31929525
\|	BM_TransposeInPlace<float>/16       66.6      66.6    10000000
\|	BM_TransposeInPlace<float>/32        243       243     2879561
\|	BM_TransposeInPlace<float>/59        844       844      829767
\|	BM_TransposeInPlace<float>/64        933       933      750567
\|	BM_TransposeInPlace<float>/128      3944      3945      177405
\|	BM_TransposeInPlace<float>/256     16853     16853       41457
\|	BM_TransposeInPlace<float>/512    204952    204968        3448
\|	BM_TransposeInPlace<float>/1k    1053889   1053861         664
\|	BM_TransposeInPlace<bool>/4         14.4      14.4    48637301
\|	BM_TransposeInPlace<bool>/8         36.0      36.0    19370222
\|	BM_TransposeInPlace<bool>/16        31.5      31.5    22178902
\|	BM_TransposeInPlace<bool>/32         111       111     6272048
\|	BM_TransposeInPlace<bool>/59         626       626     1000000
\|	BM_TransposeInPlace<bool>/64         428       428     1632689
\|	BM_TransposeInPlace<bool>/128       1677      1677      417377
\|	BM_TransposeInPlace<bool>/256       7126      7126       96264
\|	BM_TransposeInPlace<bool>/512      29021     29024       24165
\|	BM_TransposeInPlace<bool>/1k      116321    116330        6068
*	Added support for reverse iterators for Vectorwise operations.	Felipe Attanasio	2020-05-14
\|
*	Indexed view should have RowMajorBit when there is statically a single row	Christopher Moore	2020-05-14
\|
*	Resolve "IndexedView of a vector should allow linear access"	Christopher Moore	2020-05-13
\|
*	Altivec template functions to better code reusability	Pedro Caldeira	2020-05-11
\|
*	Remove packet ops pinsertfirst and pinsertlast that are only used in a ↵	Rasmus Munk Larsen	2020-05-08
\|	single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp.
\|	I cannot measure any performance changes for SSE, AVX, or AVX512.
\|	name                     old time/op  new time/op  delta
\|	BM_LinSpace<float>/1     1.63ns ± 0%  1.63ns ± 0%  ~  (p=0.762 n=5+5)
\|	BM_LinSpace<float>/8     4.92ns ± 3%  4.89ns ± 3%  ~  (p=0.421 n=5+5)
\|	BM_LinSpace<float>/64    34.6ns ± 0%  34.6ns ± 0%  ~  (p=0.841 n=5+5)
\|	BM_LinSpace<float>/512    217ns ± 0%   217ns ± 0%  ~  (p=0.421 n=5+5)
\|	BM_LinSpace<float>/4k    1.68µs ± 0%  1.68µs ± 0%  ~  (p=1.000 n=5+5)
\|	BM_LinSpace<float>/32k   13.3µs ± 0%  13.3µs ± 0%  ~  (p=0.905 n=5+4)
\|	BM_LinSpace<float>/256k   107µs ± 0%   107µs ± 0%  ~  (p=0.841 n=5+5)
\|	BM_LinSpace<float>/1M     427µs ± 0%   427µs ± 0%  ~  (p=0.690 n=5+5)
*	Possibility to specify user-defined default cache sizes for GEBP kernel	David Tellenbach	2020-05-08
\|	Some architectures have no convenient way to determine cache sizes at runtime. In this case Eigen's GEBP kernel falls back to default cache values, which might not be correct in all situations.
\|	This patch introduces three preprocessor directives
\|	`EIGEN_DEFAULT_L1_CACHE_SIZE`
\|	`EIGEN_DEFAULT_L2_CACHE_SIZE`
\|	`EIGEN_DEFAULT_L3_CACHE_SIZE`
\|	to give users the possibility to set these default values explicitly.
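The override pattern described here can be sketched as follows. Only the three macro names come from the commit message; the `CacheSizes` struct, the `default_cache_sizes` function, the fallback constants, and the assumption that the values are in bytes are hypothetical and do not reflect Eigen's actual built-in defaults.

```cpp
#include <cassert>
#include <cstddef>

// A build would normally pass this on the command line, for example
//   -DEIGEN_DEFAULT_L1_CACHE_SIZE=65536
// It is defined here only to keep the sketch self-contained.
#define EIGEN_DEFAULT_L1_CACHE_SIZE (64 * 1024)

// Hypothetical fallback values (assumed bytes); NOT Eigen's real defaults.
const std::ptrdiff_t kFallbackL1 = 32 * 1024;
const std::ptrdiff_t kFallbackL2 = 512 * 1024;
const std::ptrdiff_t kFallbackL3 = 4 * 1024 * 1024;

struct CacheSizes {
  std::ptrdiff_t l1, l2, l3;
};

// Use the user-supplied macro when it is defined, otherwise the fallback.
inline CacheSizes default_cache_sizes() {
  CacheSizes s;
#ifdef EIGEN_DEFAULT_L1_CACHE_SIZE
  s.l1 = EIGEN_DEFAULT_L1_CACHE_SIZE;
#else
  s.l1 = kFallbackL1;
#endif
#ifdef EIGEN_DEFAULT_L2_CACHE_SIZE
  s.l2 = EIGEN_DEFAULT_L2_CACHE_SIZE;
#else
  s.l2 = kFallbackL2;
#endif
#ifdef EIGEN_DEFAULT_L3_CACHE_SIZE
  s.l3 = EIGEN_DEFAULT_L3_CACHE_SIZE;
#else
  s.l3 = kFallbackL3;
#endif
  return s;
}
```

In this sketch only the L1 value is overridden, so `default_cache_sizes()` reports the user-supplied L1 size together with the fallback L2 and L3 sizes.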
*	Remove unused packet op "palign".	Rasmus Munk Larsen	2020-05-07
\| \| \| \|	Clean up a compiler warning in c++03 mode in AVX512/Complex.h.
*	Remove traits declaring NEON vectorized casts that do not actually have ↵	Rasmus Munk Larsen	2020-05-07
\| \| \| \|	packet op implementations.
*	Fix confusing template param name for Stride fwd decl.	Xiaoxiang Cao	2020-04-30
\|
*	Fix the embarrassingly incomplete fix to the embarrassing bug in blocked ↵	Rasmus Munk Larsen	2020-04-29
\| \| \| \|	transpose.
*	Fix (embarrassing) bug in blocked transpose.	Rasmus Munk Larsen	2020-04-29
\|
*	Add missing transpose in cleanup loop. Without it, we trip an assertion in ↵	Rasmus Munk Larsen	2020-04-29
\| \| \| \|	debug mode.
*	Fix compilation error with Clang on Android: _mm_extract_epi64 fails to compile.	Rasmus Munk Larsen	2020-04-29
\|
*	Extend support for Packet16b:	Rasmus Munk Larsen	2020-04-28
\|	* Add ptranspose<,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
\|	* work around a bug in slicing of Tensor<bool>.
\|	* Add tensor tests
\|	This speeds up matmul for boolean matrices by about 10x
\|	name                 old time/op  new time/op  delta
\|	BM_MatMul<bool>/8     267ns ± 0%   479ns ± 0%  +79.25%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/32   6.42µs ± 0%  0.87µs ± 0%  -86.50%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/64   43.3µs ± 0%   5.9µs ± 0%  -86.42%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/128   315µs ± 0%    44µs ± 0%  -85.98%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/256  2.41ms ± 0%  0.34ms ± 0%  -85.68%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/512  18.8ms ± 0%   2.7ms ± 0%  -85.53%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/1k    149ms ± 0%    22ms ± 0%  -85.40%  (p=0.008 n=5+5)
*	Block transposeInPlace() when the matrix is real and square. This yields a ↵	Rasmus Munk Larsen	2020-04-28
\|	large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.
\|	rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.TransposeInPlace.float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
\|	10 / 10 [====================================================================================================================================================================================================================] 100.00% 2m50s
\|	(Generated by http://go/benchy. Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".TransposeInPlace.float.*" experimental/users/rmlarsen/bench:matmul_bench)
\|	name                            old time/op  new time/op  delta
\|	BM_TransposeInPlace<float>/4    9.84ns ± 0%  6.51ns ± 0%  -33.80%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/8    23.6ns ± 1%  17.6ns ± 0%  -25.26%  (p=0.016 n=5+4)
\|	BM_TransposeInPlace<float>/16   78.8ns ± 0%  60.3ns ± 0%  -23.50%  (p=0.029 n=4+4)
\|	BM_TransposeInPlace<float>/32    302ns ± 0%   229ns ± 0%  -24.40%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/59   1.03µs ± 0%  0.84µs ± 1%  -17.87%  (p=0.016 n=5+4)
\|	BM_TransposeInPlace<float>/64   1.20µs ± 0%  0.89µs ± 1%  -25.81%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/128  8.96µs ± 0%  3.82µs ± 2%  -57.33%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/256   152µs ± 3%    17µs ± 2%  -89.06%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/512   837µs ± 1%   208µs ± 0%  -75.15%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/1k   4.28ms ± 2%  1.08ms ± 2%  -74.72%  (p=0.008 n=5+5)
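The idea of transposing a square matrix block by block, so that each cache line is touched while it is still resident, can be illustrated with a scalar sketch. This is not Eigen's implementation, which transposes packets in registers; `transpose_in_place_blocked` and the tile size `B` are hypothetical, and the matrix is assumed to be stored row-major in a flat `std::vector`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// In-place transpose of an n x n row-major matrix, processed in B x B tiles.
// Each pair of mirrored tiles is swapped while both are still cache-resident.
template <typename T>
void transpose_in_place_blocked(std::vector<T>& a, std::size_t n,
                                std::size_t B = 8) {
  assert(a.size() == n * n && B > 0);
  for (std::size_t bi = 0; bi < n; bi += B) {
    const std::size_t i_end = std::min(bi + B, n);
    // Diagonal tile: swap elements above the diagonal with those below.
    for (std::size_t i = bi; i < i_end; ++i) {
      for (std::size_t j = i + 1; j < i_end; ++j) {
        std::swap(a[i * n + j], a[j * n + i]);
      }
    }
    // Off-diagonal tiles: swap tile (bi, bj) with its mirror (bj, bi).
    for (std::size_t bj = bi + B; bj < n; bj += B) {
      const std::size_t j_end = std::min(bj + B, n);
      for (std::size_t i = bi; i < i_end; ++i) {
        for (std::size_t j = bj; j < j_end; ++j) {
          std::swap(a[i * n + j], a[j * n + i]);
        }
      }
    }
  }
}
```

The `std::min` clamps handle sizes that are not a multiple of the tile size, such as the 59 x 59 case that appears in the benchmark table.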
*	Add support to vector instructions to Packet16uc and Packet16c	Pedro Caldeira	2020-04-27
\|
*	Remove unused packet op "preduxp".	Rasmus Munk Larsen	2020-04-23
\|
*	BooleanRedux.h: Add more EIGEN_DEVICE_FUNC qualifiers.	René Wagner	2020-04-23
\| \| \| \|	This enables operator== on Eigen matrices in device code.
*	Add Packet8s and Packet8us to support signed/unsigned int16/short Altivec ↵	Pedro Caldeira	2020-04-21
\| \| \| \|	vector operations
*	Fix bug in ptrue for Packet16b.	Rasmus Munk Larsen	2020-04-20
\|
*	Add partial vectorization for matrices and tensors of bool. This speeds up ↵	Rasmus Munk Larsen	2020-04-20
\|	boolean operations on Tensors by up to 25x.
\|	Benchmark numbers for the logical and of two NxN tensors:
\|	name                                    old time/op  new time/op  delta
\|	BM_booleanAnd_1T/3 [using 1 threads]    14.6ns ± 0%  14.4ns ± 0%   -0.96%
\|	BM_booleanAnd_1T/4 [using 1 threads]    20.5ns ±12%   9.0ns ± 0%  -56.07%
\|	BM_booleanAnd_1T/7 [using 1 threads]    41.7ns ± 0%  10.5ns ± 0%  -74.87%
\|	BM_booleanAnd_1T/8 [using 1 threads]    52.1ns ± 0%  10.1ns ± 0%  -80.59%
\|	BM_booleanAnd_1T/10 [using 1 threads]   76.3ns ± 0%  13.8ns ± 0%  -81.87%
\|	BM_booleanAnd_1T/15 [using 1 threads]    167ns ± 0%    16ns ± 0%  -90.45%
\|	BM_booleanAnd_1T/16 [using 1 threads]    188ns ± 0%    16ns ± 0%  -91.57%
\|	BM_booleanAnd_1T/31 [using 1 threads]    667ns ± 0%    34ns ± 0%  -94.83%
\|	BM_booleanAnd_1T/32 [using 1 threads]    710ns ± 0%    35ns ± 0%  -95.01%
\|	BM_booleanAnd_1T/64 [using 1 threads]   2.80µs ± 0%  0.11µs ± 0%  -95.93%
\|	BM_booleanAnd_1T/128 [using 1 threads]  11.2µs ± 0%   0.4µs ± 0%  -96.11%
\|	BM_booleanAnd_1T/256 [using 1 threads]  44.6µs ± 0%   2.5µs ± 0%  -94.31%
\|	BM_booleanAnd_1T/512 [using 1 threads]   178µs ± 0%    10µs ± 0%  -94.35%
\|	BM_booleanAnd_1T/1k [using 1 threads]    717µs ± 0%    78µs ± 1%  -89.07%
\|	BM_booleanAnd_1T/2k [using 1 threads]   2.87ms ± 0%  0.31ms ± 1%  -89.08%
\|	BM_booleanAnd_1T/4k [using 1 threads]   11.7ms ± 0%   1.9ms ± 4%  -83.55%
\|	BM_booleanAnd_1T/10k [using 1 threads]  70.3ms ± 0%  17.2ms ± 4%  -75.48%
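Why processing several bools at once pays off can be seen in a portable sketch that ANDs eight one-byte bools per 64-bit word. This is only an illustration of the principle: the commit uses SIMD packets rather than `std::uint64_t`, and `boolean_and` is a hypothetical helper that assumes every input byte is exactly 0 or 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Element-wise logical AND of two bool buffers, stored one byte per element
// with values 0 or 1. Eight elements are processed per 64-bit word; the
// remainder is handled by a scalar loop.
inline void boolean_and(const unsigned char* a, const unsigned char* b,
                        unsigned char* out, std::size_t n) {
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    std::uint64_t wa, wb;
    std::memcpy(&wa, a + i, 8);  // memcpy avoids alignment and aliasing issues
    std::memcpy(&wb, b + i, 8);
    const std::uint64_t r = wa & wb;  // bitwise AND equals logical AND for 0/1 bytes
    std::memcpy(out + i, &r, 8);
  }
  for (; i < n; ++i) {
    out[i] = static_cast<unsigned char>(a[i] && b[i]);
  }
}
```

The scalar tail loop plays the role of the non-vectorized remainder when the number of elements is not a multiple of the packet size, which is why the commit speaks of partial vectorization.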
*	Move eigen_packet_wrapper to GenericPacketMath.h and use it for ↵	Rasmus Munk Larsen	2020-04-15
\| \| \| \| \| \| \|	SSE/AVX/AVX512 as it is already used for NEON. This will allow us to define multiple packet types backed by the same vector type, e.g., __m128i. Use this mechanism to define packets for half and clean up the packet op implementations.
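The mechanism of deriving several distinct packet types from one underlying vector type can be sketched with a small tagged wrapper. The names below (`packet_wrapper_sketch`, `RawVec`, `PacketInts`, `PacketHalfs`, `lanes`) are hypothetical, and `std::uint64_t` stands in for a real register type such as `__m128i`; this is a simplified illustration rather than Eigen's actual `eigen_packet_wrapper`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a raw vector register type such as __m128i.
using RawVec = std::uint64_t;

// Thin wrapper that turns one raw type into several distinct C++ types,
// selected by unique_id, while still converting to and from the raw type.
template <typename T, int unique_id>
struct packet_wrapper_sketch {
  packet_wrapper_sketch() : m_val() {}
  packet_wrapper_sketch(const T& v) : m_val(v) {}
  operator T&() { return m_val; }
  operator const T&() const { return m_val; }
  T m_val;
};

// Two packet types backed by the same raw storage.
using PacketInts = packet_wrapper_sketch<RawVec, 0>;
using PacketHalfs = packet_wrapper_sketch<RawVec, 1>;

// Because the types differ, overloads can be selected per packet type.
inline int lanes(const PacketInts&) { return 2; }   // e.g. two 32-bit lanes
inline int lanes(const PacketHalfs&) { return 4; }  // e.g. four 16-bit lanes
```

Without the wrapper both aliases would name the same type, and the two `lanes` overloads would collide. The implicit conversions keep the wrapped value usable wherever the raw type is expected.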
*	Fix typo in TypeCasting.h	Rasmus Munk Larsen	2020-04-14
\|