eigen - C++ library for linear algebra

	Commit message	Author	Age
*	Support BFloat16 in Eigen	Teng Lu	2020-06-20
\|
*	Fix #1911: add benchmark for move semantics with fixed-size matrix	Sebastien Boisvert	2020-06-11
\|	$ clang++ -O3 bench/bench_move_semantics.cpp -I. -std=c++11 \
\|	    -o bench_move_semantics
\|	$ ./bench_move_semantics
\|	float copy semantics: 1755.97 ms
\|	float move semantics: 55.063 ms
\|	double copy semantics: 2457.65 ms
\|	double move semantics: 55.034 ms
*	Remove HasCast and fix packetmath cast tests.	Antonio Sanchez	2020-06-11
\| \| \| \| \| \| \| \| \| \| \|	The use of the `packet_traits<>::HasCast` field is currently inconsistent with `type_casting_traits<>`, and is unused apart from within `test/packetmath.cpp`. In addition, those packetmath cast tests do not currently reflect how casts are performed in practice: they ignore the `SrcCoeffRatio` and `TgtCoeffRatio` fields, assuming a 1:1 ratio. Here we remove the unused `HasCast`, and modify the packet cast tests to better reflect their usage.
*	Fix #1757: remove the word 'suicide'	Sebastien Boisvert	2020-06-11
\|
*	Fix broken packetmath test for logistic on Arm.	Rasmus Munk Larsen	2020-06-04
\|
*	Bug #1777: make the scalar and packet path consistent for the logistic ↵	Gael Guennebaud	2020-05-31
\| \| \| \|	function + respective unit test
*	Save one extra temporary when assigning a sparse product to a row-major ↵	Gael Guennebaud	2020-05-30
\| \| \| \|	sparse matrix
*	Guard usage of decltype since it's a C++11 feature	David Tellenbach	2020-05-20
\| \| \| \|	This fixes https://gitlab.com/libeigen/eigen/-/issues/1897
*	Add guard around specialization for bool, which is only currently ↵	Rasmus Munk Larsen	2020-05-19
\| \| \| \|	implemented for SSE.
*	- Vectorizing MMA packing.	Everton Constantino	2020-05-19
\| \| \| \| \|	- Optimizing MMA kernel. - Adding PacketBlock store to blas_data_mapper.
*	Add missing packet ops for bool, and make it pass the same packet op unit ↵	Rasmus Munk Larsen	2020-05-14
\|	tests as other arithmetic types.
\|
\|	This change also contains a few minor cleanups:
\|	  1. Remove packet op pnot, which is not needed for anything other than pcmp_le_or_nan, which can be done in other ways.
\|	  2. Remove the "HasInsert" enum, which is no longer needed since we removed the corresponding packet ops.
\|	  3. Add faster pselect op for Packet4i when SSE4.1 is supported.
\|
\|	Among other things, this makes the fast transposeInPlace() method available for Matrix<bool>.
\|
\|	Run on ************** (72 X 2994 MHz CPUs); 2020-05-09T10:51:02.372347913-07:00
\|	CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
\|	Benchmark                       Time(ns)   CPU(ns)  Iterations
\|	--------------------------------------------------------------
\|	BM_TransposeInPlace<float>/4        9.77      9.77    71670320
\|	BM_TransposeInPlace<float>/8        21.9      21.9    31929525
\|	BM_TransposeInPlace<float>/16       66.6      66.6    10000000
\|	BM_TransposeInPlace<float>/32        243       243     2879561
\|	BM_TransposeInPlace<float>/59        844       844      829767
\|	BM_TransposeInPlace<float>/64        933       933      750567
\|	BM_TransposeInPlace<float>/128      3944      3945      177405
\|	BM_TransposeInPlace<float>/256     16853     16853       41457
\|	BM_TransposeInPlace<float>/512    204952    204968        3448
\|	BM_TransposeInPlace<float>/1k    1053889   1053861         664
\|	BM_TransposeInPlace<bool>/4         14.4      14.4    48637301
\|	BM_TransposeInPlace<bool>/8         36.0      36.0    19370222
\|	BM_TransposeInPlace<bool>/16        31.5      31.5    22178902
\|	BM_TransposeInPlace<bool>/32         111       111     6272048
\|	BM_TransposeInPlace<bool>/59         626       626     1000000
\|	BM_TransposeInPlace<bool>/64         428       428     1632689
\|	BM_TransposeInPlace<bool>/128       1677      1677      417377
\|	BM_TransposeInPlace<bool>/256       7126      7126       96264
\|	BM_TransposeInPlace<bool>/512      29021     29024       24165
\|	BM_TransposeInPlace<bool>/1k      116321    116330        6068
*	Added support for reverse iterators for Vectorwise operations.	Felipe Attanasio	2020-05-14
\|
*	Indexed view should have RowMajorBit when there is statically a single row	Christopher Moore	2020-05-14
\|
*	Resolve "IndexedView of a vector should allow linear access"	Christopher Moore	2020-05-13
\|
*	Remove packet ops pinsertfirst and pinsertlast that are only used in a ↵	Rasmus Munk Larsen	2020-05-08
\|	single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp.
\|
\|	I cannot measure any performance changes for SSE, AVX, or AVX512.
\|
\|	name                     old time/op  new time/op  delta
\|	BM_LinSpace<float>/1     1.63ns ± 0%  1.63ns ± 0%  ~  (p=0.762 n=5+5)
\|	BM_LinSpace<float>/8     4.92ns ± 3%  4.89ns ± 3%  ~  (p=0.421 n=5+5)
\|	BM_LinSpace<float>/64    34.6ns ± 0%  34.6ns ± 0%  ~  (p=0.841 n=5+5)
\|	BM_LinSpace<float>/512    217ns ± 0%   217ns ± 0%  ~  (p=0.421 n=5+5)
\|	BM_LinSpace<float>/4k    1.68µs ± 0%  1.68µs ± 0%  ~  (p=1.000 n=5+5)
\|	BM_LinSpace<float>/32k   13.3µs ± 0%  13.3µs ± 0%  ~  (p=0.905 n=5+4)
\|	BM_LinSpace<float>/256k   107µs ± 0%   107µs ± 0%  ~  (p=0.841 n=5+5)
\|	BM_LinSpace<float>/1M     427µs ± 0%   427µs ± 0%  ~  (p=0.690 n=5+5)
*	Remove unused packet op "palign".	Rasmus Munk Larsen	2020-05-07
\| \| \| \|	Clean up a compiler warning in c++03 mode in AVX512/Complex.h.
*	Make size odd for transposeInPlace test to make sure we hit the scalar path.	Rasmus Munk Larsen	2020-05-07
\|
*	Extend support for Packet16b:	Rasmus Munk Larsen	2020-04-28
\|	* Add ptranspose<,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
\|	* work around a bug in slicing of Tensor<bool>.
\|	* Add tensor tests
\|
\|	This speeds up matmul for boolean matrices by about 10x
\|
\|	name                 old time/op  new time/op  delta
\|	BM_MatMul<bool>/8     267ns ± 0%   479ns ± 0%  +79.25%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/32   6.42µs ± 0%  0.87µs ± 0%  -86.50%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/64   43.3µs ± 0%   5.9µs ± 0%  -86.42%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/128   315µs ± 0%    44µs ± 0%  -85.98%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/256  2.41ms ± 0%  0.34ms ± 0%  -85.68%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/512  18.8ms ± 0%   2.7ms ± 0%  -85.53%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/1k    149ms ± 0%    22ms ± 0%  -85.40%  (p=0.008 n=5+5)
*	Block transposeInPlace() when the matrix is real and square. This yields a ↵	Rasmus Munk Larsen	2020-04-28
\|	large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.
\|
\|	rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.TransposeInPlace.float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
\|	(Generated by http://go/benchy. Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".TransposeInPlace.float.*" experimental/users/rmlarsen/bench:matmul_bench)
\|
\|	name                            old time/op  new time/op  delta
\|	BM_TransposeInPlace<float>/4    9.84ns ± 0%  6.51ns ± 0%  -33.80%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/8    23.6ns ± 1%  17.6ns ± 0%  -25.26%  (p=0.016 n=5+4)
\|	BM_TransposeInPlace<float>/16   78.8ns ± 0%  60.3ns ± 0%  -23.50%  (p=0.029 n=4+4)
\|	BM_TransposeInPlace<float>/32    302ns ± 0%   229ns ± 0%  -24.40%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/59   1.03µs ± 0%  0.84µs ± 1%  -17.87%  (p=0.016 n=5+4)
\|	BM_TransposeInPlace<float>/64   1.20µs ± 0%  0.89µs ± 1%  -25.81%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/128  8.96µs ± 0%  3.82µs ± 2%  -57.33%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/256   152µs ± 3%    17µs ± 2%  -89.06%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/512   837µs ± 1%   208µs ± 0%  -75.15%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/1k   4.28ms ± 2%  1.08ms ± 2%  -74.72%  (p=0.008 n=5+5)
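The tiling idea in the commit above can be illustrated with a scalar sketch. This is not Eigen's implementation: the function name `transpose_in_place_blocked` and the tile size `kBlock` are assumptions made for illustration, and the real change transposes packets in registers rather than swapping scalars. The sketch only shows how a square row-major matrix can be walked tile by tile so that each pair of mirrored tiles is handled together.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative tile size (an assumption, not the value Eigen uses).
constexpr std::size_t kBlock = 4;

// Transpose a square n x n row-major matrix in place, one tile at a time.
void transpose_in_place_blocked(std::vector<float>& a, std::size_t n) {
  assert(a.size() == n * n);
  for (std::size_t bi = 0; bi < n; bi += kBlock) {
    const std::size_t i_end = std::min(bi + kBlock, n);
    // Only visit tiles on or above the diagonal; the mirrored tile below
    // the diagonal is updated by the same swaps.
    for (std::size_t bj = bi; bj < n; bj += kBlock) {
      const std::size_t j_end = std::min(bj + kBlock, n);
      for (std::size_t i = bi; i < i_end; ++i) {
        // Inside a diagonal tile, touch the strict upper triangle only,
        // so every pair is swapped exactly once.
        const std::size_t j_begin = (bi == bj) ? i + 1 : bj;
        for (std::size_t j = j_begin; j < j_end; ++j) {
          std::swap(a[i * n + j], a[j * n + i]);
        }
      }
    }
  }
}
```

An odd size such as 59 exercises the partial tiles at the matrix edge, which is the same reason the benchmark above includes a /59 case.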
*	Remove unused packet op "preduxp".	Rasmus Munk Larsen	2020-04-23
\|
*	Add partial vectorization for matrices and tensors of bool. This speeds up ↵	Rasmus Munk Larsen	2020-04-20
\|	boolean operations on Tensors by up to 25x.
\|
\|	Benchmark numbers for the logical and of two NxN tensors:
\|
\|	name                                    old time/op  new time/op  delta
\|	BM_booleanAnd_1T/3 [using 1 threads]    14.6ns ± 0%  14.4ns ± 0%   -0.96%
\|	BM_booleanAnd_1T/4 [using 1 threads]    20.5ns ±12%   9.0ns ± 0%  -56.07%
\|	BM_booleanAnd_1T/7 [using 1 threads]    41.7ns ± 0%  10.5ns ± 0%  -74.87%
\|	BM_booleanAnd_1T/8 [using 1 threads]    52.1ns ± 0%  10.1ns ± 0%  -80.59%
\|	BM_booleanAnd_1T/10 [using 1 threads]   76.3ns ± 0%  13.8ns ± 0%  -81.87%
\|	BM_booleanAnd_1T/15 [using 1 threads]    167ns ± 0%    16ns ± 0%  -90.45%
\|	BM_booleanAnd_1T/16 [using 1 threads]    188ns ± 0%    16ns ± 0%  -91.57%
\|	BM_booleanAnd_1T/31 [using 1 threads]    667ns ± 0%    34ns ± 0%  -94.83%
\|	BM_booleanAnd_1T/32 [using 1 threads]    710ns ± 0%    35ns ± 0%  -95.01%
\|	BM_booleanAnd_1T/64 [using 1 threads]   2.80µs ± 0%  0.11µs ± 0%  -95.93%
\|	BM_booleanAnd_1T/128 [using 1 threads]  11.2µs ± 0%   0.4µs ± 0%  -96.11%
\|	BM_booleanAnd_1T/256 [using 1 threads]  44.6µs ± 0%   2.5µs ± 0%  -94.31%
\|	BM_booleanAnd_1T/512 [using 1 threads]   178µs ± 0%    10µs ± 0%  -94.35%
\|	BM_booleanAnd_1T/1k [using 1 threads]    717µs ± 0%    78µs ± 1%  -89.07%
\|	BM_booleanAnd_1T/2k [using 1 threads]   2.87ms ± 0%  0.31ms ± 1%  -89.08%
\|	BM_booleanAnd_1T/4k [using 1 threads]   11.7ms ± 0%   1.9ms ± 4%  -83.55%
\|	BM_booleanAnd_1T/10k [using 1 threads]  70.3ms ± 0%  17.2ms ± 4%  -75.48%
*	CommaInitializer wrongfully asserted for 0-sized blocks	Christoph Hertzberg	2020-04-13
\| \| \| \|	The commainitializer unit-test never actually called `test_block_recursion`, which also was not correctly implemented and would have caused too deep template recursion.
*	Replace norm() with squaredNorm() to address integer overflows	Antonio Sanchez	2020-04-07
\| \| \| \| \| \| \| \| \| \| \|	For random matrices with integer coefficients, many of the tests here lead to integer overflows. When taking the norm() of a row/column, the squaredNorm() often overflows to a negative value, leading to domain errors when taking the sqrt(). This leads to a crash on some systems. By replacing the norm() call by a squaredNorm(), the values still overflow, but at least there is no domain error. Addresses https://gitlab.com/libeigen/eigen/-/issues/1856
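The failure mode described in the commit above can be reproduced outside Eigen with a small sketch. The helper name `wrapped_squared_norm` is hypothetical, and the wraparound is emulated with unsigned arithmetic because real signed overflow is undefined behavior in C++. The point is only that a sum of squares that wraps past the 32-bit range can come out negative, and that taking the square root of a negative value is a domain error.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

// Sum of squares with 32-bit two's-complement wraparound.
// Unsigned arithmetic keeps the wraparound well defined.
std::int32_t wrapped_squared_norm(const std::vector<std::int32_t>& v) {
  std::uint32_t acc = 0;
  for (std::int32_t x : v) {
    const std::uint32_t ux = static_cast<std::uint32_t>(x);
    acc += ux * ux;  // wraps modulo 2^32
  }
  // Reinterpret the bits as a signed value, as an overflowed int would look.
  std::int32_t out;
  static_assert(sizeof out == sizeof acc, "size mismatch");
  std::memcpy(&out, &acc, sizeof out);
  return out;
}
```

For the vector (40000, 30000) the exact squared norm is 2.5e9, which exceeds the largest 32-bit signed value, so the wrapped result is negative and `std::sqrt` of it yields NaN. Comparing squared norms directly avoids the square root, although the values themselves still wrap.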
*	Fix packetmath test build for AVX.	Rasmus Munk Larsen	2020-03-27
\|
*	Fix bug in ↵	Rasmus Munk Larsen	2020-03-27
\| \| \| \|	https://gitlab.com/libeigen/eigen/-/commit/52d54278beefee8b2f19dcca4fd900916154e174
*	Additional NEON packet-math operations	Joel Holdsworth	2020-03-26
\|
*	Make file formatting comply with POSIX and Unix standards	Aaron Franke	2020-03-23
\| \| \| \|	UTF-8, LF, no BOM, and newlines at the end of files
*	Add absolute_difference coefficient-wise binary Array function	Joel Holdsworth	2020-03-19
\|
*	Implement integer square-root for NEON	Joel Holdsworth	2020-03-19
\|
*	test/packetmath: Add tests for all integer types	Joel Holdsworth	2020-03-10
\|
*	test/packetmath: Made negate non-mandatory	Joel Holdsworth	2020-03-10
\|
*	Revert "add some static checks for packet-picking logic"	Rasmus Munk Larsen	2020-02-25
\| \| \|	This reverts commit 776960024585b907acc4abc3c59aef605941bb75
*	Revert "Disable test in test/vectorization_logic.cpp, which is currently ↵	Rasmus Munk Larsen	2020-02-25
\| \| \| \| \|	failing with AVX." This reverts commit b625adffd877639ff5cbe51ea154e1905a3b405c
*	Disable test in test/vectorization_logic.cpp, which is currently failing ↵	Rasmus Munk Larsen	2020-02-24
\| \| \| \|	with AVX.
*	add some static checks for packet-picking logic	Francesco Mazzoli	2020-02-07
\|
*	Removing executable bit from file mode	Christoph Hertzberg	2020-01-11
\|
*	Bug #1790: Make `areApprox` check `numext::isnan` instead of bitwise ↵	Christoph Hertzberg	2020-01-11
\| \| \| \|	equality (NaNs don't have to be bitwise equal).
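A short sketch of why bitwise equality is the wrong test for NaNs. The helpers `float_from_bits`, `bits_of`, and `same_or_both_nan` are hypothetical names for illustration and are not Eigen's `areApprox`; `std::isnan` stands in for `numext::isnan`. Two values can both be NaN while carrying different sign or payload bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");

// Build a float from a raw bit pattern.
float float_from_bits(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Read back the raw bit pattern of a float.
std::uint32_t bits_of(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Treat any two NaNs as matching; otherwise fall back to ordinary equality.
bool same_or_both_nan(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) {
    return std::isnan(a) && std::isnan(b);
  }
  return a == b;
}
```

For example, the patterns 0x7fc00000 and 0xffc00000 are both quiet NaNs on IEEE 754 systems but differ in the sign bit, so a bitwise comparison reports a mismatch where an isnan-based check does not.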
*	Added special_packetmath test and tweaked bounds on tests.	Srinivas Vasudevan	2020-01-11
\| \| \| \| \|	Refactor shared packetmath code to header file. (Squashed from PR !38)
*	Use data.data() instead of &data (since it is not obvious that Array is ↵	Christoph Hertzberg	2020-01-09
\| \| \| \|	trivially copyable)
*	Bug #1785: Introduce numext::rint.	Ilya Tokar	2020-01-07
\| \| \| \| \| \|	This provides a new op that matches std::rint and previous behavior of pround. Also adds corresponding unsupported/../Tensor op. Performance is the same as e.g. floor (tested SSE/AVX).
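The difference between the two standard functions named in the commit above shows up on halfway cases. The sketch below uses only the C++ standard library; the struct `Rounded` and function `round_both` are illustrative names, and the results assume the default floating-point rounding mode (round to nearest, ties to even) is in effect.

```cpp
#include <cassert>
#include <cmath>

// Both roundings of the same input, side by side.
struct Rounded {
  double half_away;     // std::round: halfway cases go away from zero
  double current_mode;  // std::rint: uses the current rounding mode
};

Rounded round_both(double x) {
  return Rounded{std::round(x), std::rint(x)};
}
```

With the default rounding mode, `round_both(2.5)` gives 3 for `std::round` and 2 for `std::rint`, while `round_both(3.5)` gives 4 for both.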
*	Protecting integer_types's long long test with a check to see if we have ↵	Everton Constantino	2020-01-07
\| \| \| \|	CXX11 support.
*	Bug #1788: Fix rule-of-three violations inside the stable modules.	Christoph Hertzberg	2019-12-19
\| \| \| \| \|	This fixes deprecated-copy warnings when compiling with GCC>=9. Also protect some additional Base-constructors from getting called by user code (#1587)
*	Fix unit-test which I broke in previous fix	Christoph Hertzberg	2019-12-19
\|
*	Fix some maybe-uninitialized warnings	Christoph Hertzberg	2019-12-18
\|
*	Workaround class-memaccess warnings on newer GCC versions	Christoph Hertzberg	2019-12-18
\|
*	Improve accuracy of fast approximate tanh and the logistic functions in ↵	Rasmus Munk Larsen	2019-12-16
\|	Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).
\|
\|	This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in https://gitlab.com/libeigen/eigen/commit/66f07efeaed39d6a67005343d7e0caf7d9eeacdb), but uses the more accurate approximation 1/(1+exp(-x)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.
\|
\|	This change also contains a few improvements to speed up the original float specialization of logistic:
\|	  - Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_expect and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
\|	  - Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).
\|
\|	The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set.
\|
\|	The benchmarks below repeatedly call u = v.logistic() (u = v.tanh(), respectively) where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].
\|
\|	Benchmark numbers for logistic:
\|
\|	Before:
\|	Benchmark                  Time(ns)  CPU(ns)  Iterations
\|	-----------------------------------------------------------------
\|	SSE
\|	BM_eigen_logistic_float        4467     4468      155835  model_time: 4827
\|	AVX
\|	BM_eigen_logistic_float        2347     2347      299135  model_time: 2926
\|	AVX+FMA
\|	BM_eigen_logistic_float        1467     1467      476143  model_time: 2926
\|	AVX512
\|	BM_eigen_logistic_float         805      805      858696  model_time: 1463
\|
\|	After:
\|	Benchmark                  Time(ns)  CPU(ns)  Iterations
\|	-----------------------------------------------------------------
\|	SSE
\|	BM_eigen_logistic_float        2589     2590      270264  model_time: 4827
\|	AVX
\|	BM_eigen_logistic_float        1428     1428      489265  model_time: 2926
\|	AVX+FMA
\|	BM_eigen_logistic_float        1059     1059      662255  model_time: 2926
\|	AVX512
\|	BM_eigen_logistic_float         673      673     1000000  model_time: 1463
\|
\|	Benchmark numbers for tanh:
\|
\|	Before:
\|	Benchmark                  Time(ns)  CPU(ns)  Iterations
\|	-----------------------------------------------------------------
\|	SSE
\|	BM_eigen_tanh_float            2391     2391      292624  model_time: 4242
\|	AVX
\|	BM_eigen_tanh_float            1256     1256      554662  model_time: 2633
\|	AVX+FMA
\|	BM_eigen_tanh_float             823      823      866267  model_time: 1609
\|	AVX512
\|	BM_eigen_tanh_float             443      443     1578999  model_time: 805
\|
\|	After:
\|	Benchmark                  Time(ns)  CPU(ns)  Iterations
\|	-----------------------------------------------------------------
\|	SSE
\|	BM_eigen_tanh_float            2588     2588      273531  model_time: 4242
\|	AVX
\|	BM_eigen_tanh_float            1536     1536      452321  model_time: 2633
\|	AVX+FMA
\|	BM_eigen_tanh_float            1007     1007      694681  model_time: 1609
\|	AVX512
\|	BM_eigen_tanh_float             471      471     1472178  model_time: 805
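The tail approximation mentioned in the commit above, replacing the logistic function by exp(x) for large negative x, can be checked numerically with a small scalar sketch. The function names here are hypothetical and this is not Eigen's vectorized code. Since 1/(1+exp(-x)) = exp(x)/(1+exp(x)), the relative gap between exp(x) and the logistic function works out to exp(x) itself, so it shrinks rapidly as x decreases.

```cpp
#include <cassert>
#include <cmath>

// Reference logistic function, evaluated in double precision.
double logistic_reference(double x) {
  return 1.0 / (1.0 + std::exp(-x));
}

// Tail approximation for large negative x.
double logistic_tail(double x) {
  return std::exp(x);
}

// Relative difference between the tail approximation and the reference.
double tail_relative_error(double x) {
  assert(x < 0.0);  // the approximation is only meant for negative x
  const double exact = logistic_reference(x);
  return std::fabs(logistic_tail(x) - exact) / exact;
}
```

Calling `tail_relative_error` at a few negative inputs shows the gap decaying exponentially, which is why the approximation is only applied below a negative cutoff.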
*	Bug 1785: fix pround on x86 to use the same rounding mode as std::round.	Ilya Tokar	2019-12-12
\| \| \| \| \| \|	This also adds pset1frombits helper to Packet[24]d. Makes round ~45% slower for SSE: 1.65µs ± 1% before vs 2.45µs ± 2% after, still an order of magnitude faster than scalar version: 33.8µs ± 2%.
*	Fix implementation of complex expm1. Add tests that fail with previous ↵	Srinivas Vasudevan	2019-12-12
\| \| \| \|	implementation, but pass with the current one.
*	Added io test	Joel Holdsworth	2019-12-11
\|
*	Fix QuaternionBase::cast for quaternion map and wrapper.	Gael Guennebaud	2019-12-03
\|