eigen - C++ library for linear algebra

	Commit message	Author	Age
*	Remove traits declaring NEON vectorized casts that do not actually have ↵	Rasmus Munk Larsen	2020-05-07
\| \| \| \|	packet op implementations.
*	Fix confusing template param name for Stride fwd decl.	Xiaoxiang Cao	2020-04-30
\|
*	Fix the embarrassingly incomplete fix to the embarrassing bug in blocked ↵	Rasmus Munk Larsen	2020-04-29
\| \| \| \|	transpose.
*	Fix (embarrassing) bug in blocked transpose.	Rasmus Munk Larsen	2020-04-29
\|
*	Add missing transpose in cleanup loop. Without it, we trip an assertion in ↵	Rasmus Munk Larsen	2020-04-29
\| \| \| \|	debug mode.
*	Fix compilation error with Clang on Android: _mm_extract_epi64 fails to compile.	Rasmus Munk Larsen	2020-04-29
\|
*	Extend support for Packet16b:	Rasmus Munk Larsen	2020-04-28
\|	* Add ptranspose<,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
\|	* work around a bug in slicing of Tensor<bool>.
\|	* Add tensor tests
\|	This speeds up matmul for boolean matrices by about 10x
\|	name                  old time/op   new time/op   delta
\|	BM_MatMul<bool>/8     267ns ± 0%    479ns ± 0%    +79.25%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/32    6.42µs ± 0%   0.87µs ± 0%   -86.50%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/64    43.3µs ± 0%   5.9µs ± 0%    -86.42%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/128   315µs ± 0%    44µs ± 0%     -85.98%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/256   2.41ms ± 0%   0.34ms ± 0%   -85.68%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/512   18.8ms ± 0%   2.7ms ± 0%    -85.53%  (p=0.008 n=5+5)
\|	BM_MatMul<bool>/1k    149ms ± 0%    22ms ± 0%     -85.40%  (p=0.008 n=5+5)
*	Block transposeInPlace() when the matrix is real and square. This yields a ↵	Rasmus Munk Larsen	2020-04-28
\|	large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.
\|	rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.TransposeInPlace.float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
\|	(Generated by http://go/benchy. Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".TransposeInPlace.float.*" experimental/users/rmlarsen/bench:matmul_bench)
\|	name                             old time/op   new time/op   delta
\|	BM_TransposeInPlace<float>/4     9.84ns ± 0%   6.51ns ± 0%   -33.80%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/8     23.6ns ± 1%   17.6ns ± 0%   -25.26%  (p=0.016 n=5+4)
\|	BM_TransposeInPlace<float>/16    78.8ns ± 0%   60.3ns ± 0%   -23.50%  (p=0.029 n=4+4)
\|	BM_TransposeInPlace<float>/32    302ns ± 0%    229ns ± 0%    -24.40%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/59    1.03µs ± 0%   0.84µs ± 1%   -17.87%  (p=0.016 n=5+4)
\|	BM_TransposeInPlace<float>/64    1.20µs ± 0%   0.89µs ± 1%   -25.81%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/128   8.96µs ± 0%   3.82µs ± 2%   -57.33%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/256   152µs ± 3%    17µs ± 2%     -89.06%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/512   837µs ± 1%    208µs ± 0%    -75.15%  (p=0.008 n=5+5)
\|	BM_TransposeInPlace<float>/1k    4.28ms ± 2%   1.08ms ± 2%   -74.72%  (p=0.008 n=5+5)
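The blocking idea in the commit above can be illustrated with a plain scalar sketch. This is not Eigen's implementation, which works on packets in registers; the function name `transpose_in_place_blocked` and the tile size `kBlock` are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical tile size, chosen only for illustration.
constexpr std::size_t kBlock = 4;

// Transpose an n x n row-major matrix in place, one tile at a time.
// Each pair (i, j) with i < j is swapped exactly once.
void transpose_in_place_blocked(std::vector<float>& a, std::size_t n) {
  for (std::size_t ib = 0; ib < n; ib += kBlock) {
    const std::size_t i_end = std::min(ib + kBlock, n);
    // Diagonal tile: swap the strictly upper part with the lower part.
    for (std::size_t i = ib; i < i_end; ++i)
      for (std::size_t j = i + 1; j < i_end; ++j)
        std::swap(a[i * n + j], a[j * n + i]);
    // Off-diagonal tiles: swap tile (ib, jb) with its mirror (jb, ib).
    for (std::size_t jb = ib + kBlock; jb < n; jb += kBlock) {
      const std::size_t j_end = std::min(jb + kBlock, n);
      for (std::size_t i = ib; i < i_end; ++i)
        for (std::size_t j = jb; j < j_end; ++j)
          std::swap(a[i * n + j], a[j * n + i]);
    }
  }
}
```

The `std::min` bounds play the role of a cleanup loop when `n` is not a multiple of the tile size.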
*	Add support to vector instructions to Packet16uc and Packet16c	Pedro Caldeira	2020-04-27
\|
*	Remove unused packet op "preduxp".	Rasmus Munk Larsen	2020-04-23
\|
*	BooleanRedux.h: Add more EIGEN_DEVICE_FUNC qualifiers.	René Wagner	2020-04-23
\| \| \| \|	This enables operator== on Eigen matrices in device code.
*	Add Packet8s and Packet8us to support signed/unsigned int16/short Altivec ↵	Pedro Caldeira	2020-04-21
\| \| \| \|	vector operations
*	Fix bug in ptrue for Packet16b.	Rasmus Munk Larsen	2020-04-20
\|
*	Add partial vectorization for matrices and tensors of bool. This speeds up ↵	Rasmus Munk Larsen	2020-04-20
\|	boolean operations on Tensors by up to 25x.
\|	Benchmark numbers for the logical and of two NxN tensors:
\|	name                                    old time/op   new time/op   delta
\|	BM_booleanAnd_1T/3 [using 1 threads]    14.6ns ± 0%   14.4ns ± 0%   -0.96%
\|	BM_booleanAnd_1T/4 [using 1 threads]    20.5ns ±12%   9.0ns ± 0%    -56.07%
\|	BM_booleanAnd_1T/7 [using 1 threads]    41.7ns ± 0%   10.5ns ± 0%   -74.87%
\|	BM_booleanAnd_1T/8 [using 1 threads]    52.1ns ± 0%   10.1ns ± 0%   -80.59%
\|	BM_booleanAnd_1T/10 [using 1 threads]   76.3ns ± 0%   13.8ns ± 0%   -81.87%
\|	BM_booleanAnd_1T/15 [using 1 threads]   167ns ± 0%    16ns ± 0%     -90.45%
\|	BM_booleanAnd_1T/16 [using 1 threads]   188ns ± 0%    16ns ± 0%     -91.57%
\|	BM_booleanAnd_1T/31 [using 1 threads]   667ns ± 0%    34ns ± 0%     -94.83%
\|	BM_booleanAnd_1T/32 [using 1 threads]   710ns ± 0%    35ns ± 0%     -95.01%
\|	BM_booleanAnd_1T/64 [using 1 threads]   2.80µs ± 0%   0.11µs ± 0%   -95.93%
\|	BM_booleanAnd_1T/128 [using 1 threads]  11.2µs ± 0%   0.4µs ± 0%    -96.11%
\|	BM_booleanAnd_1T/256 [using 1 threads]  44.6µs ± 0%   2.5µs ± 0%    -94.31%
\|	BM_booleanAnd_1T/512 [using 1 threads]  178µs ± 0%    10µs ± 0%     -94.35%
\|	BM_booleanAnd_1T/1k [using 1 threads]   717µs ± 0%    78µs ± 1%     -89.07%
\|	BM_booleanAnd_1T/2k [using 1 threads]   2.87ms ± 0%   0.31ms ± 1%   -89.08%
\|	BM_booleanAnd_1T/4k [using 1 threads]   11.7ms ± 0%   1.9ms ± 4%    -83.55%
\|	BM_booleanAnd_1T/10k [using 1 threads]  70.3ms ± 0%   17.2ms ± 4%   -75.48%
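To give a feel for why processing several bools per operation helps, here is a portable sketch that combines eight byte-sized bools with one 64-bit AND. It is only an analogy for the packet approach: the commit uses SIMD packet types such as Packet16b, whereas the function `boolean_and` below is a hypothetical helper, and it assumes every input byte is 0 or 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Element-wise logical AND of two bool arrays stored one byte per value
// (each byte is assumed to be 0 or 1). Eight values are combined per
// 64-bit operation; a scalar loop handles the remaining tail.
void boolean_and(const std::uint8_t* a, const std::uint8_t* b,
                 std::uint8_t* out, std::size_t n) {
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    std::uint64_t x, y;
    std::memcpy(&x, a + i, 8);  // memcpy sidesteps alignment and aliasing issues
    std::memcpy(&y, b + i, 8);
    const std::uint64_t r = x & y;
    std::memcpy(out + i, &r, 8);
  }
  for (; i < n; ++i) out[i] = static_cast<std::uint8_t>(a[i] & b[i]);
}
```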
*	Move eigen_packet_wrapper to GenericPacketMath.h and use it for ↵	Rasmus Munk Larsen	2020-04-15
\| \| \| \| \| \| \|	SSE/AVX/AVX512 as it is already used for NEON. This will allow us to define multiple packet types backed by the same vector type, e.g., __m128i. Use this mechanism to define packets for half and clean up the packet op implementations.
*	Fix typo in TypeCasting.h	Rasmus Munk Larsen	2020-04-14
\|
*	Fix bug in vectorized casting of	Rasmus Munk Larsen	2020-04-14
\|	{uint8, int8} -> {int16, uint16, int32, uint32, float}
\|	{uint16, int16} -> {int32, uint32, int64, uint64, float}
\|	for NEON. These conversions were advertised as vectorized, but not actually implemented.
*	CommaInitializer wrongfully asserted for 0-sized blocks	Christoph Hertzberg	2020-04-13
\| \| \| \|	commainitializer unit-test never actually called `test_block_recursion`, which also was not correctly implemented and would have caused too deep template recursion.
*	Fixed commainitializer test.	Antonio Sanchez	2020-04-10
\| \| \| \| \| \|	The removed `finished()` call was responsible for enforcing that the initializer was provided the correct number of values. Putting it back in to restore previous behavior.
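The role of `finished()` described above, enforcing that the initializer received the correct number of values, can be sketched with a toy comma-initializer. Everything here (`Vec`, `CommaInit`, the exception types) is hypothetical and much simpler than Eigen's CommaInitializer, which reports errors through eigen_assert.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical fixed-size vector with a comma-initializer, for illustration.
template <std::size_t N>
class Vec {
 public:
  class CommaInit {
   public:
    CommaInit(Vec& v, double first) : v_(v), count_(0) { push(first); }
    CommaInit& operator,(double x) { push(x); return *this; }
    // Checks that exactly N values were supplied, then returns the vector.
    Vec& finished() {
      if (count_ != N) throw std::logic_error("too few coefficients");
      return v_;
    }
   private:
    void push(double x) {
      if (count_ >= N) throw std::out_of_range("too many coefficients");
      v_.data_[count_++] = x;
    }
    Vec& v_;
    std::size_t count_;
  };

  CommaInit operator<<(double first) { return CommaInit(*this, first); }
  double operator[](std::size_t i) const { return data_[i]; }

 private:
  std::array<double, N> data_{};
};
```

With this sketch, `(v << 1.0, 2.0, 3.0).finished();` succeeds for a `Vec<3>`, while `(v << 1.0, 2.0).finished();` throws because one value is missing.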
*	Speed up matrix multiplication for small to medium size matrices by using ↵	Rasmus Munk Larsen	2020-04-07
\|	half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down to the scalar path.
\|	Benchmark measurements below are for computing ```c.noalias() = a.transpose() * b;``` for square RowMajor matrices of varying size.
\|	Measured improvement with AVX+FMA:
\|	name                old time/op   new time/op   delta
\|	BM_MatMul_ATB/8     139ns ± 1%    129ns ± 1%    -7.49%   (p=0.008 n=5+5)
\|	BM_MatMul_ATB/32    1.46µs ± 1%   1.22µs ± 0%   -16.72%  (p=0.008 n=5+5)
\|	BM_MatMul_ATB/64    8.43µs ± 1%   7.41µs ± 0%   -12.04%  (p=0.008 n=5+5)
\|	BM_MatMul_ATB/128   56.8µs ± 1%   52.9µs ± 1%   -6.83%   (p=0.008 n=5+5)
\|	BM_MatMul_ATB/256   407µs ± 1%    395µs ± 3%    -2.94%   (p=0.032 n=5+5)
\|	BM_MatMul_ATB/512   3.27ms ± 3%   3.18ms ± 1%   ~        (p=0.056 n=5+5)
\|	Measured improvement for AVX512:
\|	name                old time/op   new time/op   delta
\|	BM_MatMul_ATB/8     167ns ± 1%    154ns ± 1%    -7.63%   (p=0.008 n=5+5)
\|	BM_MatMul_ATB/32    1.08µs ± 1%   0.83µs ± 3%   -23.58%  (p=0.008 n=5+5)
\|	BM_MatMul_ATB/64    6.21µs ± 1%   5.06µs ± 1%   -18.47%  (p=0.008 n=5+5)
\|	BM_MatMul_ATB/128   36.1µs ± 2%   31.3µs ± 1%   -13.32%  (p=0.008 n=5+5)
\|	BM_MatMul_ATB/256   263µs ± 2%    242µs ± 2%    -7.92%   (p=0.008 n=5+5)
\|	BM_MatMul_ATB/512   1.95ms ± 2%   1.91ms ± 2%   ~        (p=0.095 n=5+5)
\|	BM_MatMul_ATB/1k    15.4ms ± 4%   14.8ms ± 2%   ~        (p=0.095 n=5+5)
*	Missing struct definition in NumTraits	Antonio Sanchez	2020-04-07
\|
*	Add numeric_limits min and max for bool	Akshay Naresh Modi	2020-04-06
\| \| \| \|	This will allow (among other things) computation of argmax and argmin of bool tensors
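A reduction such as argmax typically seeds its running maximum from the type's limits, which is why limits for bool matter. The sketch below uses the standard `std::numeric_limits` rather than Eigen's own definitions, and the `argmax` helper is a hypothetical stand-in for the tensor reducer.

```cpp
#include <cstddef>
#include <limits>

// Returns the index of the first largest element in data[0..n).
// The running maximum is seeded with the lowest value of T, so the
// function only works for types whose numeric limits are defined.
template <typename T>
std::size_t argmax(const T* data, std::size_t n) {
  std::size_t best_index = 0;
  T best = std::numeric_limits<T>::lowest();
  for (std::size_t i = 0; i < n; ++i) {
    if (data[i] > best) {
      best = data[i];
      best_index = i;
    }
  }
  return best_index;
}
```

For bool, the lowest value is `false`, so the result is the index of the first `true` element, or 0 if there is none.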
*	Bugfix: conjugate_gradient did not compile with lazy-evaluated RealScalar	Bernardo Bahia Monteiro	2020-03-29
\|	The error generated by the compiler was:
\|	    no matching function for call to 'maxi'
\|	    RealScalar threshold = numext::maxi(tol*tol*rhsNorm2,considerAsZero);
\|	The important part in the following notes was:
\|	    candidate template ignored: deduced conflicting types for parameter 'T' ('codi::Multiply11<...>' vs. 'codi::ActiveReal<...>')
\|	    EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)
\|	I am using CoDiPack to provide the RealScalar type.
\|	This bug was introduced in bc000deaa Fix conjugate-gradient for very small rhs
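The deduction conflict reported above can be reproduced without CoDiPack. In this sketch, `Real` and `MulExpr` are hypothetical stand-ins for an active scalar type and its lazily evaluated product expression, and `maxi` mirrors the signature quoted in the error message. Forcing evaluation with an explicit cast is one way to make both arguments the same type; the source does not state that the actual fix took exactly this form.

```cpp
// Hypothetical stand-ins for an active scalar and a lazy product expression.
struct Real { double v; };
struct MulExpr {
  double a, b;
  operator Real() const { return Real{a * b}; }  // evaluates the expression
};
inline MulExpr operator*(const Real& x, const Real& y) { return MulExpr{x.v, y.v}; }
inline bool operator<(const Real& x, const Real& y) { return x.v < y.v; }

// Same shape as the signature quoted in the error message.
template <typename T>
T maxi(const T& x, const T& y) { return x < y ? y : x; }

Real threshold(const Real& tol, const Real& rhs_norm2, const Real& consider_as_zero) {
  // maxi(tol * tol * rhs_norm2, consider_as_zero) would not compile:
  // T is deduced as MulExpr from the first argument and Real from the second.
  return maxi(static_cast<Real>(tol * tol * rhs_norm2), consider_as_zero);
}
```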
*	Fix bug in ↵	Rasmus Munk Larsen	2020-03-27
\| \| \| \|	https://gitlab.com/libeigen/eigen/-/commit/52d54278beefee8b2f19dcca4fd900916154e174
*	NEON: Fixed MSVC types definitions	Joel Holdsworth	2020-03-26
\|
*	Additional NEON packet-math operations	Joel Holdsworth	2020-03-26
\|
*	Adhere to recommended load/store intrinsics for ppc64le	Everton Constantino	2020-03-23
\|
*	Fixing float32's pround halfway criteria to match STL's criteria.	Everton Constantino	2020-03-21
\|
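For reference, the halfway behaviour of the standard library can be checked directly. The sketch assumes that "STL's criteria" refers to `std::round`, which rounds ties away from zero, and contrasts it with `std::nearbyint`, which rounds ties to even under the default rounding mode. The wrapper names are hypothetical.

```cpp
#include <cmath>

// Ties away from zero: 2.5 -> 3, -2.5 -> -3.
inline float round_half_away(float x) { return std::round(x); }

// Ties to even, assuming the default FE_TONEAREST rounding mode: 2.5 -> 2, 3.5 -> 4.
inline float round_half_even(float x) { return std::nearbyint(x); }
```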
*	Fixed:	Alessio M	2020-03-21
\|	- access violation when initializing 0x0 matrices
\|	- exception can be thrown during stack unwind while comma-initializing a matrix if eigen_assert is configured to throw
*	Update VectorwiseOp.h to allow Plugins similar to MatrixBase.h or ArrayBase.h	dlazenby	2020-03-20
\|
*	Bug https://gitlab.com/libeigen/eigen/-/issues/1415: add missing ↵	Masaki Murooka	2020-03-20
\| \| \| \|	EIGEN_DEVICE_FUNC to diagonal_product_evaluator_base.
*	Remove reference to non-existent unary_op_base class.	Rasmus Munk Larsen	2020-03-19
\|
*	Add missing arguments to numext::absdiff().	Rasmus Munk Larsen	2020-03-19
\|
*	Add absolute_difference coefficient-wise binary Array function	Joel Holdsworth	2020-03-19
\|
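An absolute difference is worth having as its own function because the naive `abs(x - y)` wraps around for unsigned types. A minimal scalar sketch, with a hypothetical free function `absdiff` that is not claimed to match the signature of `numext::absdiff`:

```cpp
#include <cstdint>

// Absolute difference that never subtracts the larger value from the smaller,
// so it stays correct for unsigned types.
template <typename T>
T absdiff(const T& x, const T& y) {
  return static_cast<T>(x > y ? x - y : y - x);
}
```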
*	Reenabling packetmath unsigned tests, adding dummy pabs for relevant unsigned	Everton Constantino	2020-03-19
\| \| \| \|	types.
*	Add shift_left<N> and shift_right<N> coefficient-wise unary Array functions	Joel Holdsworth	2020-03-19
\|
*	Implement integer square-root for NEON	Joel Holdsworth	2020-03-19
\|
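What an integer square root computes can be shown with a scalar, digit-by-digit sketch: the largest `r` such that `r * r <= x`. This is a generic textbook method and makes no claim about how the NEON implementation is written; the name `isqrt` is an assumption.

```cpp
#include <cstdint>

// Floor of the square root of x, computed one result bit at a time.
std::uint32_t isqrt(std::uint32_t x) {
  std::uint32_t result = 0;
  std::uint32_t bit = std::uint32_t{1} << 30;  // largest power of four in range
  while (bit > x) bit >>= 2;
  while (bit != 0) {
    if (x >= result + bit) {
      x -= result + bit;
      result = (result >> 1) + bit;
    } else {
      result >>= 1;
    }
    bit >>= 2;
  }
  return result;
}
```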
*	Update NullaryFunctors.h	Allan Leal	2020-03-16
\|
*	Fixing HIP breakage caused by the recent commit that introduces Packet4h2 as ↵	Deven Desai	2020-03-12
\| \| \| \|	the Eigen::Half packet type
*	NEON: Added int64_t and uint64_t packet math	Joel Holdsworth	2020-03-10
\|
*	NEON: Added int8_t and uint8_t packet math	Joel Holdsworth	2020-03-10
\|
*	NEON: Added int16_t and uint16_t packet math	Joel Holdsworth	2020-03-10
\|
*	NEON: Added uint32_t packet math	Joel Holdsworth	2020-03-10
\|
*	NEON: Implemented half-size vectors	Joel Holdsworth	2020-03-10
\|
*	NEON: Set packet_traits<double> flags	Joel Holdsworth	2020-03-10
\|
*	remove duplicate pset1 for half and add some comments about why we need ↵	Sami Kama	2020-03-10
\| \| \| \|	to expose pmul/add/div/min/max on host
*	Revert "avoid selecting half-packets when unnecessary"	Rasmus Munk Larsen	2020-02-25
\| \| \|	This reverts commit 5ca10480b0756e40b0723d90adeba8506291fc7c
*	Revert "Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE"	Rasmus Munk Larsen	2020-02-25
\| \| \|	This reverts commit 44df2109c8c700222643a9a45f144676348f4df1
*	Revert "do not pick full-packet if it'd result in more operations"	Rasmus Munk Larsen	2020-02-25
\| \| \|	This reverts commit e9cc0cd353803a818204e48054bd89699b84e6c6
*	Include <sstream> explicitly, and don't rely on the implicit include via ↵	Tobias Bosch	2020-02-24
\| \| \| \| \|	<complex>. This implicit dependency no longer exists in a recent LLVM release (sha 78be61871704).