eigen - C++ library for linear algebra

Commit message	Author	Age
...
*	Altivec template functions to improve code reusability	Pedro Caldeira	2020-05-11
*	Eigen moved the `scanLauncher` function inside the internal namespace.	mehdi-goli	2020-05-11
	This commit applies the following changes:
	- Move the `scanLauncher` specialization inside the internal namespace to fix a compiler crash in TensorScan for the SYCL backend.
	- Replace `SYCL/sycl.hpp` with `CL/sycl.hpp` to follow the SYCL 1.2.1 standard.
	- Minor fixes: comment out an unused variable to avoid compiler warnings.
*	Remove packet ops pinsertfirst and pinsertlast that are only used in a ↵	Rasmus Munk Larsen	2020-05-08
	single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp. I cannot measure any performance changes for SSE, AVX, or AVX512.
	name                     old time/op  new time/op  delta
	BM_LinSpace<float>/1     1.63ns ± 0%  1.63ns ± 0%  ~  (p=0.762 n=5+5)
	BM_LinSpace<float>/8     4.92ns ± 3%  4.89ns ± 3%  ~  (p=0.421 n=5+5)
	BM_LinSpace<float>/64    34.6ns ± 0%  34.6ns ± 0%  ~  (p=0.841 n=5+5)
	BM_LinSpace<float>/512    217ns ± 0%   217ns ± 0%  ~  (p=0.421 n=5+5)
	BM_LinSpace<float>/4k    1.68µs ± 0%  1.68µs ± 0%  ~  (p=1.000 n=5+5)
	BM_LinSpace<float>/32k   13.3µs ± 0%  13.3µs ± 0%  ~  (p=0.905 n=5+4)
	BM_LinSpace<float>/256k   107µs ± 0%   107µs ± 0%  ~  (p=0.841 n=5+5)
	BM_LinSpace<float>/1M     427µs ± 0%   427µs ± 0%  ~  (p=0.690 n=5+5)
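The approach described above, building the first and final packets from ordinary packet ops instead of inserting a scalar into lane 0 or the last lane, can be sketched with a plain `std::array` standing in for a 4-lane SIMD packet. `Packet4f`, `plset4`, and `linspaced_packet` are hypothetical names (`plset4` only mimics the spirit of Eigen's `plset`), and this is not Eigen's `linspaced_op_impl` code: here the end value is forced into its lane with a per-lane select.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 4-lane float SIMD packet.
using Packet4f = std::array<float, 4>;

// Broadcast `a` and add the lane offsets {0, 1, 2, 3}: result[i] = a + i.
Packet4f plset4(float a) {
  Packet4f r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = a + static_cast<float>(i);
  return r;
}

// Packet of linspaced values starting at sequence index i:
// lane k holds low + step * (i + k). The element at index n - 1 is forced
// to `high` with an ordinary per-lane select, so no "insert last" op is needed.
Packet4f linspaced_packet(float low, float high, float step, long i, long n) {
  Packet4f idx = plset4(static_cast<float>(i));
  Packet4f r{};
  for (std::size_t k = 0; k < r.size(); ++k) {
    bool is_last = (i + static_cast<long>(k)) == n - 1;
    r[k] = is_last ? high : low + step * idx[k];
  }
  return r;
}
```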
*	Possibility to specify user-defined default cache sizes for GEBP kernel	David Tellenbach	2020-05-08
	Some architectures have no convenient way to determine cache sizes at runtime. In this case, Eigen's GEBP kernel falls back to default cache sizes, which might not be correct in all situations. This patch introduces three preprocessor macros, `EIGEN_DEFAULT_L1_CACHE_SIZE`, `EIGEN_DEFAULT_L2_CACHE_SIZE`, and `EIGEN_DEFAULT_L3_CACHE_SIZE`, that let users set these default values explicitly.
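A minimal sketch of how such an override is typically wired up, assuming the usual "define before include" convention for configuration macros: the user's `#define`s must come before any Eigen header, and the `#ifndef` fallbacks below only mimic how a library can pick up a user-supplied value. The fallback numbers and the `CacheSizes`/`fallback_cache_sizes` names are placeholders, not Eigen's actual defaults or API.

```cpp
#include <cassert>
#include <cstddef>

// User side (hypothetical values): define these before including any Eigen header.
#define EIGEN_DEFAULT_L1_CACHE_SIZE (32 * 1024)
#define EIGEN_DEFAULT_L2_CACHE_SIZE (512 * 1024)
#define EIGEN_DEFAULT_L3_CACHE_SIZE (8 * 1024 * 1024)

// Library side, sketched: supply a fallback only when the user did not.
// These fallback numbers are placeholders, not Eigen's real defaults.
#ifndef EIGEN_DEFAULT_L1_CACHE_SIZE
#define EIGEN_DEFAULT_L1_CACHE_SIZE (16 * 1024)
#endif
#ifndef EIGEN_DEFAULT_L2_CACHE_SIZE
#define EIGEN_DEFAULT_L2_CACHE_SIZE (256 * 1024)
#endif
#ifndef EIGEN_DEFAULT_L3_CACHE_SIZE
#define EIGEN_DEFAULT_L3_CACHE_SIZE (2 * 1024 * 1024)
#endif

// Hypothetical holder for the sizes a blocking heuristic would consult
// when runtime cache detection is unavailable.
struct CacheSizes {
  std::ptrdiff_t l1, l2, l3;
};

constexpr CacheSizes fallback_cache_sizes() {
  return {EIGEN_DEFAULT_L1_CACHE_SIZE, EIGEN_DEFAULT_L2_CACHE_SIZE,
          EIGEN_DEFAULT_L3_CACHE_SIZE};
}
```

The same values can also be passed on the compiler command line, e.g. `-DEIGEN_DEFAULT_L2_CACHE_SIZE=524288`.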
*	Remove unused packet op "palign".	Rasmus Munk Larsen	2020-05-07
	Clean up a compiler warning in C++03 mode in AVX512/Complex.h.
*	Remove traits declaring NEON vectorized casts that do not actually have ↵	Rasmus Munk Larsen	2020-05-07
	packet op implementations.
*	Fix confusing template param name for Stride fwd decl.	Xiaoxiang Cao	2020-04-30
*	Fix the embarrassingly incomplete fix to the embarrassing bug in blocked ↵	Rasmus Munk Larsen	2020-04-29
	transpose.
*	Fix (embarrassing) bug in blocked transpose.	Rasmus Munk Larsen	2020-04-29
*	Add missing transpose in cleanup loop. Without it, we trip an assertion in ↵	Rasmus Munk Larsen	2020-04-29
	debug mode.
*	Fix compilation error with Clang on Android: _mm_extract_epi64 fails to compile.	Rasmus Munk Larsen	2020-04-29
*	Extend support for Packet16b:	Rasmus Munk Larsen	2020-04-28
	* Add ptranspose<,4> to support matmul, and add a unit test for Matrix<bool> * Matrix<bool>.
	* Work around a bug in slicing of Tensor<bool>.
	* Add tensor tests.
	This speeds up matmul for boolean matrices by roughly 7x for sizes 32 and above; the 8x8 case is slower.
	name                  old time/op  new time/op  delta
	BM_MatMul<bool>/8      267ns ± 0%   479ns ± 0%  +79.25%  (p=0.008 n=5+5)
	BM_MatMul<bool>/32    6.42µs ± 0%  0.87µs ± 0%  -86.50%  (p=0.008 n=5+5)
	BM_MatMul<bool>/64    43.3µs ± 0%   5.9µs ± 0%  -86.42%  (p=0.008 n=5+5)
	BM_MatMul<bool>/128    315µs ± 0%    44µs ± 0%  -85.98%  (p=0.008 n=5+5)
	BM_MatMul<bool>/256   2.41ms ± 0%  0.34ms ± 0%  -85.68%  (p=0.008 n=5+5)
	BM_MatMul<bool>/512   18.8ms ± 0%   2.7ms ± 0%  -85.53%  (p=0.008 n=5+5)
	BM_MatMul<bool>/1k     149ms ± 0%    22ms ± 0%  -85.40%  (p=0.008 n=5+5)
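A naive reference product is handy for checking a vectorized `Matrix<bool>` product in a unit test. The sketch below assumes boolean matrix multiplication means logical AND for the element products and logical OR for the accumulation; `bool_matmul_ref` is a hypothetical helper on row-major `std::vector<bool>` storage, not Eigen's test code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive reference product of two n x n row-major boolean matrices:
// c(i, j) = OR over k of (a(i, k) AND b(k, j)).
std::vector<bool> bool_matmul_ref(const std::vector<bool>& a,
                                  const std::vector<bool>& b, std::size_t n) {
  std::vector<bool> c(n * n, false);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      bool acc = false;
      // Stop early once the OR-accumulator is true.
      for (std::size_t k = 0; k < n && !acc; ++k)
        acc = a[i * n + k] && b[k * n + j];
      c[i * n + j] = acc;
    }
  }
  return c;
}
```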
*	Block transposeInPlace() when the matrix is real and square. This yields a ↵	Rasmus Munk Larsen	2020-04-28
	large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.
	rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.TransposeInPlace.float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
	(Generated by http://go/benchy. Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".TransposeInPlace.float.*" experimental/users/rmlarsen/bench:matmul_bench)
	name                            old time/op  new time/op  delta
	BM_TransposeInPlace<float>/4    9.84ns ± 0%  6.51ns ± 0%  -33.80%  (p=0.008 n=5+5)
	BM_TransposeInPlace<float>/8    23.6ns ± 1%  17.6ns ± 0%  -25.26%  (p=0.016 n=5+4)
	BM_TransposeInPlace<float>/16   78.8ns ± 0%  60.3ns ± 0%  -23.50%  (p=0.029 n=4+4)
	BM_TransposeInPlace<float>/32    302ns ± 0%   229ns ± 0%  -24.40%  (p=0.008 n=5+5)
	BM_TransposeInPlace<float>/59   1.03µs ± 0%  0.84µs ± 1%  -17.87%  (p=0.016 n=5+4)
	BM_TransposeInPlace<float>/64   1.20µs ± 0%  0.89µs ± 1%  -25.81%  (p=0.008 n=5+5)
	BM_TransposeInPlace<float>/128  8.96µs ± 0%  3.82µs ± 2%  -57.33%  (p=0.008 n=5+5)
	BM_TransposeInPlace<float>/256   152µs ± 3%    17µs ± 2%  -89.06%  (p=0.008 n=5+5)
	BM_TransposeInPlace<float>/512   837µs ± 1%   208µs ± 0%  -75.15%  (p=0.008 n=5+5)
	BM_TransposeInPlace<float>/1k   4.28ms ± 2%  1.08ms ± 2%  -74.72%  (p=0.008 n=5+5)
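The blocking idea can be sketched in scalar C++. Full `kBlock` x `kBlock` tiles are transposed and swapped pairwise through small local buffers, which stand in for registers. A cleanup loop then handles the rows and columns left over when `n` is not a multiple of the block size, which is the part the cleanup-loop fix listed above concerns. `blocked_transpose_in_place` and `kBlock` are hypothetical, and Eigen's version works on SIMD packets rather than plain arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Tile size; stands in for the number of lanes in a SIMD packet.
constexpr int kBlock = 4;

// In-place transpose of an n x n row-major matrix.
void blocked_transpose_in_place(std::vector<float>& m, int n) {
  assert(static_cast<int>(m.size()) == n * n);
  const int nb = n - n % kBlock;  // extent covered by full tiles
  float ta[kBlock][kBlock], tb[kBlock][kBlock];
  for (int bi = 0; bi < nb; bi += kBlock) {
    for (int bj = bi; bj < nb; bj += kBlock) {
      // Load tile (bi, bj) and tile (bj, bi), each transposed on the fly.
      for (int i = 0; i < kBlock; ++i)
        for (int j = 0; j < kBlock; ++j) {
          ta[j][i] = m[(bi + i) * n + bj + j];
          tb[j][i] = m[(bj + i) * n + bi + j];
        }
      // Store each transposed tile into the other's slot. For diagonal tiles
      // (bi == bj) both stores write the same, already transposed, values.
      for (int i = 0; i < kBlock; ++i)
        for (int j = 0; j < kBlock; ++j) {
          m[(bi + i) * n + bj + j] = tb[i][j];
          m[(bj + i) * n + bi + j] = ta[i][j];
        }
    }
  }
  // Cleanup: swap every remaining pair (i, j) with j > i and j >= nb.
  for (int i = 0; i < n; ++i)
    for (int j = std::max(i + 1, nb); j < n; ++j)
      std::swap(m[i * n + j], m[j * n + i]);
}
```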
*	Add vector instruction support for Packet16uc and Packet16c	Pedro Caldeira	2020-04-27
*	Remove unused packet op "preduxp".	Rasmus Munk Larsen	2020-04-23
*	BooleanRedux.h: Add more EIGEN_DEVICE_FUNC qualifiers.	René Wagner	2020-04-23
	This enables operator== on Eigen matrices in device code.
*	Add Packet8s and Packet8us to support signed/unsigned int16/short Altivec ↵	Pedro Caldeira	2020-04-21
	vector operations.
*	Fix bug in ptrue for Packet16b.	Rasmus Munk Larsen	2020-04-20
*	Add partial vectorization for matrices and tensors of bool. This speeds up ↵	Rasmus Munk Larsen	2020-04-20
	boolean operations on Tensors by up to 25x. Benchmark numbers for the logical AND of two NxN tensors:
	name                                     old time/op  new time/op  delta
	BM_booleanAnd_1T/3    [using 1 threads]  14.6ns ± 0%  14.4ns ± 0%   -0.96%
	BM_booleanAnd_1T/4    [using 1 threads]  20.5ns ±12%   9.0ns ± 0%  -56.07%
	BM_booleanAnd_1T/7    [using 1 threads]  41.7ns ± 0%  10.5ns ± 0%  -74.87%
	BM_booleanAnd_1T/8    [using 1 threads]  52.1ns ± 0%  10.1ns ± 0%  -80.59%
	BM_booleanAnd_1T/10   [using 1 threads]  76.3ns ± 0%  13.8ns ± 0%  -81.87%
	BM_booleanAnd_1T/15   [using 1 threads]   167ns ± 0%    16ns ± 0%  -90.45%
	BM_booleanAnd_1T/16   [using 1 threads]   188ns ± 0%    16ns ± 0%  -91.57%
	BM_booleanAnd_1T/31   [using 1 threads]   667ns ± 0%    34ns ± 0%  -94.83%
	BM_booleanAnd_1T/32   [using 1 threads]   710ns ± 0%    35ns ± 0%  -95.01%
	BM_booleanAnd_1T/64   [using 1 threads]  2.80µs ± 0%  0.11µs ± 0%  -95.93%
	BM_booleanAnd_1T/128  [using 1 threads]  11.2µs ± 0%   0.4µs ± 0%  -96.11%
	BM_booleanAnd_1T/256  [using 1 threads]  44.6µs ± 0%   2.5µs ± 0%  -94.31%
	BM_booleanAnd_1T/512  [using 1 threads]   178µs ± 0%    10µs ± 0%  -94.35%
	BM_booleanAnd_1T/1k   [using 1 threads]   717µs ± 0%    78µs ± 1%  -89.07%
	BM_booleanAnd_1T/2k   [using 1 threads]  2.87ms ± 0%  0.31ms ± 1%  -89.08%
	BM_booleanAnd_1T/4k   [using 1 threads]  11.7ms ± 0%   1.9ms ± 4%  -83.55%
	BM_booleanAnd_1T/10k  [using 1 threads]  70.3ms ± 0%  17.2ms ± 4%  -75.48%
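As its name suggests, `Packet16b` packs sixteen bools into one 128-bit vector. A portable analogue of the same idea is SWAR (SIMD within a register). The hypothetical `bool_and` below treats eight bools as one `uint64_t`. It assumes a one-byte `bool` whose true value is stored as 1, which holds on mainstream ABIs but is not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Element-wise logical AND of two bool arrays, 8 elements per 64-bit word.
// Assumption: bool is 1 byte holding 0 or 1, so a bitwise AND of the bytes
// is again a valid bool byte.
void bool_and(const bool* a, const bool* b, bool* out, std::size_t n) {
  static_assert(sizeof(bool) == 1, "sketch assumes a 1-byte bool");
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    std::uint64_t wa, wb;
    std::memcpy(&wa, a + i, 8);  // load 8 bools as one word
    std::memcpy(&wb, b + i, 8);
    const std::uint64_t w = wa & wb;  // 8 logical ANDs in one instruction
    std::memcpy(out + i, &w, 8);
  }
  for (; i < n; ++i) out[i] = a[i] && b[i];  // scalar tail
}
```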
*	Move eigen_packet_wrapper to GenericPacketMath.h and use it for ↵	Rasmus Munk Larsen	2020-04-15
	SSE/AVX/AVX512 as it is already used for NEON. This will allow us to define multiple packet types backed by the same vector type, e.g., __m128i. Use this mechanism to define packets for half and clean up the packet op implementations.
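The wrapper mechanism can be sketched as a strong typedef: a thin struct around the raw vector type, with an integer tag so that two packets sharing the same storage are still distinct C++ types and can have separate overloads. `packet_wrapper`, `RawVec` (a stand-in for `__m128i`), and the two packet aliases are hypothetical and only loosely modeled on the commit's description.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a raw 128-bit vector type such as __m128i.
using RawVec = std::array<std::uint16_t, 8>;

// unique_id makes wrappers around the same raw type distinct C++ types,
// while the conversion operators keep raw access to the storage cheap.
template <typename T, int unique_id>
struct packet_wrapper {
  packet_wrapper() = default;
  packet_wrapper(const T& v) : m_val(v) {}
  operator T&() { return m_val; }
  operator const T&() const { return m_val; }
  T m_val;
};

using PacketHalf8 = packet_wrapper<RawVec, 1>;   // hypothetical: 8 x half
using PacketShort8 = packet_wrapper<RawVec, 2>;  // hypothetical: 8 x int16

// Overloads dispatch on the packet type even though the storage is the
// same; with plain typedefs of RawVec these two would collide.
inline std::string describe(const PacketHalf8&) { return "half"; }
inline std::string describe(const PacketShort8&) { return "short"; }
```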
*	Fix typo in TypeCasting.h	Rasmus Munk Larsen	2020-04-14
*	Fix bug in vectorized casting of	Rasmus Munk Larsen	2020-04-14
	{uint8, int8} -> {int16, uint16, int32, uint32, float}
	{uint16, int16} -> {int32, uint32, int64, uint64, float}
	for NEON. These conversions were advertised as vectorized, but not actually implemented.
*	CommaInitializer incorrectly asserted for 0-sized blocks	Christoph Hertzberg	2020-04-13
	The commainitializer unit test never actually called `test_block_recursion`, which was also implemented incorrectly and would have caused excessively deep template recursion.
*	Fixed commainitializer test.	Antonio Sanchez	2020-04-10
	The removed `finished()` call was responsible for enforcing that the initializer was provided the correct number of values. This commit puts it back to restore the previous behavior.
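The mechanism that `finished()` restores can be sketched with a tiny hypothetical `FixedVec<N>`: `operator<<` starts a filler object, each comma appends a value, and `finished()` asserts that exactly `N` values arrived. This illustrates the comma-initializer pattern only; it is not Eigen's `CommaInitializer`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size vector with a comma initializer.
template <std::size_t N>
struct FixedVec {
  std::array<int, N> data{};

  class Filler {
   public:
    Filler(FixedVec& v, int first) : m_v(v) { push(first); }
    // Each comma appends one more coefficient.
    Filler& operator,(int x) {
      push(x);
      return *this;
    }
    // Enforces that exactly N coefficients were supplied.
    FixedVec& finished() {
      assert(m_count == N && "too few coefficients");
      return m_v;
    }
    std::size_t count() const { return m_count; }

   private:
    void push(int x) {
      assert(m_count < N && "too many coefficients");
      m_v.data[m_count++] = x;
    }
    FixedVec& m_v;
    std::size_t m_count = 0;
  };

  Filler operator<<(int first) { return Filler(*this, first); }
};
```

Usage: `(v << 1, 2, 3).finished();` fills a `FixedVec<3>` and checks the count in one expression.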
*	Speed up matrix multiplication for small to medium size matrices by using ↵	Rasmus Munk Larsen	2020-04-07
	half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down to the scalar path. Benchmark measurements below are for computing `c.noalias() = a.transpose() * b;` for square RowMajor matrices of varying size.
	Measured improvement with AVX+FMA:
	name               old time/op  new time/op  delta
	BM_MatMul_ATB/8     139ns ± 1%   129ns ± 1%   -7.49%  (p=0.008 n=5+5)
	BM_MatMul_ATB/32   1.46µs ± 1%  1.22µs ± 0%  -16.72%  (p=0.008 n=5+5)
	BM_MatMul_ATB/64   8.43µs ± 1%  7.41µs ± 0%  -12.04%  (p=0.008 n=5+5)
	BM_MatMul_ATB/128  56.8µs ± 1%  52.9µs ± 1%   -6.83%  (p=0.008 n=5+5)
	BM_MatMul_ATB/256   407µs ± 1%   395µs ± 3%   -2.94%  (p=0.032 n=5+5)
	BM_MatMul_ATB/512  3.27ms ± 3%  3.18ms ± 1%      ~    (p=0.056 n=5+5)
	Measured improvement with AVX512:
	name               old time/op  new time/op  delta
	BM_MatMul_ATB/8     167ns ± 1%   154ns ± 1%   -7.63%  (p=0.008 n=5+5)
	BM_MatMul_ATB/32   1.08µs ± 1%  0.83µs ± 3%  -23.58%  (p=0.008 n=5+5)
	BM_MatMul_ATB/64   6.21µs ± 1%  5.06µs ± 1%  -18.47%  (p=0.008 n=5+5)
	BM_MatMul_ATB/128  36.1µs ± 2%  31.3µs ± 1%  -13.32%  (p=0.008 n=5+5)
	BM_MatMul_ATB/256   263µs ± 2%   242µs ± 2%   -7.92%  (p=0.008 n=5+5)
	BM_MatMul_ATB/512  1.95ms ± 2%  1.91ms ± 2%      ~    (p=0.095 n=5+5)
	BM_MatMul_ATB/1k   15.4ms ± 4%  14.8ms ± 2%      ~    (p=0.095 n=5+5)
*	Missing struct definition in NumTraits	Antonio Sanchez	2020-04-07
*	Add numeric_limits min and max for bool	Akshay Naresh Modi	2020-04-06
	This will allow (among other things) computation of argmax and argmin of bool tensors.
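A typical consumer of these limits is a max reduction seeded with `numeric_limits<T>::lowest()`, the identity element for `max`. The hypothetical `argmax` below uses `std::numeric_limits`, which already covers `bool` on the host. The sketch does not show where or how Eigen defines its own limits for bool tensors.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Index of the first maximal element, or -1 for an empty input. The running
// maximum starts at numeric_limits<T>::lowest(), so generic code like this
// needs that trait to be available for T = bool as well.
template <typename T>
std::ptrdiff_t argmax(const std::vector<T>& v) {
  T best = std::numeric_limits<T>::lowest();
  std::ptrdiff_t best_idx = v.empty() ? -1 : 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (v[i] > best) {
      best = v[i];
      best_idx = static_cast<std::ptrdiff_t>(i);
    }
  }
  return best_idx;
}
```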
*	Bugfix: conjugate_gradient did not compile with lazy-evaluated RealScalar	Bernardo Bahia Monteiro	2020-03-29
	The error generated by the compiler was:
	    no matching function for call to 'maxi'
	    RealScalar threshold = numext::maxi(tol*tol*rhsNorm2,considerAsZero);
	The important part in the following notes was:
	    candidate template ignored: deduced conflicting types for parameter 'T' ('codi::Multiply11<...>' vs. 'codi::ActiveReal<...>')
	    EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)
	I am using CoDiPack to provide the RealScalar type. This bug was introduced in bc000deaa ("Fix conjugate-gradient for very small rhs").
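The failure mode can be reproduced without CoDiPack. A single-parameter template such as `maxi(const T&, const T&)` cannot deduce `T` when one argument is a lazy expression type and the other is the scalar type. `Real`, `MulExpr`, and `threshold` are hypothetical stand-ins. The sketch shows one standard workaround, converting the expression before the call, and is not necessarily the fix that was committed.

```cpp
#include <cassert>

// Hypothetical active scalar type (stands in for an AD type such as one
// provided by CoDiPack).
struct Real {
  double v;
};

// Hypothetical lazy product expression: a distinct type that only becomes
// a Real when converted.
struct MulExpr {
  Real a, b;
  operator Real() const { return Real{a.v * b.v}; }
};

inline MulExpr operator*(Real a, Real b) { return MulExpr{a, b}; }
inline bool operator<(Real x, Real y) { return x.v < y.v; }

// Single-parameter max helper: both arguments must deduce the same T.
template <typename T>
T maxi(const T& x, const T& y) { return x < y ? y : x; }

// maxi(a * b, eps) does not compile: T is deduced as MulExpr from the first
// argument and as Real from the second. Converting the expression to Real
// first (or writing maxi<Real>(a * b, eps)) resolves the conflict.
inline Real threshold(Real a, Real b, Real eps) {
  return maxi(static_cast<Real>(a * b), eps);
}
```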
*	Fix bug in ↵	Rasmus Munk Larsen	2020-03-27
	https://gitlab.com/libeigen/eigen/-/commit/52d54278beefee8b2f19dcca4fd900916154e174
*	NEON: Fixed MSVC type definitions	Joel Holdsworth	2020-03-26
*	Additional NEON packet-math operations	Joel Holdsworth	2020-03-26
*	Adhere to recommended load/store intrinsics for ppc64le	Everton Constantino	2020-03-23
*	Fixing float32's pround halfway criteria to match STL's criteria.	Everton Constantino	2020-03-21
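For reference, the halfway rule used by `std::round` is round half away from zero. The scalar sketch below splits off the integer part so that the halfway test is exact. By contrast, the naive `std::floor(x + 0.5f)` can round the largest float below 0.5 up to 1, because the addition itself rounds. `round_half_away_from_zero` is a hypothetical name, and this is not Eigen's packet implementation of `pround`.

```cpp
#include <cassert>
#include <cmath>

// Round to nearest, with halfway cases rounded away from zero (the rule
// std::round follows). x - trunc(x) is exact in floating point, so the
// comparison against 0.5f is not disturbed by rounding.
inline float round_half_away_from_zero(float x) {
  float t = std::trunc(x);
  if (std::fabs(x - t) >= 0.5f) t += std::copysign(1.0f, x);
  return t;
}
```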
*	Fixed:	Alessio M	2020-03-21
	- access violation when initializing 0x0 matrices
	- an exception can be thrown during stack unwinding while comma-initializing a matrix if eigen_assert is configured to throw
*	Update VectorwiseOp.h to allow plugins similar to MatrixBase.h or ArrayBase.h	dlazenby	2020-03-20
*	Bug https://gitlab.com/libeigen/eigen/-/issues/1415: add missing ↵	Masaki Murooka	2020-03-20
	EIGEN_DEVICE_FUNC to diagonal_product_evaluator_base.
*	Remove reference to non-existent unary_op_base class.	Rasmus Munk Larsen	2020-03-19
*	Add missing arguments to numext::absdiff().	Rasmus Munk Larsen	2020-03-19
*	Add absolute_difference coefficient-wise binary Array function	Joel Holdsworth	2020-03-19
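A coefficient-wise absolute difference needs care for unsigned element types, because for `unsigned int` and wider unsigned types `x - y` wraps around when `y > x`. The hypothetical helper below subtracts the smaller value from the larger one. It is a sketch of the idea, not `numext::absdiff` itself.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// |x - y| without unsigned wrap-around: always subtract the smaller value
// from the larger one. (For signed types, extreme pairs such as
// INT_MIN and INT_MAX can still overflow.)
template <typename T>
T absolute_difference(T x, T y) {
  static_assert(std::is_arithmetic<T>::value, "arithmetic types only");
  return x > y ? static_cast<T>(x - y) : static_cast<T>(y - x);
}
```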
*	Reenabling packetmath unsigned tests, adding dummy pabs for relevant unsigned	Everton Constantino	2020-03-19
	types.
*	Add shift_left<N> and shift_right<N> coefficient-wise unary Array functions	Joel Holdsworth	2020-03-19
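Taking the shift amount as a template parameter makes it a compile-time constant, which shift-by-immediate instructions need (NEON's `vshlq_n_*` intrinsics, for instance, require a constant shift). The sketch applies hypothetical `shift_left_op<N>` and `shift_right_op<N>` functors coefficient-wise to a `std::array`. It does not reproduce Eigen's Array API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Coefficient-wise shift functors; N is fixed at compile time.
template <int N>
struct shift_left_op {
  template <typename T>
  T operator()(T x) const { return static_cast<T>(x << N); }
};

template <int N>
struct shift_right_op {
  template <typename T>
  T operator()(T x) const { return static_cast<T>(x >> N); }
};

// Apply a unary functor to every coefficient.
template <typename Op, typename T, std::size_t Size>
std::array<T, Size> unary_expr(const std::array<T, Size>& a, Op op) {
  std::array<T, Size> r{};
  for (std::size_t i = 0; i < Size; ++i) r[i] = op(a[i]);
  return r;
}
```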
*	Implement integer square-root for NEON	Joel Holdsworth	2020-03-19
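The commit does not describe its algorithm. One standard way to compute an integer square root using only shifts, adds, and compares, operations that map naturally onto SIMD lanes, is the digit-by-digit method below. `isqrt` is a hypothetical scalar helper, not the NEON implementation.

```cpp
#include <cassert>
#include <cstdint>

// floor(sqrt(n)) for 32-bit unsigned integers, computed bit by bit
// (two bits of n per iteration, one bit of the result).
inline std::uint32_t isqrt(std::uint32_t n) {
  std::uint32_t res = 0;
  std::uint32_t bit = 1u << 30;  // highest power of four that fits in 32 bits
  while (bit > n) bit >>= 2;
  while (bit != 0) {
    if (n >= res + bit) {
      n -= res + bit;
      res = (res >> 1) + bit;
    } else {
      res >>= 1;
    }
    bit >>= 2;
  }
  return res;
}
```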
*	Update NullaryFunctors.h	Allan Leal	2020-03-16
*	Fixing HIP breakage caused by the recent commit that introduces Packet4h2 as ↵	Deven Desai	2020-03-12
	the Eigen::Half packet type.
*	NEON: Added int64_t and uint64_t packet math	Joel Holdsworth	2020-03-10
*	NEON: Added int8_t and uint8_t packet math	Joel Holdsworth	2020-03-10
*	NEON: Added int16_t and uint16_t packet math	Joel Holdsworth	2020-03-10
*	NEON: Added uint32_t packet math	Joel Holdsworth	2020-03-10
*	NEON: Implemented half-size vectors	Joel Holdsworth	2020-03-10
*	NEON: Set packet_traits<double> flags	Joel Holdsworth	2020-03-10