| Commit message (Collapse) | Author | Age |
| |
|
|
|
|
|
|
| |
{uint8, int8} -> {int16, uint16, int32, uint32, float}
{uint16, int16} -> {int32, uint32, int64, uint64, float}
for NEON. These conversions were advertised as vectorized, but not actually implemented.
|
| |
|
|
|
|
| |
commainitialier unit-test never actually called `test_block_recursion`, which also was not correctly implemented and would have caused too deep template recursion.
|
|
|
|
|
|
| |
The removed `finished()` call was responsible for enforcing that the
initializer was provided the correct number of values. Putting it back in
to restore previous behavior.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down the the scalar path.
Benchmark measurements below are for computing ```c.noalias() = a.transpose() * b;``` for square RowMajor matrices of varying size.
Measured improvement with AVX+FMA:
name old time/op new time/op delta
BM_MatMul_ATB/8 139ns ± 1% 129ns ± 1% -7.49% (p=0.008 n=5+5)
BM_MatMul_ATB/32 1.46µs ± 1% 1.22µs ± 0% -16.72% (p=0.008 n=5+5)
BM_MatMul_ATB/64 8.43µs ± 1% 7.41µs ± 0% -12.04% (p=0.008 n=5+5)
BM_MatMul_ATB/128 56.8µs ± 1% 52.9µs ± 1% -6.83% (p=0.008 n=5+5)
BM_MatMul_ATB/256 407µs ± 1% 395µs ± 3% -2.94% (p=0.032 n=5+5)
BM_MatMul_ATB/512 3.27ms ± 3% 3.18ms ± 1% ~ (p=0.056 n=5+5)
Measured improvement for AVX512:
name old time/op new time/op delta
BM_MatMul_ATB/8 167ns ± 1% 154ns ± 1% -7.63% (p=0.008 n=5+5)
BM_MatMul_ATB/32 1.08µs ± 1% 0.83µs ± 3% -23.58% (p=0.008 n=5+5)
BM_MatMul_ATB/64 6.21µs ± 1% 5.06µs ± 1% -18.47% (p=0.008 n=5+5)
BM_MatMul_ATB/128 36.1µs ± 2% 31.3µs ± 1% -13.32% (p=0.008 n=5+5)
BM_MatMul_ATB/256 263µs ± 2% 242µs ± 2% -7.92% (p=0.008 n=5+5)
BM_MatMul_ATB/512 1.95ms ± 2% 1.91ms ± 2% ~ (p=0.095 n=5+5)
BM_MatMul_ATB/1k 15.4ms ± 4% 14.8ms ± 2% ~ (p=0.095 n=5+5)
|
|
|
|
|
|
|
|
|
|
|
| |
For random matrices with integer coefficients, many of the tests here lead to
integer overflows. When taking the norm() of a row/column, the squaredNorm()
often overflows to a negative value, leading to domain errors when taking the
sqrt(). This leads to a crash on some systems. By replacing the norm() call by
a squaredNorm(), the values still overflow, but at least there is no domain
error.
Addresses https://gitlab.com/libeigen/eigen/-/issues/1856
|
| |
|
|
|
|
| |
This will allow (among other things) computation of argmax and argmin of bool tensors
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The error generated by the compiler was:
no matching function for call to 'maxi'
RealScalar threshold = numext::maxi(tol*tol*rhsNorm2,considerAsZero);
The important part in the following notes was:
candidate template ignored: deduced conflicting
types for parameter 'T'"
('codi::Multiply11<...>' vs. 'codi::ActiveReal<...>')
EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)
I am using CoDiPack to provide the RealScalar type.
This bug was introduced in bc000deaa Fix conjugate-gradient for very small rhs
|
| |
|
|
|
|
| |
https://gitlab.com/libeigen/eigen/-/commit/52d54278beefee8b2f19dcca4fd900916154e174
|
|
|
|
| |
https://gitlab.com/libeigen/eigen/-/commit/52d54278beefee8b2f19dcca4fd900916154e174
|
| |
|
| |
|
| |
|
|
|
|
| |
UTF-8, LF, no BOM, and newlines at the end of files
|
| |
|
|
|
|
|
| |
- access violation when initializing 0x0 matrices
- exception can be thrown during stack unwind while comma-initializing a matrix if eigen_assert if configured to throw
|
| |
|
|
|
|
| |
EIGEN_DEVICE_FUNC to diagonal_product_evaluator_base.
|
| |
|
| |
|
| |
|
|
|
|
| |
types.
|
| |
|
| |
|
| |
|
|
|
|
| |
the Eigen::Half packet type
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
expose pmul/add/div/min/max on host
|
|
|
| |
This prevents projects that add Eigen using `add_subdirectory` from using their own custom CMAKE_BUILD_TYPE and have Eigen respect the same custom flags.
|
| |
|
|
|
| |
This reverts commit 5ca10480b0756e40b0723d90adeba8506291fc7c
|
|
|
| |
This reverts commit 44df2109c8c700222643a9a45f144676348f4df1
|
|
|
| |
This reverts commit e9cc0cd353803a818204e48054bd89699b84e6c6
|
|
|
| |
This reverts commit 776960024585b907acc4abc3c59aef605941bb75
|
|
|
|
|
| |
failing with AVX."
This reverts commit b625adffd877639ff5cbe51ea154e1905a3b405c
|
|
|
|
| |
with AVX.
|
|
|
|
|
| |
<complex>.
This implicit dependency does no longer exist in a recent llbm release (sha 78be61871704).
|
|
|
|
|
|
|
| |
Looking at profiles we spend ~10-20% of Steal on simply computing
random % size. We can reduce random 32-bit int into [0, size) range with
a single multiplication and shift. This transformation is described in
https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
|
| |
|