| Commit message (Collapse) | Author | Age |
| |
|
|
|
|
|
|
| |
{uint8, int8} -> {int16, uint16, int32, uint32, float}
{uint16, int16} -> {int32, uint32, int64, uint64, float}
for NEON. These conversions were advertised as vectorized, but not actually implemented.
|
|
|
|
| |
commainitialier unit-test never actually called `test_block_recursion`, which also was not correctly implemented and would have caused too deep template recursion.
|
|
|
|
|
|
| |
The removed `finished()` call was responsible for enforcing that the
initializer was provided the correct number of values. Putting it back in
to restore previous behavior.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down the the scalar path.
Benchmark measurements below are for computing ```c.noalias() = a.transpose() * b;``` for square RowMajor matrices of varying size.
Measured improvement with AVX+FMA:
name old time/op new time/op delta
BM_MatMul_ATB/8 139ns ± 1% 129ns ± 1% -7.49% (p=0.008 n=5+5)
BM_MatMul_ATB/32 1.46µs ± 1% 1.22µs ± 0% -16.72% (p=0.008 n=5+5)
BM_MatMul_ATB/64 8.43µs ± 1% 7.41µs ± 0% -12.04% (p=0.008 n=5+5)
BM_MatMul_ATB/128 56.8µs ± 1% 52.9µs ± 1% -6.83% (p=0.008 n=5+5)
BM_MatMul_ATB/256 407µs ± 1% 395µs ± 3% -2.94% (p=0.032 n=5+5)
BM_MatMul_ATB/512 3.27ms ± 3% 3.18ms ± 1% ~ (p=0.056 n=5+5)
Measured improvement for AVX512:
name old time/op new time/op delta
BM_MatMul_ATB/8 167ns ± 1% 154ns ± 1% -7.63% (p=0.008 n=5+5)
BM_MatMul_ATB/32 1.08µs ± 1% 0.83µs ± 3% -23.58% (p=0.008 n=5+5)
BM_MatMul_ATB/64 6.21µs ± 1% 5.06µs ± 1% -18.47% (p=0.008 n=5+5)
BM_MatMul_ATB/128 36.1µs ± 2% 31.3µs ± 1% -13.32% (p=0.008 n=5+5)
BM_MatMul_ATB/256 263µs ± 2% 242µs ± 2% -7.92% (p=0.008 n=5+5)
BM_MatMul_ATB/512 1.95ms ± 2% 1.91ms ± 2% ~ (p=0.095 n=5+5)
BM_MatMul_ATB/1k 15.4ms ± 4% 14.8ms ± 2% ~ (p=0.095 n=5+5)
|
| |
|
|
|
|
| |
This will allow (among other things) computation of argmax and argmin of bool tensors
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The error generated by the compiler was:
no matching function for call to 'maxi'
RealScalar threshold = numext::maxi(tol*tol*rhsNorm2,considerAsZero);
The important part in the following notes was:
candidate template ignored: deduced conflicting
types for parameter 'T'"
('codi::Multiply11<...>' vs. 'codi::ActiveReal<...>')
EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)
I am using CoDiPack to provide the RealScalar type.
This bug was introduced in bc000deaa Fix conjugate-gradient for very small rhs
|
|
|
|
| |
https://gitlab.com/libeigen/eigen/-/commit/52d54278beefee8b2f19dcca4fd900916154e174
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
- access violation when initializing 0x0 matrices
- exception can be thrown during stack unwind while comma-initializing a matrix if eigen_assert if configured to throw
|
| |
|
|
|
|
| |
EIGEN_DEVICE_FUNC to diagonal_product_evaluator_base.
|
| |
|
| |
|
| |
|
|
|
|
| |
types.
|
| |
|
| |
|
| |
|
|
|
|
| |
the Eigen::Half packet type
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
expose pmul/add/div/min/max on host
|
|
|
| |
This reverts commit 5ca10480b0756e40b0723d90adeba8506291fc7c
|
|
|
| |
This reverts commit 44df2109c8c700222643a9a45f144676348f4df1
|
|
|
| |
This reverts commit e9cc0cd353803a818204e48054bd89699b84e6c6
|
|
|
|
|
| |
<complex>.
This implicit dependency does no longer exist in a recent llbm release (sha 78be61871704).
|
|
|
|
|
| |
See comment and
<https://gitlab.com/libeigen/eigen/merge_requests/46#note_270622952>.
|
|
|
|
| |
See comment for details.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See
<https://stackoverflow.com/questions/59709148/ensuring-that-eigen-uses-avx-vectorization-for-a-certain-operation>
for an explanation of the problem this solves.
In short, for some reason, before this commit the half-packet is
selected when the array / matrix size is not a multiple of
`unpacket_traits<PacketType>::size`, where `PacketType` starts out
being the full Packet.
For example, for some data of 100 `float`s, `Packet4f` will be
selected rather than `Packet8f`, because 100 is not a multiple of 8,
the size of `Packet8f`.
This commit switches to selecting the half-packet if the size is
less than the packet size, which seems to make more sense.
As I stated in the SO post I'm not sure that I'm understanding the
issue correctly, but this fix resolves the issue in my program. Moreover,
`make check` passes, with the exception of line 614 and 616 in
`test/packetmath.cpp`, which however also fail on master on my machine:
CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i0, internal::pbessel_i0);
...
CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i1, internal::pbessel_i1);
|
|
|
|
| |
top-level header in Eigen/Core.
|
| |
|
| |
|
|
|
|
| |
the Windows build breaks when trying to compile numext::rint<double>.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
appears to be slightly slower.
|
|
|
|
| |
function were not set carefully enough in the original commit, and some arguments would cause the function to return values greater than 1. This change set the versions found by scanning all floating point numbers (using std::nextafterf()).
|
|
|
|
|
|
| |
This provides a new op that matches std::rint and previous behavior of
pround. Also adds corresponding unsupported/../Tensor op.
Performance is the same as e. g. floor (tested SSE/AVX).
|