| Commit message (Collapse) | Author | Age |
| |
LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.
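A minimal sketch of the idea described above, assuming a helper that is generic over a single scalar type. The names `conj_value`, `conj_if` and `same_type_conj_helper` are hypothetical stand-ins chosen for this illustration; they are not the library's actual definitions, and mixed-type products would still need their own specializations.

```cpp
#include <complex>

// Hypothetical stand-in for a numext::conj-style function:
// identity for real scalars, std::conj for complex ones.
template <typename T>
T conj_value(const T& x) { return x; }

template <typename T>
std::complex<T> conj_value(const std::complex<T>& x) { return std::conj(x); }

// Conjugate only when the compile-time flag is set.
template <bool Conjugate>
struct conj_if {
  template <typename T>
  static T apply(const T& x) { return conj_value(x); }
};

template <>
struct conj_if<false> {
  template <typename T>
  static T apply(const T& x) { return x; }
};

// One generic helper covers every case where both operands share a type.
template <typename Scalar, bool ConjLhs, bool ConjRhs>
struct same_type_conj_helper {
  static Scalar pmul(const Scalar& a, const Scalar& b) {
    return conj_if<ConjLhs>::apply(a) * conj_if<ConjRhs>::apply(b);
  }
  static Scalar pmadd(const Scalar& a, const Scalar& b, const Scalar& c) {
    return pmul(a, b) + c;
  }
};
```

Because the real-scalar overload of `conj_value` is the identity, the same helper works unchanged for `float` or `double`.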
| |
`combine_scalar_factors` helper function.
| |
- Changes to Altivec/MatrixProduct
Adapting code to gcc 10.
Generic code style and performance enhancements.
Adding PanelMode support.
Adding stride/offset support.
Enabling float64, std::complex<float> and std::complex<double>.
Fixing lack of symm_pack.
Enabling mixedtypes.
- Adding std::complex tests to blasutil.
- Adding an implementation of storePacketBlock when Incr!= 1.
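To illustrate what a block store looks like when the row increment is not 1, here is a rough sketch. `Packet4`, `PacketBlock4` and `StridedMapper` are hypothetical types invented for this example, and the lane-by-lane scatter is only one possible approach, not necessarily what the commit does.

```cpp
#include <cstddef>

// Hypothetical 4-wide packet and block of N packets, for illustration only.
struct Packet4 { float v[4]; };

template <int N>
struct PacketBlock4 { Packet4 packet[N]; };

// Minimal mapper whose consecutive rows are `incr` elements apart.
struct StridedMapper {
  float* data;
  std::ptrdiff_t stride;  // distance between consecutive columns
  std::ptrdiff_t incr;    // distance between consecutive rows

  float& at(std::ptrdiff_t i, std::ptrdiff_t j) const {
    return data[i * incr + j * stride];
  }

  // Store N packets as N columns starting at (i, j). With incr != 1 the
  // rows are not contiguous, so each lane is written individually.
  template <int N>
  void storePacketBlock(std::ptrdiff_t i, std::ptrdiff_t j,
                        const PacketBlock4<N>& block) const {
    for (int col = 0; col < N; ++col)
      for (int lane = 0; lane < 4; ++lane)
        at(i + lane, j + col) = block.packet[col].v[lane];
  }
};
```

With `incr == 1` the inner loop could be replaced by a single contiguous packet store, which is why the strided case needs separate handling.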
| |
- Optimizing MMA kernel.
- Adding PacketBlock store to blas_data_mapper.
| |
product.
Previously, only s*A*B was caught, which was inconsistent with GEMM, sub-optimal,
and could even lead to compilation errors (https://stackoverflow.com/questions/54738495).
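As a rough illustration of catching a scalar factor regardless of which operand carries it, the sketch below peels the factor off either side before multiplying. `Scaled`, `Matrix2`, `extract_factor`, `extract_operand` and `product` are hypothetical names for this example only and do not reproduce the library's traits machinery.

```cpp
// Hypothetical 2x2 matrix, stored row by row: [a b; c d].
struct Matrix2 { double a, b, c, d; };

// Hypothetical expression node: a scalar factor attached to an operand.
// It holds a reference, so the operand must outlive the expression.
template <typename Operand>
struct Scaled {
  double factor;
  const Operand& operand;
};

inline Scaled<Matrix2> operator*(double s, const Matrix2& m) {
  return Scaled<Matrix2>{s, m};
}

// Peel the scalar factor and the bare operand off a plain or scaled operand.
inline double extract_factor(const Matrix2&) { return 1.0; }
inline double extract_factor(const Scaled<Matrix2>& s) { return s.factor; }
inline const Matrix2& extract_operand(const Matrix2& m) { return m; }
inline const Matrix2& extract_operand(const Scaled<Matrix2>& s) { return s.operand; }

// The factors of both sides are combined once and applied to the result.
template <typename Lhs, typename Rhs>
Matrix2 product(const Lhs& lhs, const Rhs& rhs) {
  const double alpha = extract_factor(lhs) * extract_factor(rhs);
  const Matrix2& A = extract_operand(lhs);
  const Matrix2& B = extract_operand(rhs);
  return Matrix2{alpha * (A.a * B.a + A.b * B.c), alpha * (A.a * B.b + A.b * B.d),
                 alpha * (A.c * B.a + A.d * B.c), alpha * (A.c * B.b + A.d * B.d)};
}
```

In this sketch `product(2.0 * A, B)` and `product(A, 2.0 * B)` take the same code path, so neither form needs a scaled temporary.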
| |
AVX512).
This commit also removes "half-packet" from data-mappers: it was not used and was conceptually broken anyway.
| |
performance of modern CPUs.
The previous code was optimized for Intel Core2, for which unaligned loads/stores were prohibitively expensive.
This new version exhibits much higher instruction independence (better pipelining) and explicitly leverages FMA.
According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast.
Even higher performance could be achieved with a better blocking size heuristic and, perhaps, with explicit prefetching.
We should also check triangular product/solve to optimally exploit this new kernel (working on vertical panels of 4 columns is probably not optimal anymore).
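The two ingredients mentioned above, instruction independence and FMA, can be sketched on a much simpler operation than the actual kernel. The function `dot_fma` below is a hypothetical scalar example, not the kernel itself: it keeps four accumulators so that consecutive fused multiply-adds do not depend on each other.

```cpp
#include <cmath>
#include <cstddef>

// Dot product with four independent accumulators. Each std::fma depends
// only on its own accumulator, so the four chains can be pipelined.
inline double dot_fma(const double* a, const double* b, std::size_t n) {
  double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    acc0 = std::fma(a[i],     b[i],     acc0);
    acc1 = std::fma(a[i + 1], b[i + 1], acc1);
    acc2 = std::fma(a[i + 2], b[i + 2], acc2);
    acc3 = std::fma(a[i + 3], b[i + 3], acc3);
  }
  // Remainder loop for sizes that are not a multiple of 4.
  for (; i < n; ++i) acc0 = std::fma(a[i], b[i], acc0);
  return (acc0 + acc1) + (acc2 + acc3);
}
```

Whether `std::fma` maps to a hardware instruction depends on the target and compiler flags; without hardware support it may fall back to a slower software routine.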
| |
with a constant expression.
This slightly complicates the type of the expressions and implies that we now have to distinguish between scalar*expr and expr*scalar to catch scalar-multiple expressions (e.g., see BlasUtil.h), but this brings several advantages:
- it makes it clear on which side the scalar is applied,
- it clearly reflects that we are dealing with a binary-expression,
- the complexity of the type is hidden through macros defined at the end of Macros.h,
- distinguishing between "scalar op expr" and "expr op scalar" is important to support non-commutative fields (like quaternions),
- "scalar op expr" is now fully equivalent to "ConstantExpr(scalar) op expr",
- scalar_multiple_op, scalar_quotient1_op and scalar_quotient2_op are not used anymore in officially supported modules (still used in Tensor).
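To see why the side matters for non-commutative scalars, consider the small sketch below. `Quat`, `scalar_times_expr` and `expr_times_scalar` are hypothetical names for this illustration; the "expression" is reduced to a single coefficient to keep the example short.

```cpp
// Hypothetical quaternion w + x*i + y*j + z*k with the Hamilton product.
struct Quat { double w, x, y, z; };

inline Quat operator*(const Quat& p, const Quat& q) {
  return Quat{p.w * q.w - p.x * q.x - p.y * q.y - p.z * q.z,
              p.w * q.x + p.x * q.w + p.y * q.z - p.z * q.y,
              p.w * q.y - p.x * q.z + p.y * q.w + p.z * q.x,
              p.w * q.z + p.x * q.y - p.y * q.x + p.z * q.w};
}

inline bool operator==(const Quat& p, const Quat& q) {
  return p.w == q.w && p.x == q.x && p.y == q.y && p.z == q.z;
}

// "scalar op expr": the scalar is applied on the left.
inline Quat scalar_times_expr(const Quat& s, const Quat& coeff) { return s * coeff; }

// "expr op scalar": the scalar is applied on the right.
inline Quat expr_times_scalar(const Quat& coeff, const Quat& s) { return coeff * s; }
```

With `s = i` and `coeff = j`, the left product gives `k` while the right product gives `-k`, so collapsing both forms into a single "scalar multiple" node would lose information.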
| |
conversions.
This fixes "conversion from pointer to same-sized integral type" warnings from ICC.
Ideally, we would use the std::[u]intptr_t types everywhere, but since they are C99/C++11 only,
let's be safe.
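A minimal sketch of a pre-C++11 fallback, assuming that `std::size_t` has the same width as a pointer on the target platform. The typedef name `UIntPtr` and the helper `is_aligned` are assumptions made for this example, not necessarily the names used by the commit.

```cpp
#include <cstddef>

// Assumption: std::size_t is as wide as a pointer on this platform.
typedef std::size_t UIntPtr;

// C++98-compatible compile-time check: the array size becomes negative,
// and compilation fails, if the assumption above does not hold.
typedef char uintptr_size_check[sizeof(UIntPtr) == sizeof(void*) ? 1 : -1];

// All pointer-to-integer conversions go through the dedicated typedef.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
  return reinterpret_cast<UIntPtr>(ptr) % alignment == 0;
}
```

When C++11 is available, the typedef could simply alias `std::uintptr_t` from `<cstdint>` instead.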
| |
parameter, and add a first_default_aligned variant that calls first_aligned with the alignment requirement of the largest packet for the given scalar type.
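The relationship between the two functions can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the library's code: `Alignment` is taken to be a power of two in bytes, and `largest_packet_bytes` is a hypothetical trait fixed at 16 bytes for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Index of the first element of `array` whose address is a multiple of
// Alignment bytes, or `size` if no such element exists in the range.
template <int Alignment, typename Scalar>
std::size_t first_aligned(const Scalar* array, std::size_t size) {
  const std::size_t scalar_size = sizeof(Scalar);
  const std::size_t align_elems = Alignment / scalar_size;
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(array);
  // A pointer that is not even scalar-aligned can never become aligned.
  if (addr % scalar_size != 0) return size;
  // The requested alignment is no stricter than the scalar itself.
  if (align_elems <= 1) return 0;
  const std::size_t elem_index = addr / scalar_size;
  const std::size_t first = (align_elems - elem_index % align_elems) % align_elems;
  return first < size ? first : size;
}

// Hypothetical trait: byte size of the largest packet for a scalar type.
template <typename Scalar>
struct largest_packet_bytes { enum { value = 16 }; };

// Convenience variant using the largest packet's alignment requirement.
template <typename Scalar>
std::size_t first_default_aligned(const Scalar* array, std::size_t size) {
  return first_aligned<largest_packet_bytes<Scalar>::value>(array, size);
}
```

Making the alignment a template parameter lets callers that work with smaller packets request a weaker alignment than the default.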
| |
issue.
| |
missing ones, etc.
(note that const qualifiers are set by internal::nested)
| |
* * *
License disclaimer changed to BSD license for MKL_support.h
* * *
Pardiso support fixed, test added.
blas/lapack tests fixed: Scalar parameter was added in Cholesky, product_matrix_vector_triangular renamed to triangular_matrix_vector_product.
* * *
PARDISO test was added physically.
| |
Renamed meta_{true|false} to {true|false}_type, meta_if to conditional, is_same_type to is_same, un{ref|pointer|const} to remove_{reference|pointer|const} and makeconst to add_const.
Changed boolean type 'ret' member to 'value'.
Changed 'ret' members referring to types to 'type'.
Adapted all code occurrences.
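For reference, the renamed helpers follow the naming of the standard type traits. The sketch below shows what such C++98-style definitions typically look like, with boolean results exposed as `value` and type results as `type`. It is wrapped in a hypothetical `meta_sketch` namespace and is not copied from the library.

```cpp
namespace meta_sketch {

// Boolean results are exposed through a 'value' member.
struct true_type  { enum { value = 1 }; };
struct false_type { enum { value = 0 }; };

// Type results are exposed through a 'type' member.
template <bool Condition, typename Then, typename Else>
struct conditional { typedef Then type; };
template <typename Then, typename Else>
struct conditional<false, Then, Else> { typedef Else type; };

template <typename T, typename U> struct is_same { enum { value = 0 }; };
template <typename T> struct is_same<T, T> { enum { value = 1 }; };

template <typename T> struct remove_const { typedef T type; };
template <typename T> struct remove_const<const T> { typedef T type; };

template <typename T> struct add_const { typedef const T type; };

}  // namespace meta_sketch
```

For example, `meta_sketch::conditional<true, int, float>::type` is `int`, and `meta_sketch::is_same<int, int>::value` is 1.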
| |
* improve compilation error in case of type mismatch
| |
* merge ei_product_blocking_traits into ei_gepb_traits
| |
- improve support of colmajor by vector and matrix - matrix
- all configurations are now handled correctly, but performance is not always very good
| |
* pload* and pset1 are now templated on the packet type
* gemv routines are now embedded into a structure with
an API consistent with gemm
* some configurations of vector * matrix and matrix * matrix work fine,
some need more work...
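Templating the load and broadcast primitives on the packet type means the caller names the packet explicitly and the scalar type is derived from it. A rough sketch with emulated packets follows; `Packet4f` and `Packet2d` here are plain structs defined for this example, and the `unpacket_traits` shown is a simplified stand-in rather than the library's definition.

```cpp
// Emulated packet types, for illustration only (no SIMD intrinsics).
struct Packet4f { float v[4]; };
struct Packet2d { double v[2]; };

// Simplified stand-in trait mapping a packet type to its scalar and size.
template <typename Packet> struct unpacket_traits;
template <> struct unpacket_traits<Packet4f> { typedef float type;  enum { size = 4 }; };
template <> struct unpacket_traits<Packet2d> { typedef double type; enum { size = 2 }; };

// Broadcast one scalar to every lane of the requested packet type.
template <typename Packet>
Packet pset1(const typename unpacket_traits<Packet>::type& a) {
  Packet p;
  for (int i = 0; i < unpacket_traits<Packet>::size; ++i) p.v[i] = a;
  return p;
}

// Load a full packet of the requested type from memory.
template <typename Packet>
Packet pload(const typename unpacket_traits<Packet>::type* from) {
  Packet p;
  for (int i = 0; i < unpacket_traits<Packet>::size; ++i) p.v[i] = from[i];
  return p;
}
```

A call then reads `pset1<Packet4f>(2.5f)` or `pload<Packet2d>(ptr)`, which lets several packet widths coexist for the same scalar type.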
| | |
* fix weird compilation error when constructing a matrix with a row by matrix product
| |
cases)
|