Commit message | Author | Date |
---|---|---|
Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow | Chip Kerchner | 2021-06-30 |
EIGEN_STRONG_INLINE was NOT inlining in some critical areas (6.6X slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate. | Chip-Kerchner | 2021-06-16 |
Fix taking address of rvalue compiler issue with TensorFlow (plus other warnings). | Chip-Kerchner | 2021-04-21 |
Fix address of temporary object errors in clang11. This fixes the problem with taking the address of temporary objects, which clang11 treats as an error. | Chip Kerchner | 2021-04-02 |
Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3). | Chip Kerchner | 2021-03-25 |
Fix clang compile when no MMA flags are set. Simplify MMA compiler detection. | Chip-Kerchner | 2021-02-24 |
Fixes to support old and new versions of the compilers for built-ins. Cast to non-const when using vector_pair with certain built-ins. | Chip-Kerchner | 2021-02-24 |
Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product | Chip Kerchner | 2021-02-17 |
Eliminate implicit conversions from float to double. | Antonio Sanchez | 2021-02-01 |
Add support for dynamic dispatch of MMA instructions for POWER 10 | Pedro Caldeira | 2020-11-12 |
MatrixProduct enhancements: - Changes to Altivec/MatrixProduct: adapting code to gcc 10; generic code style and performance enhancements; adding PanelMode support; adding stride/offset support; enabling float64, std::complex<float> and std::complex<double>; fixing lack of symm_pack; enabling mixedtypes. - Adding std::complex tests to blasutil. - Adding an implementation of storePacketBlock when Incr != 1. | Everton Constantino | 2020-09-02 |
- Vectorizing MMA packing. - Optimizing MMA kernel. - Adding PacketBlock store to blas_data_mapper. | Everton Constantino | 2020-05-19 |