| Commit message (Collapse) | Author | Age |
| |
|
| |
|
| |
|
|
|
|
| |
disable multi-threaded GEMM on non-x86 without c++11.
|
| |
|
|
|
|
| |
Found using `codespell` and `grep` from downstream FreeCAD
|
|
|
|
| |
columns.
|
|
|
|
| |
(sync is set from and compared to an Index)
|
| |
|
| |
threads when the inner dimension is small.
Timing for square matrices is unchanged, but both CPU and Wall time are significantly improved for skinny matrices. The benchmarks below are for multiplying NxK * KxN matrices with test names of the form BM_OuterishProd/N/K.
Improvements in Wall time:
Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                   Base (ns)    New (ns)  Improvement
--------------------------------------------------------------
BM_OuterishProd/64/1             3088        1610       +47.9%
BM_OuterishProd/64/4             3562        2414       +32.2%
BM_OuterishProd/64/32            8861        7815       +11.8%
BM_OuterishProd/128/1           11363        6504       +42.8%
BM_OuterishProd/128/4           11128        9794       +12.0%
BM_OuterishProd/128/64          27691       27396        +1.1%
BM_OuterishProd/256/1           33214       28123       +15.3%
BM_OuterishProd/256/4           34312       36818        -7.3%
BM_OuterishProd/256/128        174866      176398        -0.9%
BM_OuterishProd/512/1         7963684      104224       +98.7%
BM_OuterishProd/512/4         7987913      112867       +98.6%
BM_OuterishProd/512/256       8198378     1306500       +84.1%
BM_OuterishProd/1k/1          7356256      324432       +95.6%
BM_OuterishProd/1k/4          8129616      331621       +95.9%
BM_OuterishProd/1k/512       27265418     7517538       +72.4%
Improvements in CPU time:
Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                   Base (ns)    New (ns)  Improvement
--------------------------------------------------------------
BM_OuterishProd/64/1             6169        1608       +73.9%
BM_OuterishProd/64/4             7117        2412       +66.1%
BM_OuterishProd/64/32           17702       15616       +11.8%
BM_OuterishProd/128/1           45415        6498       +85.7%
BM_OuterishProd/128/4           44459        9786       +78.0%
BM_OuterishProd/128/64         110657      109489        +1.1%
BM_OuterishProd/256/1          265158       28101       +89.4%
BM_OuterishProd/256/4          274234      183885       +32.9%
BM_OuterishProd/256/128       1397160     1408776        -0.8%
BM_OuterishProd/512/1        78947048      520703       +99.3%
BM_OuterishProd/512/4        86955578     1349742       +98.4%
BM_OuterishProd/512/256      74701613    15584661       +79.1%
BM_OuterishProd/1k/1         78352601     3877911       +95.1%
BM_OuterishProd/1k/4         78521643     3966221       +94.9%
BM_OuterishProd/1k/512      258104736    89480530       +65.3%
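The idea of not creating threads when the inner dimension is small can be illustrated with a minimal sketch. This is not the code from the commit: the function name `chooseThreadCount` and the threshold `kMinWorkPerThread` (and its value) are hypothetical, chosen only to show how an NxK * KxN product with small K can stay on a single thread while larger products still use all available threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical tuning constant: minimum number of multiply-adds that
// justifies handing work to one more thread. The value is an assumption.
constexpr double kMinWorkPerThread = 50000.0;

// Returns how many threads to use for a (rows x depth) * (depth x cols)
// product, capped by maxThreads and never less than 1.
int chooseThreadCount(std::int64_t rows, std::int64_t cols,
                      std::int64_t depth, int maxThreads) {
  assert(rows >= 0 && cols >= 0 && depth >= 0 && maxThreads >= 1);
  // Total work grows with the inner dimension, so a small depth means
  // little work even when rows and cols are moderately large.
  const double work = static_cast<double>(rows) *
                      static_cast<double>(cols) *
                      static_cast<double>(depth);
  const int byWork = static_cast<int>(
      std::min<double>(work / kMinWorkPerThread, maxThreads));
  return std::max(1, byWork);
}
```

Under these assumed constants, a 64x1 * 1x64 product runs on one thread, while a 1024x512 * 512x1024 product is allowed the full thread budget.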
|
| |
|
|
|
|
| |
might be lower than the number of requested ones
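The subject line of this commit is missing here, so the following is only a sketch under the assumption that it concerns a runtime granting fewer threads than were requested. In that situation the work has to be partitioned by the number of threads that actually exist. The helper `rowsForThread` and the `RowRange` type are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Half-open row range [begin, end) handled by one thread.
struct RowRange {
  std::size_t begin;
  std::size_t end;
};

// Hypothetical helper: splits `rows` over the threads that actually run,
// not the number that was requested. The last thread absorbs the
// remainder so that every row is covered exactly once.
RowRange rowsForThread(std::size_t rows, std::size_t actualThreads,
                       std::size_t threadId) {
  assert(actualThreads >= 1 && threadId < actualThreads);
  const std::size_t block = rows / actualThreads;
  const std::size_t begin = threadId * block;
  const std::size_t end =
      (threadId + 1 == actualThreads) ? rows : begin + block;
  return RowRange{begin, end};
}
```

If four threads were requested but only three were granted, calling the helper with `actualThreads == 3` still covers all rows; partitioning by the requested count would leave the fourth block unprocessed.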
|
|\ |
|
| |
| |
| |
| | |
Also optimized the blocking parameters to take into account the number of threads used for a computation
|
|/
|
|
| |
manual new[]/delete[] pairs in AMD and Parallelizer
|
|
|
|
|
|
| |
speedup on Haswell.
This changeset also introduces new vector functions: ploadquad and predux4.
|
| |
|
|
|
|
|
|
|
| |
initParallel()
function which must be called at the initialization time of any multi-threaded
application calling Eigen from multiple threads.
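The message does not say what initParallel() initializes. The sketch below assumes the usual motivation for such a function: library state that is filled in lazily on first use and would race if the first use happened on several threads at once. All names here (`CacheSizes`, `cacheSizes`, `initParallelSketch`) and the placeholder values are hypothetical and are not Eigen's implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical library-side state that is filled in lazily on first use.
struct CacheSizes {
  bool ready = false;
  std::size_t l1 = 0;
  std::size_t l2 = 0;
};

// Lazy accessor. The fill-in below is unsynchronized, so two threads
// making the very first call at the same time could race on the writes.
CacheSizes& cacheSizes() {
  static CacheSizes sizes;
  if (!sizes.ready) {
    sizes.l1 = 32 * 1024;   // placeholder values, not queried from hardware
    sizes.l2 = 256 * 1024;
    sizes.ready = true;
  }
  return sizes;
}

// Hypothetical analogue of an init function: forces the lazy state to be
// set up once, from a single thread, before any worker threads start.
void initParallelSketch() { (void)cacheSizes(); }
```

An application following this pattern would call the init function once at the start of `main`, before spawning any thread that uses the library, so that later concurrent calls only read the already initialized state.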
|
| |
|
|
|
|
| |
- include MKL headers outside the Eigen namespace.
|
|
|
|
|
|
|
|
|
|
| |
* * *
License disclaimer changed to BSD license for MKL_support.h
* * *
Pardiso support fixed, test added.
blas/lapack tests fixed: Scalar parameter was added in Cholesky, product_matrix_vector_triangular renamed to triangular_matrix_vector_product.
* * *
PARDISO test was added physically.
|
| |
- ensure static allocation for the product of "large" fixed size matrix
|
|
|
|
| |
* fix an issue preventing multithreading (now Dynamic = -1 ...)
|
| |
|
| |
|
|
|
|
| |
As discussed on the list (too long to explain here).
|
| |
|
|
|
|
| |
* disable parallelisation if we are already in a parallel session
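A minimal sketch of this guard, assuming a per-thread flag stands in for whatever the threading runtime reports about being inside a parallel region. The names `g_inParallelRegion`, `ParallelRegionGuard`, and `effectiveThreads` are hypothetical and do not come from the commit.

```cpp
#include <cassert>

// Hypothetical per-thread flag: true while this thread is executing
// inside a parallel region.
thread_local bool g_inParallelRegion = false;

// RAII guard that marks the current thread as being inside a parallel
// region and restores the previous state on scope exit.
class ParallelRegionGuard {
 public:
  ParallelRegionGuard() : previous_(g_inParallelRegion) {
    g_inParallelRegion = true;
  }
  ~ParallelRegionGuard() { g_inParallelRegion = previous_; }
  ParallelRegionGuard(const ParallelRegionGuard&) = delete;
  ParallelRegionGuard& operator=(const ParallelRegionGuard&) = delete;

 private:
  bool previous_;
};

// Returns the number of threads a kernel may use: the requested count at
// top level, but 1 when called from inside a parallel region, which
// avoids nested parallelism and thread oversubscription.
int effectiveThreads(int requested) {
  assert(requested >= 1);
  return g_inParallelRegion ? 1 : requested;
}
```

A kernel would create a `ParallelRegionGuard` in each worker before running its share of the work, so that any nested call to the same kernel falls back to the sequential path.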
|
| |
packing of the same data
|
| |
|
|
|