Commit message | Author | Age
feature.
large tensors.
make it accessible from both the single and multithreaded contraction evaluators.
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions.
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| | |
and reductions. The cost model is turned off by default.
Eigen Tensor cost model part 1.
|
| | |
|
| | |
|
| | |
|
| | |
|
| |\
| |/
|/| |
|
| |
| |
| |
| | |
estimate the cost of evaluating tensor expressions.
makes it possible to decide whether or not to vectorize depending on the capability of the target CUDA architecture. In particular, this enables us to vectorize the processing of fp16 when running on a device of compute capability >= 5.3.
small improvements