| Commit message | Author | Age |
GPU code with a non-CUDA compiler results in a linking error instead of bogus code.
|
on a GPU device
|
efficiently compute convolutions and contractions in the future:
* The scheduling of computation is moved out of the assignment code and into a new TensorExecutor class
* The assignment itself is now a regular node on the expression tree
* The expression evaluators start by recursively evaluating all their subexpressions if needed
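The three points above can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: `Leaf`, `Sum`, `Assign`, `Evaluator`, `prepare()` and `SketchExecutor` are hypothetical names chosen for illustration, and only element-wise addition of flat `float` buffers is modelled. The assignment is an ordinary node, each evaluator first prepares its subexpressions, and a separate executor owns the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical leaf node: refers to storage owned by the caller.
struct Leaf { std::vector<float>* data; };

// Hypothetical binary node: element-wise sum of two subexpressions.
template <typename Lhs, typename Rhs>
struct Sum { Lhs lhs; Rhs rhs; };

// The assignment is a regular node: a destination leaf plus a source expression.
template <typename Rhs>
struct Assign { Leaf dst; Rhs rhs; };

// One evaluator per node type (primary template is only declared).
template <typename Expr> struct Evaluator;

template <>
struct Evaluator<Leaf> {
    explicit Evaluator(const Leaf& e) : data(e.data) {}
    void prepare() {}  // a leaf has no subexpressions to evaluate
    std::size_t size() const { return data->size(); }
    float coeff(std::size_t i) const { return (*data)[i]; }
    float& coeffRef(std::size_t i) { return (*data)[i]; }
    std::vector<float>* data;
};

template <typename Lhs, typename Rhs>
struct Evaluator<Sum<Lhs, Rhs>> {
    explicit Evaluator(const Sum<Lhs, Rhs>& e) : lhs(e.lhs), rhs(e.rhs) {}
    void prepare() {
        // Recurse into both operands before this node is used.
        lhs.prepare();
        rhs.prepare();
        assert(lhs.size() == rhs.size());
    }
    std::size_t size() const { return lhs.size(); }
    float coeff(std::size_t i) const { return lhs.coeff(i) + rhs.coeff(i); }
    Evaluator<Lhs> lhs;
    Evaluator<Rhs> rhs;
};

template <typename Rhs>
struct Evaluator<Assign<Rhs>> {
    explicit Evaluator(const Assign<Rhs>& e) : dst(e.dst), rhs(e.rhs) {}
    void prepare() {
        rhs.prepare();
        assert(dst.size() == rhs.size());
    }
    std::size_t size() const { return dst.size(); }
    void evalCoeff(std::size_t i) { dst.coeffRef(i) = rhs.coeff(i); }
    Evaluator<Leaf> dst;
    Evaluator<Rhs> rhs;
};

// Scheduling lives here, not in the assignment node.
template <typename Expr>
struct SketchExecutor {
    static void run(const Expr& expr) {
        Evaluator<Expr> evaluator(expr);
        evaluator.prepare();
        for (std::size_t i = 0; i < evaluator.size(); ++i) evaluator.evalCoeff(i);
    }
};
```

Because the loop is isolated in the executor, a different scheduling strategy could be swapped in without touching the node or evaluator types, which is presumably the motivation for the split.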
|
partial template specialization to optimize the strategy of each evaluator for each device type.
Started work on partial evaluations.
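As a rough illustration of selecting a strategy per device type through partial template specialization, the sketch below specializes a hypothetical `SketchExecutor<Expr, Device>` on two made-up device tags, `SerialDevice` and `ThreadedDevice`. None of these names come from the commit, the expression type `ScaleByTwo` is invented for the example, and the threaded variant uses plain `std::thread` rather than whatever thread pool the library actually relies on.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical device tags.
struct SerialDevice {};
struct ThreadedDevice { unsigned num_threads; };

// Hypothetical expression: writes 2 * in[i] into out[i].
struct ScaleByTwo {
    const std::vector<float>* in;
    std::vector<float>* out;
    void evalCoeff(std::size_t i) const { (*out)[i] = 2.0f * (*in)[i]; }
};

// Primary template is only declared; each device gets a partial specialization.
template <typename Expr, typename Device> struct SketchExecutor;

// Strategy for a single core: one plain loop.
template <typename Expr>
struct SketchExecutor<Expr, SerialDevice> {
    static void run(const Expr& expr, std::size_t n, const SerialDevice&) {
        for (std::size_t i = 0; i < n; ++i) expr.evalCoeff(i);
    }
};

// Strategy for several cores: split the index range into contiguous chunks.
template <typename Expr>
struct SketchExecutor<Expr, ThreadedDevice> {
    static void run(const Expr& expr, std::size_t n, const ThreadedDevice& device) {
        const std::size_t threads = device.num_threads ? device.num_threads : 1;
        const std::size_t chunk = (n + threads - 1) / threads;  // ceiling division
        std::vector<std::thread> workers;
        for (std::size_t begin = 0; begin < n; begin += chunk) {
            const std::size_t end = std::min(begin + chunk, n);
            // Each worker touches a disjoint range, so no locking is needed.
            workers.emplace_back([&expr, begin, end] {
                for (std::size_t i = begin; i < end; ++i) expr.evalCoeff(i);
            });
        }
        for (auto& worker : workers) worker.join();
    }
};

// Convenience wrapper that picks the specialization from the device argument.
template <typename Expr, typename Device>
void run_on(const Expr& expr, std::size_t n, const Device& device) {
    SketchExecutor<Expr, Device>::run(expr, n, device);
}
```

A call such as `run_on(expr, n, ThreadedDevice{4})` resolves the strategy at compile time, so the serial path pays nothing for the existence of the threaded one.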
|
Updated the expression evaluation mechanism to also compute the size of the tensor result.
Misc fixes and improvements.
|
Added the ability to parallelize the evaluation of a tensor expression over multiple CPU cores.
Added the ability to offload the evaluation of a tensor expression to a GPU.