| Commit message | Author | Age |
|
the evaluation of an expression.
|
Misc fixes and API cleanups.
|
Use memcpy to speed up tensor copies whenever possible.
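
As a rough illustration of the idea in this commit, the sketch below copies elements with `std::memcpy` when the element type is trivially copyable and falls back to an element-by-element loop otherwise. The names `copy_elements` and `copy_impl` are hypothetical and are not taken from the commit; the sketch also assumes the source and destination ranges do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Fast path: the type can be copied as raw bytes, so memcpy is safe.
template <typename T>
void copy_impl(T* dst, const T* src, std::size_t n, std::true_type) {
  if (n > 0) std::memcpy(dst, src, n * sizeof(T));
}

// Slow path: the type has non-trivial copy semantics, so copy one by one.
template <typename T>
void copy_impl(T* dst, const T* src, std::size_t n, std::false_type) {
  for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// Hypothetical helper (not an Eigen API): picks the strategy at compile time.
// Assumes [src, src + n) and [dst, dst + n) do not overlap.
template <typename T>
void copy_elements(T* dst, const T* src, std::size_t n) {
  assert(n == 0 || (dst != nullptr && src != nullptr));
  copy_impl(dst, src, n, std::is_trivially_copyable<T>());
}
```

Tag dispatch keeps the choice at compile time, so the fast path adds no runtime branch for types such as `float` or `int`.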
|
efficiently compute convolutions and contractions in the future:
* The scheduling of computation is moved out of the assignment code and into a new TensorExecutor class
* The assignment itself is now a regular node on the expression tree
* The expression evaluators start by recursively evaluating all their subexpressions if needed
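
The three points above can be pictured with a small, self-contained sketch. It is not the code from this commit: the types `Leaf`, `Sum`, `Assign` and `Executor` and their member names are illustrative assumptions, and only one-dimensional `float` data is handled. The assignment is an ordinary node, each node first asks its children to evaluate themselves, and a separate executor owns the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Leaf node: refers to existing storage, so it has nothing to pre-evaluate.
struct Leaf {
  std::vector<float>* data;
  std::size_t size() const { return data->size(); }
  void evalSubExprsIfNeeded() const {}
  float coeff(std::size_t i) const { return (*data)[i]; }
};

// Binary node: recursively prepares both operands before use.
template <typename L, typename R>
struct Sum {
  L lhs;
  R rhs;
  std::size_t size() const {
    assert(lhs.size() == rhs.size());
    return lhs.size();
  }
  void evalSubExprsIfNeeded() const {
    lhs.evalSubExprsIfNeeded();
    rhs.evalSubExprsIfNeeded();
  }
  float coeff(std::size_t i) const { return lhs.coeff(i) + rhs.coeff(i); }
};

// The assignment is itself a node of the expression tree.
template <typename R>
struct Assign {
  Leaf lhs;
  R rhs;
  std::size_t size() const { return lhs.size(); }
  void evalSubExprsIfNeeded() const { rhs.evalSubExprsIfNeeded(); }
  void evalCoeff(std::size_t i) const { (*lhs.data)[i] = rhs.coeff(i); }
};

// Scheduling lives here, outside the assignment node.
struct Executor {
  template <typename Expr>
  static void run(const Expr& expr) {
    expr.evalSubExprsIfNeeded();
    for (std::size_t i = 0; i < expr.size(); ++i) expr.evalCoeff(i);
  }
};
```

Because the loop lives in `Executor`, a different scheduling strategy could be swapped in without touching the node types.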
|
partial template specialization to optimize the strategy of each evaluator for each device type.
Started work on partial evaluations.
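
A minimal sketch of how partial template specialization can select an evaluation strategy per device type follows. All names (`SerialDevice`, `ChunkedDevice`, `Evaluator`, `Scale`, `evaluate`) are hypothetical stand-ins and not the classes introduced by this commit; the "chunked" device merely shows that a second specialization can walk the data differently.

```cpp
#include <cassert>
#include <cstddef>

// Two illustrative device tags (assumptions, not Eigen types).
struct SerialDevice {};
struct ChunkedDevice { std::size_t chunk; };

// A tiny expression: scales an input buffer by a constant factor.
struct Scale {
  const float* in;
  float factor;
  float coeff(std::size_t i) const { return in[i] * factor; }
};

// Primary template is only declared; each device provides a specialization.
template <typename Expr, typename Device>
struct Evaluator;

// Strategy for SerialDevice: one pass over all coefficients.
template <typename Expr>
struct Evaluator<Expr, SerialDevice> {
  static std::size_t run(const Expr& e, float* out, std::size_t n,
                         const SerialDevice&) {
    for (std::size_t i = 0; i < n; ++i) out[i] = e.coeff(i);
    return 1;  // number of passes performed
  }
};

// Strategy for ChunkedDevice: process the range block by block.
template <typename Expr>
struct Evaluator<Expr, ChunkedDevice> {
  static std::size_t run(const Expr& e, float* out, std::size_t n,
                         const ChunkedDevice& d) {
    assert(d.chunk > 0);
    std::size_t chunks = 0;
    for (std::size_t begin = 0; begin < n; begin += d.chunk) {
      const std::size_t end = (begin + d.chunk < n) ? begin + d.chunk : n;
      for (std::size_t i = begin; i < end; ++i) out[i] = e.coeff(i);
      ++chunks;
    }
    return chunks;  // number of blocks processed
  }
};

// Entry point: the device type picks the specialization at compile time.
template <typename Expr, typename Device>
std::size_t evaluate(const Expr& e, float* out, std::size_t n, const Device& d) {
  return Evaluator<Expr, Device>::run(e, out, n, d);
}
```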
|
Updated expression evaluation mechanism to also compute the size of the tensor result.
Misc fixes and improvements.
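
One way to picture an evaluator that also reports the size of its result is sketched below. The types `Matrix` and `Product`, the `dimensions()` member and the `materialize` helper are assumptions made for illustration, limited to two dimensions and `float`; they are not the interfaces touched by this commit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Dims = std::array<std::size_t, 2>;

// Leaf with explicit dimensions and flat storage.
struct Matrix {
  Dims dims;
  std::vector<float> data;
  Dims dimensions() const { return dims; }
  float coeff(std::size_t i) const { return data[i]; }
};

// Coefficient-wise product: its result size is derived from its operands.
template <typename L, typename R>
struct Product {
  const L& lhs;
  const R& rhs;
  Dims dimensions() const {
    assert(lhs.dimensions() == rhs.dimensions());
    return lhs.dimensions();
  }
  float coeff(std::size_t i) const { return lhs.coeff(i) * rhs.coeff(i); }
};

// The caller does not need to know the size in advance:
// the expression is asked for it before the output is allocated.
template <typename Expr>
std::vector<float> materialize(const Expr& e) {
  const Dims d = e.dimensions();
  std::vector<float> out(d[0] * d[1]);
  for (std::size_t i = 0; i < out.size(); ++i) out[i] = e.coeff(i);
  return out;
}
```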
|
Added the ability to parallelize the evaluation of a tensor expression over multiple CPU cores.
Added the ability to offload the evaluation of a tensor expression to a GPU.
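
The multi-core part can be sketched in plain C++ as follows; GPU offloading is not shown. This is an illustration under stated assumptions and not the mechanism from the commit: `parallel_evaluate` is a hypothetical helper, the expression is any callable mapping an index to a `float`, and each coefficient is assumed to be independent of the others so the range can be split into disjoint blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: evaluates expr(i) for i in [0, n) using up to
// num_threads worker threads, each writing to a disjoint block of `out`.
template <typename Expr>
void parallel_evaluate(const Expr& expr, float* out, std::size_t n,
                       unsigned num_threads) {
  assert(num_threads > 0);
  // Ceiling division so that every index falls into exactly one block.
  const std::size_t block = (n + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t begin = std::min(n, t * block);
    const std::size_t end = std::min(n, begin + block);
    if (begin == end) break;  // no work left for further threads
    workers.emplace_back([&expr, out, begin, end] {
      for (std::size_t i = begin; i < end; ++i) out[i] = expr(i);
    });
  }
  // Wait for all blocks to finish before returning to the caller.
  for (std::thread& w : workers) w.join();
}
```

Since the blocks do not overlap, no locking is needed on the output buffer. On some toolchains, linking requires an extra flag such as `-pthread`.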
|
* Added ability to map a region of the memory to a tensor
* Added basic support for unary and binary coefficient wise expressions, such as addition or square root
* Provided an emulation layer to make it possible to compile the code with compilers (such as nvcc) that don't support C++11.
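
The first two items in this list can be sketched without the library itself. The code below is an assumption-laden illustration, not the API added by this commit: `MappedVector` is a hypothetical non-owning view over existing memory, and `SqrtOp` and `AddOp` are hypothetical coefficient-wise unary and binary nodes, restricted to one-dimensional `float` data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Non-owning view: maps a region of existing memory, never allocates or frees.
struct MappedVector {
  float* data;
  std::size_t size;
  float coeff(std::size_t i) const {
    assert(i < size);
    return data[i];
  }
};

// Unary coefficient-wise node: square root of each coefficient.
template <typename Arg>
struct SqrtOp {
  Arg arg;
  float coeff(std::size_t i) const { return std::sqrt(arg.coeff(i)); }
};

// Binary coefficient-wise node: sum of matching coefficients.
template <typename L, typename R>
struct AddOp {
  L lhs;
  R rhs;
  float coeff(std::size_t i) const { return lhs.coeff(i) + rhs.coeff(i); }
};

// Evaluates the expression lazily, writing straight into the mapped memory.
template <typename Expr>
void assign(MappedVector dst, const Expr& e) {
  for (std::size_t i = 0; i < dst.size; ++i) dst.data[i] = e.coeff(i);
}
```

No temporary buffer is created for the intermediate square root: each output coefficient is computed in one pass over the inputs.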
|