| Commit message | Author | Age |
| |
number on GPU.
|
partial template specialization to optimize the strategy of each evaluator for each device type.
Started work on partial evaluations.
|
Updated the expression evaluation mechanism to also compute the size of the tensor result.
Misc fixes and improvements.
|
* comparison (<, <=, ==, !=, ...)
* selection
* nullary ops such as random or constant generation
* misc unary ops such as log(), exp(), or a user-defined unaryExpr()
Cleaned up the code a little.
|
Added the ability to parallelize the evaluation of a tensor expression over multiple CPU cores.
Added the ability to offload the evaluation of a tensor expression to a GPU.
|
Improved support for tensor expressions.
|
* Added the ability to map a region of memory to a tensor
* Added basic support for unary and binary coefficient-wise expressions, such as addition or square root
* Provided an emulation layer to make it possible to compile the code with compilers (such as nvcc) that don't support C++11.
|