| Commit message | Author | Age |

TensorContractionSycl can now be located anywhere in the expression tree.
Fixing a duplicate-name error; adding the TensorConcatenationOp backend for SYCL.
tensor_broadcast_sycl now runs on the GPU; adding get_sycl_supported_devices() to syclDevice.h.
Unsupported/Eigen/CXX11/Tensor; added TensorReduction for SYCL (full and partial reduction); added TensorReduction test cases for SYCL (full and partial reduction); fixed the tile size in TensorSyclRun.h based on the device's maximum work-group size.
TernaryFunctors and their executors allow operations on 3-tuples of inputs.
API fully implemented for Arrays and Tensors based on binary functors.
Ported the cephes betainc function (regularized incomplete beta
integral) to Eigen, with support for CPU and GPU, floats, doubles, and
half types.
Added unit tests in array.cpp and cxx11_tensor_cuda.cu
Collapsed revision
* Merged helper methods for betainc across floats and doubles.
* Added TensorGlobalFunctions with betainc(). Removed betainc() from TensorBase.
* Cleaned up the CwiseTernaryOp checks; changed igamma_helper to cephes_helper.
* betainc: merged incbcf and incbd into incbeta_cfe, plus more cleanup.
* Updated TernaryOp and SpecialFunctions (betainc) based on review comments.
This is the initial implementation of a generic scan operation.
Based on this, cumsum and cumprod methods have been added to TensorBase.
the Tensor class, since most of the code already uses integers.
previously disabled by mistake
'generator'. This simplifies the creation of constant tensors initialized using specific regular patterns.
Created a Gaussian window generator as a first use case.
encoding it in the options.
Misc fixes and API cleanups.
efficiently compute convolutions and contractions in the future:
* The scheduling of computation is moved out of the assignment code and into a new TensorExecutor class
* The assignment itself is now a regular node on the expression tree
* The expression evaluators start by recursively evaluating all their subexpressions if needed
partial template specialization to optimize the strategy of each evaluator for each device type.
Started work on partial evaluations.
Updated the expression evaluation mechanism to also compute the size of the tensor result.
Misc fixes and improvements.
* comparison (<, <=, ==, !=, ...)
* selection
* nullary ops such as random or constant generation
* misc unary ops such as log(), exp(), or a user-defined unaryExpr()
Cleaned up the code a little.
Added the ability to parallelize the evaluation of a tensor expression over multiple CPU cores.
Added the ability to offload the evaluation of a tensor expression to a GPU.
Improved support for tensor expressions.
* Added the ability to map a region of memory to a tensor
* Added basic support for unary and binary coefficient-wise expressions, such as addition or square root
* Provided an emulation layer to make it possible to compile the code with compilers (such as nvcc) that don't support C++11.