| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
| |
module required to run it on devices supporting SYCL.
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
DataDependancy
* Wrapping data type to the pointer class for sycl in non-terminal nodes; not having that breaks Tensorflow Conv2d code.
* Applying Ronnan's Comments.
* Applying benoit's comments
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TernaryFunctors and their executors allow operations on 3-tuples of inputs.
API fully implemented for Arrays and Tensors based on binary functors.
Ported the cephes betainc function (regularized incomplete beta
integral) to Eigen, with support for CPU and GPU, floats, doubles, and
half types.
Added unit tests in array.cpp and cxx11_tensor_cuda.cu
Collapsed revision
* Merged helper methods for betainc across floats and doubles.
* Added TensorGlobalFunctions with betainc(). Removed betainc() from TensorBase.
* Clean up CwiseTernaryOp checks, change igamma_helper to cephes_helper.
* betainc: merge incbcf and incbd into incbeta_cfe. and more cleanup.
* Update TernaryOp and SpecialFunctions (betainc) based on review comments.
|
| |
|
| |
|
|
|
|
| |
All the vectorization is now defined in the tensor evaluators. This will make it possible to relialably support devices with different packet types in the same compilation unit.
|
| |
|
|
|
|
| |
function returns a floating point value when called on a complex input.
|
|
|
|
| |
the case where it is different from the input type (e.g. abs(complex<float>))
|
| |
|
|
|
|
|
| |
Updated expression evaluation mechanism to also compute the size of the tensor result
Misc fixes and improvements.
|
|
|
|
|
|
|
|
| |
* comparison (<, <=, ==, !=, ...)
* selection
* nullary ops such as random or constant generation
* misc unary ops such as log(), exp(), or a user defined unaryExpr()
Cleaned up the code a little.
|
|
|
|
|
| |
Added the ability to parallelize the evaluation of a tensor expression over multiple cpu cores.
Added the ability to offload the evaluation of a tensor expression to a GPU.
|
|
|
|
| |
Improved support for tensor expressions.
|
|
* Added ability to map a region of the memory to a tensor
* Added basic support for unary and binary coefficient wise expressions, such as addition or square root
* Provided an emulation layer to make it possible to compile the code with compilers (such as nvcc) that don't support cxx11.
|