| Commit message (Collapse) | Author | Age |
... | |
| | |
|
| |
| |
| |
| |
| | |
The TensorScanOp implementation was missing a CUDA kernel launch.
This adds a simple placeholder implementation.
|
| |
| |
| |
| |
| |
| | |
since it's only used in the constructor.
Also avoid taking references to values that may become stale after a copy construction.
|
| | |
|
|\ \ |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| |\ \ |
|
| | | | |
|
| | |\ \ |
|
| | | | | |
|
| | | | |
| | | | |
| | | | |
| | | | | |
constant expression to make the code compatible with a wider range of compilers
|
| | | | | |
|
| | | | | |
|
| | | | | |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
* * *
Corrected tanh derivative, moved test definitions.
* * *
Added more test cases, removed lingering lines
|
| | |\ \ \
| | | | | |
| | | | | |
| | | | | | |
Implement exclusive scan option for Tensor library
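To illustrate what an exclusive scan option changes, here is a minimal sketch in plain C++. It is not the Tensor library's implementation; the function name `cumsum_sketch` and the `exclusive` flag are hypothetical names chosen for this example. In an inclusive scan, output element i includes input element i; in an exclusive scan, it covers only the elements strictly before i.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Eigen): running sum over `in`.
// exclusive == false: out[i] = in[0] + ... + in[i]
// exclusive == true:  out[i] = in[0] + ... + in[i-1], with out[0] = 0
std::vector<int> cumsum_sketch(const std::vector<int>& in, bool exclusive) {
  std::vector<int> out(in.size());
  int accum = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (exclusive) {
      out[i] = accum;   // write the sum of the previous elements first
      accum += in[i];   // then fold in the current element
    } else {
      accum += in[i];   // fold in the current element first
      out[i] = accum;   // then write the running total
    }
  }
  return out;
}
```

For the input {1, 2, 3, 4}, the inclusive result is {1, 3, 6, 10} and the exclusive result is {0, 1, 3, 6}.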
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
spread the load over multiple CPUs without having to rely on work stealing.
|
| | |/ / / |
|
| | | |\ \
| | | |/ /
| | |/| | |
|
| | | | | |
|
| | | | | |
|
| | | | | |
|
| | | | | |
|
| | | | |
| | | | |
| | | | |
| | | | | |
ones, and implement scalar_multiple2 and scalar_quotient2 on top of them.
|
| | | | | |
|
| | |\ \ \
| | | | | |
| | | | | |
| | | | | | |
Add small fixes to TensorScanOp
|
| | | | | | |
|
| | | | | | |
|
| | | | | | |
|
| |_|_|/ /
|/| | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
- Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP>
- Remove the "functor_is_product_like" helper (was pretty ugly)
- Currently, OP is not used, but it is available to the user for fine-grained tuning
- Currently, only the following operators have been generalized: *,/,+,-,=,*=,/=,+=,-=
- TODO: generalize all other binary operators (comparisons, pow, etc.)
- TODO: handle "scalar op array" operators (currently only * is handled)
- TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits
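The idea behind a traits class keyed on (A, B, OP) can be sketched without Eigen. Everything below lives in a hypothetical `sketch` namespace and only mimics the shape of the trait named in the commit message; `product_op` and `multiply` are illustrative names, not Eigen identifiers. The primary template is left undefined, so mixing scalar types with no registered result type fails at compile time.

```cpp
#include <complex>
#include <type_traits>

namespace sketch {

// Hypothetical tag standing in for a binary functor type (the OP parameter).
struct product_op {};

// Primary template is declared but not defined: unsupported type mixes
// produce a compile error instead of a silent conversion.
template <typename A, typename B, typename Op = product_op>
struct ScalarBinaryOpTraits;

// Same scalar type on both sides: the result is that type.
template <typename T, typename Op>
struct ScalarBinaryOpTraits<T, T, Op> {
  using ReturnType = T;
};

// Real combined with complex (either order) yields complex.
template <typename T, typename Op>
struct ScalarBinaryOpTraits<T, std::complex<T>, Op> {
  using ReturnType = std::complex<T>;
};
template <typename T, typename Op>
struct ScalarBinaryOpTraits<std::complex<T>, T, Op> {
  using ReturnType = std::complex<T>;
};

// The return type of a mixed-type product is looked up through the trait.
template <typename A, typename B>
typename ScalarBinaryOpTraits<A, B, product_op>::ReturnType
multiply(const A& a, const B& b) {
  return a * b;
}

}  // namespace sketch
```

In this sketch a user would enable a new type pair by adding one more specialization, and could specialize on the `Op` parameter if one operator needs a different result type than the others.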
|
| | | | | |
|
| | | | |
| | | | |
| | | | |
| | | | | |
TensorDeviceThreadPool.
|
| | | | | |
|
| | | | | |
|
| | | | |
| | | | |
| | | | |
| | | | | |
to evaluate a tensor expression.
|
| | | | | |
|
| | | | | |
|
| | |/ / |
|
| | | | |
|
| | | | |
|
| |/ /
|/| | |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
TernaryFunctors and their executors allow operations on 3-tuples of inputs.
API fully implemented for Arrays and Tensors based on binary functors.
Ported the cephes betainc function (regularized incomplete beta
integral) to Eigen, with support for CPU and GPU, floats, doubles, and
half types.
Added unit tests in array.cpp and cxx11_tensor_cuda.cu
Collapsed revision
* Merged helper methods for betainc across floats and doubles.
* Added TensorGlobalFunctions with betainc(). Removed betainc() from TensorBase.
* Clean up CwiseTernaryOp checks, change igamma_helper to cephes_helper.
* betainc: merge incbcf and incbd into incbeta_cfe, and more cleanup.
* Update TernaryOp and SpecialFunctions (betainc) based on review comments.
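As a rough illustration of a coefficient-wise operation on 3-tuples of inputs, the sketch below applies a ternary functor element by element over three equally sized vectors. It uses plain `std::vector` rather than Eigen types, and the names `cwise_ternary_sketch` and `multiply_add` are hypothetical; it does not reproduce the CwiseTernaryOp machinery or betainc itself.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical ternary functor: computes a * b + c for one 3-tuple.
struct multiply_add {
  template <typename T>
  T operator()(const T& a, const T& b, const T& c) const {
    return a * b + c;
  }
};

// Hypothetical executor: applies `f` to (a[i], b[i], c[i]) for every i.
template <typename T, typename TernaryFunctor>
std::vector<T> cwise_ternary_sketch(const std::vector<T>& a,
                                    const std::vector<T>& b,
                                    const std::vector<T>& c,
                                    TernaryFunctor f) {
  if (a.size() != b.size() || a.size() != c.size()) {
    throw std::invalid_argument("all three inputs must have the same size");
  }
  std::vector<T> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = f(a[i], b[i], c[i]);
  }
  return out;
}
```

A function such as betainc(a, b, x) fits this shape because each output coefficient depends on exactly one coefficient from each of its three inputs.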
|
| | |
| | |
| | |
| | | |
tensor expression
|
|\ \ \
| | | |
| | | |
| | | | |
Add generic scan method
|
| |/ /
|/| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
AFTER:
BM_fullReduction/10 4541 4543 154017 21.0M items/s
BM_fullReduction/64 5191 5193 100000 752.5M items/s
BM_fullReduction/512 9588 9588 71361 25.5G items/s
BM_fullReduction/4k 244314 244281 2863 64.0G items/s
BM_fullReduction/5k 359382 359363 1946 64.8G items/s
BEFORE:
BM_fullReduction/10 9085 9087 74395 10.5M items/s
BM_fullReduction/64 9478 9478 72014 412.1M items/s
BM_fullReduction/512 14643 14646 46902 16.7G items/s
BM_fullReduction/4k 260338 260384 2678 60.0G items/s
BM_fullReduction/5k 385076 385178 1818 60.5G items/s
|
|/ / |
|
| | |
|