Commit log for unsupported/Eigen/CXX11/src/Tensor/TensorReductionSycl.h
Commit message (author, date)
* [SYCL clean up the code]: Removing extra #pragma unroll in SYCL, which was causing issues in embedded systems. (mehdi-goli, 2020-10-28)
* [SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch. (Mehdi Goli, 2019-11-28)
  * Unifying all loadLocalTile functions for LHS and RHS into a single extract_block function.
  * Adding the get_tensor operation, which was missing from TensorContractionMapper.
  * Adding the missing CMake -D option for the Disable_Skinny contraction operation.
  * Wrapping all the indices in TensorScanSycl into a scan parameter struct.
  * Fixing a typo in Device SYCL.
  * Unifying loads into private registers for the tall/skinny no-shared-memory path.
  * Unifying loads into the vector tile for tensor-vector/vector-tensor operations.
  * Removing all the LHS/RHS classes for extracting data from global memory.
  * Removing Outputfunction from TensorContractionSkinnyNoshared.
  * Combining the local-memory versions of tall/skinny and normal tensor contraction into one kernel.
  * Combining the no-local-memory versions of tall/skinny and normal tensor contraction into one kernel.
  * Combining general tensor-vector and vector-tensor contraction into one kernel.
  * Making double buffering optional for tensor contraction when the local-memory version is used.
  * Modifying the benchmark to accept custom reduction sizes.
  * Disabling AVX optimization for the SYCL backend on the host to allow SSE optimization on the host.
  * Adding a test for SYCL.
  * Modifying the SYCL CMake configuration.
* Fix typos found using codespell. (Gael Guennebaud, 2018-06-07)
* Adding TensorIndexTuple and TensorTupleReduceOP backends (ArgMax/Min) for SYCL; fixing the address space issue for const TensorMap; converting all discard_write accesses to write due to a data mismatch. (Mehdi Goli, 2017-03-07)
* Adding a SYCL backend for TensorCustomOp; fixing the partial LHS modification issue on SYCL when the RHS is a TensorContraction, reduction, or convolution; fixing the partial modification for memset when the SYCL backend is used. (Mehdi Goli, 2017-02-28)
* Adding mean to TensorReductionSycl.h. (Mehdi Goli, 2017-02-07)
* Fixing TensorReductionSycl for min and max. (Mehdi Goli, 2017-02-06)
* Adding non-dereferenceable pointer tracking for the ComputeCpp backend; adding TensorConvolutionOp for ComputeCpp; fixing typos; modifying TensorDeviceSycl to use the LegacyPointer class. (Mehdi Goli, 2017-01-19)
* Adding TensorReverseOp, TensorStriding, and TensorConversionOp; modifying TensorContractionSycl so that it can be placed anywhere in the expression tree. (Mehdi Goli, 2017-01-16)
* Fixes the appearance of auto in a functor template argument for reduction. (Luke Iwanski, 2017-01-04)
* Converting all parallel_for lambdas to functors to prevent duplicate kernel name errors; adding a TensorConcatenationOp backend for SYCL. (Mehdi Goli, 2016-12-16)
* Adding asynchronous execution, as it improves performance. (Mehdi Goli, 2016-12-14)
* Adding a SYCL backend for TensorPadding.h; disabling __uint128 for SYCL in TensorIntDiv.h; disabling cache size for SYCL in TensorDeviceDefault.h; adding a SYCL backend for StrideSliceOP; removing the SYCL compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the SYCL backend code. (Mehdi Goli, 2016-12-01)
* Adding a TensorShuffling backend for SYCL; adding a TensorReshaping backend for SYCL; cleaning up the SYCL backend. (Mehdi Goli, 2016-11-29)
* Fixing an LLVM error in TensorMorphingSycl.h on GPU; fixing an int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() to syclDevice.h. (Mehdi Goli, 2016-11-25)
* Removing an unsupported device from the test case; cleaning up the SYCL tensor device. (Mehdi Goli, 2016-11-23)
* Modifying TensorDeviceSycl.h to always create buffers of type uint8_t and convert them to the actual type at execution time on the device; adding the queue interface class to separate the lifespan of the SYCL queue, and of the buffers created for that queue, from Eigen::SyclDevice; modifying SYCL tests to evaluate the results for both row-major and column-major data layouts on all devices supported by SYCL (CPU, GPU, and Host). (Mehdi Goli, 2016-11-18)
* Adding Memset; optimising memcpyDeviceToHost by removing double copying. (Mehdi Goli, 2016-11-10)
* Converting all SYCL buffers to uninitialised device-only buffers; adding memcpyHostToDevice and memcpyDeviceToHost to syclDevice; modifying all examples to obey the new rules; moving SYCL queue creation to the device based on Benoit's suggestion; removing the SYCL-specific condition for returning m_result in TensorReduction.h according to Benoit's suggestion. (Mehdi Goli, 2016-11-08)
* Removed the SYCL include from Eigen/Core and moved it to unsupported/Eigen/CXX11/Tensor; added TensorReduction for SYCL (full reduction and partial reduction); added a TensorReduction test case for SYCL (full reduction and partial reduction); fixed the tile size in TensorSyclRun.h based on the device's maximum work-group size. (Mehdi Goli, 2016-11-04)