Commit message | Author | Age | |
---|---|---|---|
* | [SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch. * Unifying all loadLocalTile from lhs and rhs to an extract_block function. * Adding get_tensor operation which was missing in TensorContractionMapper. * Adding the -D method missing from cmake for Disable_Skinny Contraction operation. * Wrapping all the indices in TensorScanSycl into Scan parameter struct. * Fixing typo in Device SYCL * Unifying load to private register for tall/skinny no shared * Unifying load to vector tile for tensor-vector/vector-tensor operation * Removing all the LHS/RHS class for extracting data from global * Removing Outputfunction from TensorContractionSkinnyNoshared. * Combining the local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining General Tensor-Vector and VectorTensor contraction into one kernel. * Making double buffering optional for Tensor contraction when the local memory version is used. * Modifying benchmark to accept custom Reduction Sizes * Disabling AVX optimization for SYCL backend on the host to allow SSE optimization to the host * Adding Test for SYCL * Modifying SYCL CMake | 2019-11-28 | |
* | Adding synchronisation to convolution kernel for sycl backend. | 2017-03-13 | |
* | Adding sycl Benchmarks. | 2017-03-08 | |
* | Improved the performance of tensor padding | 2016-05-25 | |
* | Added a benchmark to measure the performance of full reductions of 16 bit floats | 2016-05-05 | |
* | Use index list for the striding benchmarks | 2016-04-21 | |
* | Fixed the type casting benchmarks for fp16 | 2016-04-07 | |
* | Fixed the benchmarking of fp16 coefficient wise operations | 2016-04-07 | |
* | Made the tensor benchmarks compile on MacOS | 2016-03-23 | |
* | Added benchmarks for full reduction | 2016-02-29 | |
* | Added benchmarks for type casting of float16 | 2016-02-26 | |
* | Extended the tensor benchmark suite to support types other than floats | 2016-02-23 | |
* | Updated the tensor benchmarking code to work with compilers that don't support cxx11. | 2016-02-23 | |
* | Fixed clang related compilation error | 2016-01-28 | |
* | Made sure the number of floating point operations done by a benchmark is computed using 64 bit integers to avoid overflows. | 2016-01-28 | |
* | Updated the benchmarking code to print the number of flops processed instead of the number of bytes. | 2016-01-28 | |
* | Added extra tensor benchmarks | 2016-01-28 | |
* | bugfix | 2016-01-28 | |
* | benchmark modifications to make it compilable in a standalone fashion. | 2016-01-28 | |
* | Added a few benchmarks for the tensor code | 2015-01-26 | |