Commit message (Collapse) | Author | Age | |
---|---|---|---|
* | Eigen moved the `scanLauncher` function inside the internal namespace. | 2020-05-11 | |
| | This commit applies the following changes: - Moving the `scanLauncher` specialization inside the internal namespace to fix a compiler crash on TensorScan for the SYCL backend. - Replacing `SYCL/sycl.hpp` with `CL/sycl.hpp` in order to follow the SYCL 1.2.1 standard. - Minor fixes: commenting out an unused variable to avoid compiler warnings. | | |
* | [SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch. | 2019-11-28 | |
| | * Unifying all loadLocalTile from lhs and rhs to an extract_block function. * Adding get_tensor operation which was missing in TensorContractionMapper. * Adding the -D method missing from cmake for Disable_Skinny Contraction operation. * Wrapping all the indices in TensorScanSycl into Scan parameter struct. * Fixing typo in Device SYCL. * Unifying load to private register for tall/skinny no shared. * Unifying load to vector tile for tensor-vector/vector-tensor operation. * Removing all the LHS/RHS class for extracting data from global. * Removing Outputfunction from TensorContractionSkinnyNoshared. * Combining the local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining General Tensor-Vector and VectorTensor contraction into one kernel. * Making double buffering optional for Tensor contraction when the local memory version is used. * Modifying benchmark to accept custom Reduction Sizes. * Disabling AVX optimization for SYCL backend on the host to allow SSE optimization on the host. * Adding Test for SYCL. * Modifying SYCL CMake. | | |
* | Adding synchronisation to convolution kernel for sycl backend. | 2017-03-13 | |
| | |||
* | Fixing typo in sycl Benchmark. | 2017-03-08 | |
| | |||
* | Adding sycl Benchmarks. | 2017-03-08 | |
| | |||
* | Fixed the sycl benchmarking code | 2016-12-22 | |
| | |||
* | Partial OpenCL support via SYCL compatible with ComputeCpp CE. | 2016-09-19 | |
| | |||
* | Updated the README file for the tensor benchmarks | 2016-05-25 | |
| | |||
* | Improved the performance of tensor padding | 2016-05-25 | |
| | |||
* | Added benchmarks for contraction on CPU. | 2016-05-13 | |
| | |||
* | Added a benchmark to measure the performance of full reductions of 16 bit floats | 2016-05-05 | |
| | |||
* | Use index list for the striding benchmarks | 2016-04-21 | |
| | |||
* | Enable the benchmarks for algebraic and transcendental functions on fp16. | 2016-04-12 | |
| | |||
* | Turned on the contraction benchmarks for fp16 | 2016-04-12 | |
| | |||
* | Turn on the coeffWise benchmarks on fp16 | 2016-04-07 | |
| | |||
* | Fixed the type casting benchmarks for fp16 | 2016-04-07 | |
| | |||
* | Fixed the benchmarking of fp16 coefficient wise operations | 2016-04-07 | |
| | |||
* | Updated the benchmarking code to use Eigen::half instead of half | 2016-03-24 | |
| | |||
* | Made the tensor benchmarks compile on MacOS | 2016-03-23 | |
| | |||
* | Added benchmarks for full reduction | 2016-02-29 | |
| | |||
* | Improved the README | 2016-02-27 | |
| | |||
* | Added benchmarks for type casting of float16 | 2016-02-26 | |
| | |||
* | Added benchmarks for fp16 | 2016-02-26 | |
| | |||
* | Extended the tensor benchmark suite to support types other than floats | 2016-02-23 | |
| | |||
* | Updated the tensor benchmarking code to work with compilers that don't support cxx11. | 2016-02-23 | |
* | Added 2 benchmarks to the suite of tensor benchmarks running on GPU | 2016-01-30 | |
| | |||
* | Fixed the tensor benchmarks on apple devices | 2016-01-28 | |
| | |||
* | Fixed clang related compilation error | 2016-01-28 | |
| | |||
* | Fixed a typo | 2016-01-28 | |
| | |||
* | Made sure the number of floating point operations done by a benchmark is computed using 64 bit integers to avoid overflows. | 2016-01-28 | |
* | Added a readme to explain how to compile the tensor benchmarks. | 2016-01-28 | |
| | |||
* | Updated the benchmarking code to print the number of flops processed instead of the number of bytes. | 2016-01-28 | |
* | Added extra tensor benchmarks | 2016-01-28 | |
| | |||
* | bugfix | 2016-01-28 | |
| | |||
* | benchmark modifications to make it compilable in a standalone fashion. | 2016-01-28 | |
| | |||
* | Added a few benchmarks for the tensor code | 2015-01-26 | |