| | Commit message | Author | Age |
---|---|---|---|
* | [SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch. * Unifying all loadLocalTile functions from lhs and rhs into an extract_block function. * Adding the get_tensor operation, which was missing in TensorContractionMapper. * Adding the -D option missing from CMake for the Disable_Skinny contraction operation. * Wrapping all the indices in TensorScanSycl into a Scan parameter struct. * Fixing a typo in Device SYCL. * Unifying load to private register for tall/skinny no shared. * Unifying load to vector tile for tensor-vector/vector-tensor operation. * Removing all the LHS/RHS classes for extracting data from global memory. * Removing Outputfunction from TensorContractionSkinnyNoshared. * Combining the local-memory versions of tall/skinny and normal tensor contraction into one kernel. * Combining the no-local-memory versions of tall/skinny and normal tensor contraction into one kernel. * Combining general tensor-vector and vector-tensor contraction into one kernel. * Making double buffering optional for tensor contraction when the local-memory version is used. * Modifying the benchmark to accept custom reduction sizes. * Disabling AVX optimization for the SYCL backend on the host to allow SSE optimization on the host. * Adding tests for SYCL. * Modifying SYCL CMake. | Mehdi Goli | 2019-11-28 |
* | Adding synchronisation to convolution kernel for sycl backend. | Mehdi Goli | 2017-03-13 |
* | Adding sycl Benchmarks. | Mehdi Goli | 2017-03-08 |
* | Improved the performance of tensor padding | Benoit Steiner | 2016-05-25 |
* | Added a benchmark to measure the performance of full reductions of 16 bit floats | Benoit Steiner | 2016-05-05 |
* | Use index list for the striding benchmarks | Benoit Steiner | 2016-04-21 |
* | Fixed the type casting benchmarks for fp16 | Benoit Steiner | 2016-04-07 |
* | Fixed the benchmarking of fp16 coefficient wise operations | Benoit Steiner | 2016-04-07 |
* | Made the tensor benchmarks compile on MacOS | Benoit Steiner | 2016-03-23 |
* | Added benchmarks for full reduction | Benoit Steiner | 2016-02-29 |
* | Added benchmarks for type casting of float16 | Benoit Steiner | 2016-02-26 |
* | Extended the tensor benchmark suite to support types other than floats | Benoit Steiner | 2016-02-23 |
* | Updated the tensor benchmarking code to work with compilers that don't support C++11. | Benoit Steiner | 2016-02-23 |
* | Fixed clang related compilation error | Benoit Steiner | 2016-01-28 |
* | Made sure the number of floating-point operations done by a benchmark is computed using 64-bit integers to avoid overflows. | Benoit Steiner | 2016-01-28 |
* | Updated the benchmarking code to print the number of flops processed instead of the number of bytes. | Benoit Steiner | 2016-01-28 |
* | Added extra tensor benchmarks | Benoit Steiner | 2016-01-28 |
* | bugfix | Yangqing Jia | 2016-01-28 |
* | benchmark modifications to make it compilable in a standalone fashion. | Yangqing Jia | 2016-01-28 |
* | Added a few benchmarks for the tensor code | Benoit Steiner | 2015-01-26 |