Commit message | Author | Age | |
---|---|---|---|
* | Merged in a-doumoulakis/opencl (pull request PR-12) | 2017-05-25 | |
|\ | | | | | | | Enable triSYCL with Eigen | ||
| * | Modification upon request | 2017-05-25 | |
| | | | | | | | | - Remove warning suppression | ||
* | | Merged in mehdi_goli/opencl/CmakeFixForUbuntu16.04 (pull request PR-11) | 2017-05-24 | |
|\ \ | | | | | | | | | | CmakeFixForUbuntu16.04 | ||
| | * | Restore misplaced comment | 2017-05-24 | |
| | | | |||
| | * | Merge changes from upstream | 2017-05-24 | |
| | |\ | |_|/ |/| | | |||
| * | | Merged in DuncanMcBain/opencl/default (pull request PR-2) | 2017-05-24 | |
| |\ \ | | | | | | | | | | | | | Update FindComputeCpp.cmake with new changes from SDK | ||
| * | | | Fixing Cmake for gcc>=5. | 2017-05-24 | |
| | | | | |||
| | * | | Update FindComputeCpp.cmake with new changes from SDK | 2017-05-24 | |
| |/ / |/| | | |||
| * | | Merge with Benoit. | 2017-05-23 | |
| |\ \ | |/ / |/| | | |||
| * | | Temporary branch for sync with upstream | 2017-05-23 | |
| | | | |||
* | | | Merged in mehdi_goli/opencl/FixingCmakeDependency (pull request PR-2) | 2017-05-22 | |
|\ \ \ | | | | | | | | | | | | | Fixing Cmake Dependency for SYCL | ||
* \ \ \ | Merged in mehdi_goli/opencl/TensorSupportedDevice (pull request PR-6) | 2017-05-22 | |
|\ \ \ \ | |_|/ / |/| | | | | | | Fixing supported device list. | ||
| * | | | Fixing supported device list. | 2017-05-22 | |
|/ / / | |||
| * / | Fixing Cmake Dependency for SYCL | 2017-05-22 | |
|/ / | |||
| * | Add cmake file FindTriSYCL.cmake | 2017-05-17 | |
| | | |||
| * | Add support for triSYCL | 2017-05-05 | |
|/ | | | | | | Eigen is now able to use triSYCL with the EIGEN_SYCL_TRISYCL and TRISYCL_INCLUDE_DIR options. Fix contraction kernel with correct nd_item dimension. | ||
* | Merged in benoitsteiner/opencl (pull request PR-309) | 2017-04-05 | |
|\ | | | | | | | OpenCL improvements | ||
| * | Preserve file naming conventions | 2017-04-04 | |
| | | |||
| * | Deleted empty line of code | 2017-04-04 | |
| | | |||
| * | Guard sycl specific code under an EIGEN_USE_SYCL ifdef | 2017-04-04 | |
| | | |||
| * | Code cleanup | 2017-04-04 | |
| | | |||
| * | Guard the sycl specific code with EIGEN_USE_SYCL | 2017-04-04 | |
| | | |||
| * | Guard the sycl specific code with an #ifdef EIGEN_USE_SYCL | 2017-04-04 | |
| | | |||
| * | Gate the sycl specific code under an EIGEN_USE_SYCL define | 2017-04-04 | |
| | | |||
| * | Fixed compilation error when sycl is enabled. | 2017-04-04 | |
| | | |||
* | | fix typos in the Tensor readme | 2017-03-31 | |
| | | |||
| * | Restored code compatibility with compilers that don't support C++11 | 2017-03-31 | |
| | | | | | | | | Gated more sycl code under #ifdef sycl | ||
| * | Restore the old constructors to retain compatibility with non-C++11 compilers. | 2017-03-31 | |
| | | |||
| * | Gate the sycl specific code under #ifdef sycl | 2017-03-31 | |
| | | |||
| * | Fixing TensorArgMaxSycl.h; Removing warning related to the hardcoded type of dims to be int in Argmax. | 2017-03-28 | |
| | | |||
| * | Introduces align allocator for SYCL buffer | 2017-03-20 | |
| | | |||
* | | update has_ReturnType to be more consistent with other has_ helpers | 2017-03-17 | |
| | | |||
| * | Merged eigen/eigen into default | 2017-03-15 | |
| |\ | |/ |/| | |||
* | | Silenced compilation warning | 2017-03-15 | |
| | | |||
| * | Temporary: Disables cxx11_tensor_argmax_sycl test since it is causing zombie thread | 2017-03-15 | |
| | | |||
| * | Fixes bug in get_sycl_supported_devices() that was reporting unsupported Intel CPU on AMD platform - causing timeouts in that configuration | 2017-03-15 | |
| | | |||
* | | Merged in ilya-biryukov/eigen/fix_clang_cuda_compilation (pull request PR-304) | 2017-03-15 | |
|\ \ | | | | | | | | | | Fixed compilation with cuda-clang | ||
* | | | better check array index before using it | 2017-03-15 | |
| | | | |||
* | | | ARM prefetch fixes: Implement prefetch on ARM64. Do not clobber cc on ARM32. | 2017-03-15 | |
| | | | |||
| | * | Adding synchronisation to convolution kernel for sycl backend. | 2017-03-13 | |
| | | | |||
| | * | Use name to distinguish name instead of the vendor | 2017-03-08 | |
| | | | |||
| | * | Fixing typo in sycl Benchmark. | 2017-03-08 | |
| | | | |||
| | * | Adding sycl Benchmarks. | 2017-03-08 | |
| | | | |||
| | * | Fixing potential race condition on sycl device. | 2017-03-07 | |
| | | | |||
| | * | Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data mismatch. | 2017-03-07 | |
| | | | |||
| * | | Fixed compilation with cuda-clang | 2017-03-06 | |
|/ / | |||
* | | Made the reduction code compile with cuda-clang | 2017-03-14 | |
| | | |||
* | | Get rid of Init(). | 2017-03-10 | |
| | | |||
* | | Use C++11 ctor forwarding to simplify code a bit. | 2017-03-10 | |
| | | |||
* | | Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases. | 2017-03-09 | |
| | | | | | | | | | | | | | | | | | | | | | | | | * Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops that process I/O. * This also changes the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields roughly a constant spin time. * Implement a separate worker loop for the num_threads == 1 case since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues it might reverse the order in which ops are executed, compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads the single thread pools tend to be used for. * Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size(). |
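
The spin-count rule described in the threadpool commit message above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `SpinCount` and `kBaseSpinIterations` are hypothetical names, and the budget of 1000 is borrowed from the fixed iteration count the message mentions, since the constant used by the real implementation is not given there.

```cpp
// Hypothetical iteration budget for a pool with a single thread. The commit
// message cites 1000 iterations as the old fixed count; the value used by
// the actual implementation is an assumption here.
constexpr int kBaseSpinIterations = 1000;

// Number of passes through the work-stealing loop before a worker yields.
// One pass costs time roughly proportional to num_threads, so dividing the
// budget by num_threads keeps the total spin time roughly constant.
// Returns 0 when spinning is turned off (the "hint" from the commit message)
// or when the thread count is not positive.
constexpr int SpinCount(int num_threads, bool allow_spinning) {
  return (allow_spinning && num_threads > 0)
             ? kBaseSpinIterations / num_threads
             : 0;
}

// Compile-time checks of the rule.
static_assert(SpinCount(1, true) == 1000, "one thread gets the full budget");
static_assert(SpinCount(4, true) == 250, "four threads spin a quarter as long");
static_assert(SpinCount(8, false) == 0, "spinning disabled by the hint");
```

Under this sketch, an I/O-bound pool would pass `allow_spinning = false` and yield immediately, while a compute pool keeps a spin phase whose wall-clock length stays about the same as threads are added.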