Commit message | Author | Age | |
---|---|---|---|
* | Restored code compatibility with compilers that don't support C++11 | 2017-03-31 | |
| | | | | Gated more sycl code under #ifdef sycl | ||
* | Restore the old constructors to retain compatibility with non-C++11 compilers. | 2017-03-31 | |
| | |||
* | Gate the sycl specific code under #ifdef sycl | 2017-03-31 | |
| | |||
* | Fixing TensorArgMaxSycl.h; Removing warning related to the hardcoded type of dims to be int in Argmax. | 2017-03-28 | |
| | |||
* | Introduces align allocator for SYCL buffer | 2017-03-20 | |
| | |||
* | Merged eigen/eigen into default | 2017-03-15 | |
|\ | |||
| * | Silenced compilation warning | 2017-03-15 | |
| | | |||
* | | Temporary: Disables cxx11_tensor_argmax_sycl test since it is causing zombie thread | 2017-03-15 | |
| | | |||
* | | Fixes bug in get_sycl_supported_devices() that was reporting unsupported Intel CPU on AMD platform - causing timeouts in that configuration | 2017-03-15 | |
| | | |||
| * | Merged in ilya-biryukov/eigen/fix_clang_cuda_compilation (pull request PR-304) | 2017-03-15 | |
| |\ | | | | | | | | | | Fixed compilation with cuda-clang | ||
| * | | better check array index before using it | 2017-03-15 | |
| | | | |||
| * | | ARM prefetch fixes: Implement prefetch on ARM64. Do not clobber cc on ARM32. | 2017-03-15 | |
| | | | |||
* | | | Adding synchronisation to convolution kernel for sycl backend. | 2017-03-13 | |
| | | | |||
* | | | Use name to distinguish name instead of the vendor | 2017-03-08 | |
| | | | |||
* | | | Fixing typo in sycl Benchmark. | 2017-03-08 | |
| | | | |||
* | | | Adding sycl Benchmarks. | 2017-03-08 | |
| | | | |||
* | | | Fixing potential race condition on sycl device. | 2017-03-07 | |
| | | | |||
* | | | Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data mismatch. | 2017-03-07 | |
| | | | |||
| | * | Fixed compilation with cuda-clang | 2017-03-06 | |
| |/ | |||
| * | Made the reduction code compile with cuda-clang | 2017-03-14 | |
| | | |||
| * | Get rid of Init(). | 2017-03-10 | |
| | | |||
| * | Use C++11 ctor forwarding to simplify code a bit. | 2017-03-10 | |
| | | |||
| * | Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases. | 2017-03-09 | |
| | | | | | | | | | | | | | | | | | | | | | | | | * Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops that process I/O. * This also changes the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields roughly a constant spin time. * Implement a separate worker loop for the num_threads == 1 case since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues it might reverse the order in which ops are executed, compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads the single thread pools tend to be used for. * Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size(). | ||
| * | bug #1401: fix compilation of "cond ? x : -x" with x an AutoDiffScalar | 2017-03-08 | |
| | | |||
| * | fix typo | 2017-03-07 | |
| | | |||
| * | remove UTF8 symbol | 2017-03-07 | |
| | | |||
| * | remove UTF8 symbols | 2017-03-07 | |
| | | |||
| * | do not include std header within extern C | 2017-03-07 | |
| | | |||
| * | bug #1400: fix stableNorm with EIGEN_DONT_ALIGN_STATICALLY | 2017-03-07 | |
| | | |||
| * | Made the Tensor code compile with clang 3.9 | 2017-03-02 | |
| | | |||
| * | Adjusted the EIGEN_DEVICE_FUNC qualifiers to make sure that: | 2017-03-01 | |
| | | | | | | | | | | * they're used consistently between the declaration and the definition of a function * we avoid calling host only methods from host device methods. | ||
| * | Silenced a couple of compilation warnings | 2017-03-01 | |
| | | |||
| * | Added missing EIGEN_DEVICE_FUNC qualifiers | 2017-03-01 | |
| | | |||
| * | Added missing EIGEN_DEVICE_FUNC qualifiers | 2017-02-28 | |
| | | |||
| * | Made most of the packet math primitives usable within CUDA kernel when compiling with clang | 2017-02-28 | |
| | | |||
| * | Silenced clang compilation warning. | 2017-02-28 | |
| | | |||
| * | Added missing EIGEN_DEVICE_FUNC qualifiers | 2017-02-28 | |
| | | |||
| * | Added missing EIGEN_DEVICE_FUNC qualifiers | 2017-02-28 | |
| | | |||
| * | Added missing EIGEN_DEVICE_FUNC | 2017-02-28 | |
| | | |||
| * | Made the TensorStorage class compile with clang 3.9 | 2017-02-28 | |
| | | |||
| * | Deleted extra: EIGEN_DEVICE_FUNC: the QR and Cholesky code isn't ready to run on GPU yet. | 2017-02-28 | |
| | | |||
| * | Added missing EIGEN_DEVICE_FUNC qualifiers | 2017-02-28 | |
| | | |||
| * | Added missing EIGEN_DEVICE_FUNC qualifiers | 2017-02-28 | |
| | | |||
| * | Added missing EIGEN_DEVICE_FUNC qualifiers | 2017-02-28 | |
| | | |||
* | | Adding sycl backend for TensorCustomOp; fixing the partial lhs modification issue on sycl when the rhs is TensorContraction, reduction or convolution; Fixing the partial modification for memset when sycl backend is used. | 2017-02-28 | |
| | | |||
| * | bug #1396: add some missing EIGEN_DEVICE_FUNC | 2017-02-28 | |
| | | |||
| * | Fix typo. | 2017-02-28 | |
| | | |||
| * | Added missing EIGEN_DEVICE_FUNC to the SelfCwise binary ops | 2017-02-27 | |
| | | |||
| * | Added missing EIGEN_DEVICE_FUNC qualifiers to several nullary op methods. | 2017-02-27 | |
| | | |||
| * | Declared the plset, ploadt_ro, and ploaddup packet primitives as usable within a gpu kernel | 2017-02-27 | |
| | | |||
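
The thread pool commit dated 2017-03-09 in the log above makes the number of passes through the steal loop inversely proportional to the number of threads, on the grounds that one pass costs time proportional to the thread count. The arithmetic can be illustrated with a minimal sketch. The names `SpinCount`, `ModelledSpinCost` and the constant `kSpinBudget` are hypothetical, chosen for this illustration only, and are not taken from the commit; the cost model (one unit of work per thread per pass) is likewise an assumption.

```cpp
#include <algorithm>

// Assumed total spin budget, in arbitrary "steal attempts" units.
// The value is illustrative, not quoted from the commit.
constexpr int kSpinBudget = 5000;

// Hypothetical helper: how many passes through the steal loop a worker
// makes before it stops spinning. The count is inversely proportional
// to the number of threads, and zero when spinning is turned off
// (the "hint" the commit message mentions).
int SpinCount(int num_threads, bool allow_spinning) {
  if (!allow_spinning || num_threads <= 0) return 0;
  return std::max(1, kSpinBudget / num_threads);
}

// Assumed cost model: one pass visits every thread's queue once, so its
// cost is proportional to num_threads. Total cost = passes * num_threads.
int ModelledSpinCost(int num_threads, bool allow_spinning) {
  return SpinCount(num_threads, allow_spinning) * num_threads;
}
```

Under these assumptions, 4 threads give 1250 passes and 8 threads give 625 passes, and in both cases the modelled total cost is 5000 units, which is the "roughly constant spin time" the commit message describes. A pool created with spinning disabled gets a count of 0 and would go straight to waiting.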