aboutsummaryrefslogtreecommitdiffhomepage
path: root/unsupported/test
Commit message (Collapse)AuthorAge
* [SYCL] Rebasing the SYCL support branch on top of the Einge upstream master ↵Gravatar Mehdi Goli2019-11-28
| | | | | | | | | | | | | | | | | | | | | | branch. * Unifying all loadLocalTile from lhs and rhs to an extract_block function. * Adding get_tensor operation which was missing in TensorContractionMapper. * Adding the -D method missing from cmake for Disable_Skinny Contraction operation. * Wrapping all the indices in TensorScanSycl into Scan parameter struct. * Fixing typo in Device SYCL * Unifying load to private register for tall/skinny no shared * Unifying load to vector tile for tensor-vector/vector-tensor operation * Removing all the LHS/RHS class for extracting data from global * Removing Outputfunction from TensorContractionSkinnyNoshared. * Combining the local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining General Tensor-Vector and VectorTensor contraction into one kernel. * Making double buffering optional for Tensor contraction when local memory is version is used. * Modifying benchmark to accept custom Reduction Sizes * Disabling AVX optimization for SYCL backend on the host to allow SSE optimization to the host * Adding Test for SYCL * Modifying SYCL CMake
* STYLE: Remove CMake-language block-end command argumentsGravatar Hans Johnson2019-10-31
| | | | | | Ancient versions of CMake required else(), endif(), and similar block termination commands to have arguments matching the command starting the block. This is no longer the preferred style.
* bug #1281: fix AutoDiffScalar's make_coherent for nested expression of ↵Gravatar Gael Guennebaud2019-11-14
| | | | constant ADs.
* Remove legacy block evaluation supportGravatar Eugene Zhulenev2019-11-12
|
* Fix data race in css11_tensor_notification test.Gravatar Rasmus Munk Larsen2019-11-08
|
* Add block evaluation V2 to TensorAsyncExecutor.Gravatar Rasmus Munk Larsen2019-10-22
| | | | Add async evaluation to a number of ops.
* Drop support for c++03 in Eigen tensor. Get rid of some code used to emulate ↵Gravatar Rasmus Munk Larsen2019-10-18
| | | | c++11 functionality with older compilers.
* Cleanup Tensor block destination and materialized block storage allocationGravatar Eugene Zhulenev2019-10-16
|
* TensorBroadcasting support for random/uniform blocksGravatar Eugene Zhulenev2019-10-16
|
* Block evaluation for TensorGenerator/TensorReverse/TensorShufflingGravatar Eugene Zhulenev2019-10-14
|
* Block evaluation for TensorGenerator + TensorReverse + fixed bug in tensor ↵Gravatar Eugene Zhulenev2019-10-10
| | | | reverse op
* Block evaluation for TensorChipping + fixed bugs in TensorPadding and ↵Gravatar Eugene Zhulenev2019-10-09
| | | | TensorSlicing
* Implement c++03 compatible fix for changeset ↵Gravatar Gael Guennebaud2019-10-09
| | | | 7a43af1a335da2c0489b4119a33ee1cbff0c15d6
* Fix compilation of FFTW unit testGravatar Gael Guennebaud2019-10-08
|
* Add block evaluation to TensorEvalTo and fix few small bugsGravatar Eugene Zhulenev2019-10-07
|
* Fix compilation warnings and errors with clang in TensorBlockV2 code and testsGravatar Eugene Zhulenev2019-10-04
|
* Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelectGravatar Eugene Zhulenev2019-10-02
|
* Fix cxx11_tensor_block_io testGravatar Eugene Zhulenev2019-09-25
|
* Fix compilation warnings and errors with clang in TensorBlockV2Gravatar Eugene Zhulenev2019-09-25
|
* Add new TensorBlock api implementation + testsGravatar Eugene Zhulenev2019-09-24
|
* Tensor block evaluation V2 support for unary/binary/broadcstingGravatar Eugene Zhulenev2019-09-24
|
* Add support for asynchronous evaluation of tensor casting expressions.Gravatar Rasmus Munk Larsen2019-09-19
|
* Merging eigen/eigen.Gravatar Srinivas Vasudevan2019-09-16
|\
* | Add Bessel functions to SpecialFunctions.Gravatar Srinivas Vasudevan2019-09-14
|/ | | | | | | | | - Split SpecialFunctions files in to a separate BesselFunctions file. In particular add: - Modified bessel functions of the second kind k0, k1, k0e, k1e - Bessel functions of the first kind j0, j1 - Bessel functions of the second kind y0, y1
* Fix for the HIP build+test errors introduced by the ndtri support.Gravatar Deven Desai2019-09-06
| | | | | | | The fixes needed are * adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs) * switching to using ::<math_func> instead std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC * removing an errant "j" from a testcase (don't know how that made it in to begin with!)
* Update ThreadLocal to use separate Initialize/Release callablesGravatar Eugene Zhulenev2019-09-10
|
* ThreadLocal container that does not rely on thread local storageGravatar Eugene Zhulenev2019-09-09
|
* PR 681: Add ndtri function, the inverse of the normal distribution function.Gravatar Srinivas Vasudevan2019-08-12
|
* Allow move-only done callback in TensorAsyncDeviceGravatar Eugene Zhulenev2019-09-03
|
* Add test for const TensorMap underlying data mutationGravatar Eugene Zhulenev2019-09-03
|
* evalSubExprsIfNeededAsync + async TensorContractionThreadPoolGravatar Eugene Zhulenev2019-08-30
|
* Asynchronous expression evaluation with TensorAsyncDeviceGravatar Eugene Zhulenev2019-08-30
|
* Const correctness in TensorMap<const Tensor<T, ...>> expressionsGravatar Eugene Zhulenev2019-08-28
|
* Remove XSMM support from Tensor moduleGravatar Eugene Zhulenev2019-08-19
|
* Disable tests for contraction with output kernels when using libxsmm, which ↵Gravatar Rasmus Munk Larsen2019-08-07
| | | | does not support this.
* Merge with Eigen headGravatar Eugene Zhulenev2019-06-28
|\
* | Add block access to TensorReverseOp and make sure that TensorForcedEval uses ↵Gravatar Eugene Zhulenev2019-06-28
| | | | | | | | block access when preferred
| * [SYCL] This PR adds the minimum modifications to the Eigen unsupported ↵Gravatar Mehdi Goli2019-06-28
|/ | | | | | | | | | module required to run it on devices supporting SYCL. * Abstracting the pointer type so that both SYCL memory and pointer can be captured. * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class. * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node. * Adding SYCL macro for controlling loop unrolling. * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
* Minor build improvementsGravatar tra2019-05-31
| | | | | | | | * Allow specifying multiple GPU architectures. E.g.: cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70" * Pass CUDA SDK path to clang. Without it it will default to /usr/local/cuda which may not be the right location, if cmake was invoked with -DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path
* Merged in rmlarsen/eigen_threadpool (pull request PR-640)Gravatar Rasmus Larsen2019-05-13
|\ | | | | | | | | | | Fix deadlocks in thread pool. Approved-by: Eugene Zhulenev <ezhulenev@google.com>
* | bug #1707: Fix deprecation warnings, or disable warnings when testing ↵Gravatar Christoph Hertzberg2019-05-10
| | | | | | | | deprecated functions
| * A) fix deadlocks in thread pool caused by EventCountGravatar Rasmus Munk Larsen2019-05-08
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixed 2 deadlocks caused by sloppiness in the EventCount logic. Both most likely were introduced by cl/236729920 which includes the new EventCount algorithm: https://github.com/eigenteam/eigen-git-mirror/commit/01da8caf003990967e42a2b9dc3869f154569538 bug #1 (Prewait): Prewait must not consume existing signals. Consider the following scenario. There are 2 thread pool threads (1 and 2) and 1 external thread (3). RunQueue is empty. Thread 1 checks the queue, calls Prewait, checks RunQueue again and now is going to call CommitWait. Thread 2 checks the queue and now is going to call Prewait. Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered the second signal is discarded). Now thread 2 resumes and calls Prewait and takes away the signal. Thread 1 resumes and calls CommitWait, there are no pending signals anymore, so it blocks. As the result we have 2 tasks, but only 1 thread is running. bug #2 (CancelWait): CancelWait must not take away a signal if it's not sure that the signal was meant for this thread. When one thread blocks and another submits a new task concurrently, the EventCount protocol guarantees only the following properties (similar to the Dekker's algorithm): (a) the registered waiter notices presence of the new task and does not block (b) the signaler notices presence of the waiters and wakes it (c) both the waiter notices presence of the new task and signaler notices presence of the waiter [it's only that both of them do not notice each other must not be possible, because it would lead to a deadlock] CancelWait is called for cases (a) and (c). For case (c) it is OK to take the notification signal away, but it's not OK for (a) because nobody queued a signals for us and we take away a signal meant for somebody else. Consider: Thread 1 calls Prewait, checks RunQueue, it's empty, now it's going to call CommitWait. Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered the second signal is discarded). Thread 2 calls Prewait, checks RunQueue, discovers the tasks, calls CancelWait and consumes the pending signal (meant for thread 1). Now Thread 1 resumes and calls CommitWait, since there are no signals it blocks. As the result we have 2 tasks, but only 1 thread is running. Both deadlocks are only a problem if the tasks require parallelism. Most computational tasks do not require parallelism, i.e. a single thread will run task 1, finish it and then dequeue and run task 2. This fix undoes some of the sloppiness in the EventCount that was meant to reduce CPU consumption by idle threads, because we now have more threads running in these corner cases. But we still don't have pthread_yield's and maybe the strictness introduced by this change will actually help to reduce tail latency because we will have threads running when we actually need them running. B) fix deadlock in thread pool caused by RunQueue This fixed a deadlock caused by sloppiness in the RunQueue logic. Most likely this was introduced with the non-blocking thread pool. The deadlock only affects workloads that require parallelism. Most computational tasks don't require parallelism. PopBack must not fail spuriously. If it does, it can effectively lead to single thread consuming several wake up signals. Consider 2 worker threads are blocked. External thread submits a task. One of the threads is woken. It tries to steal the task, but fails due to a spurious failure in PopBack (external thread submits another task and holds the lock). The thread executes blocking protocol again (it won't block because NonEmptyQueueIndex is precise and the thread will discover pending work, but it has called PrepareWait). Now external thread submits another task and signals EventCount again. The signal is consumed by the first thread again. But now we have 2 tasks pending but only 1 worker thread running. It may be possible to fix this in a different way: make EventCount::CancelWait forward wakeup signal to a blocked thread rather then consuming it. But this looks more complex and I am not 100% that it will fix the bug. It's also possible to have 2 versions of PopBack: one will do try_to_lock and another won't. Then worker threads could first opportunistically check all queues with try_to_lock, and only use the blocking version before blocking. But let's first fix the bug with the simpler change.
* Block evaluation for TensorGeneratorOpGravatar Eugene Zhulenev2019-03-05
|
* Do not create Tensor<const T> in cxx11_tensor_forced_eval testGravatar Eugene Zhulenev2019-03-05
|
* Add tiled evaluation for TensorForcedEvalOpGravatar Eugene Zhulenev2019-03-04
|
* Improve EventCount used by the non-blocking threadpool.Gravatar Rasmus Munk Larsen2019-02-22
| | | | | | | | | | The current algorithm requires threads to commit/cancel waiting in order they called Prewait. Spinning caused by that serialization can consume lots of CPU time on some workloads. Restructure the algorithm to not require that serialization and remove spin waits from Commit/CancelWait. Note: this reduces max number of threads from 2^16 to 2^14 to leave more space for ABA counter (which is now 22 bits). Implementation details are explained in comments.
* Avoid `I` as an identifier, since it may clash with the C-header complex.hGravatar Christoph Hertzberg2019-01-25
|
* Fix flaky test for tensor fft.Gravatar Rasmus Munk Larsen2019-01-16
|
* bug #1654: fix compilation with cuda and no c++11Gravatar Gael Guennebaud2019-01-09
|
* Various fixes in polynomial solver and its unit tests:Gravatar Gael Guennebaud2018-12-09
| | | | | | - cleanup noise in imaginary part of real roots - take into account the magnitude of the derivative to check roots. - use <= instead of < at appropriate places