path: root/unsupported/test
Commit message (Author, Age)
...
* Add block evaluation V2 to TensorAsyncExecutor. (Rasmus Munk Larsen, 2019-10-22)
|   Add async evaluation to a number of ops.
|
* Drop support for c++03 in Eigen tensor. Get rid of some code used to emulate c++11 functionality with older compilers. (Rasmus Munk Larsen, 2019-10-18)
|
* Cleanup Tensor block destination and materialized block storage allocation (Eugene Zhulenev, 2019-10-16)
|
* TensorBroadcasting support for random/uniform blocks (Eugene Zhulenev, 2019-10-16)
|
* Block evaluation for TensorGenerator/TensorReverse/TensorShuffling (Eugene Zhulenev, 2019-10-14)
|
* Block evaluation for TensorGenerator + TensorReverse + fixed bug in tensor reverse op (Eugene Zhulenev, 2019-10-10)
|
* Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing (Eugene Zhulenev, 2019-10-09)
|
* Implement c++03 compatible fix for changeset 7a43af1a335da2c0489b4119a33ee1cbff0c15d6 (Gael Guennebaud, 2019-10-09)
|
* Fix compilation of FFTW unit test (Gael Guennebaud, 2019-10-08)
|
* Add block evaluation to TensorEvalTo and fix few small bugs (Eugene Zhulenev, 2019-10-07)
|
* Fix compilation warnings and errors with clang in TensorBlockV2 code and tests (Eugene Zhulenev, 2019-10-04)
|
* Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect (Eugene Zhulenev, 2019-10-02)
|
* Fix cxx11_tensor_block_io test (Eugene Zhulenev, 2019-09-25)
|
* Fix compilation warnings and errors with clang in TensorBlockV2 (Eugene Zhulenev, 2019-09-25)
|
* Add new TensorBlock api implementation + tests (Eugene Zhulenev, 2019-09-24)
|
* Tensor block evaluation V2 support for unary/binary/broadcasting (Eugene Zhulenev, 2019-09-24)
|
* Add support for asynchronous evaluation of tensor casting expressions. (Rasmus Munk Larsen, 2019-09-19)
|
* Merging eigen/eigen. (Srinivas Vasudevan, 2019-09-16)
|\
* | Add Bessel functions to SpecialFunctions. (Srinivas Vasudevan, 2019-09-14)
|/
|   - Split SpecialFunctions files into a separate BesselFunctions file.
|
|   In particular add:
|   - Modified Bessel functions of the second kind k0, k1, k0e, k1e
|   - Bessel functions of the first kind j0, j1
|   - Bessel functions of the second kind y0, y1
|
* Fix for the HIP build+test errors introduced by the ndtri support. (Deven Desai, 2019-09-06)
|   The fixes needed are
|   * adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs)
|   * switching to using ::<math_func> instead of std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC
|   * removing an errant "j" from a testcase (don't know how that made it in to begin with!)
|
* Update ThreadLocal to use separate Initialize/Release callables (Eugene Zhulenev, 2019-09-10)
|
* ThreadLocal container that does not rely on thread local storage (Eugene Zhulenev, 2019-09-09)
|
* PR 681: Add ndtri function, the inverse of the normal distribution function. (Srinivas Vasudevan, 2019-08-12)
|
* Allow move-only done callback in TensorAsyncDevice (Eugene Zhulenev, 2019-09-03)
|
* Add test for const TensorMap underlying data mutation (Eugene Zhulenev, 2019-09-03)
|
* evalSubExprsIfNeededAsync + async TensorContractionThreadPool (Eugene Zhulenev, 2019-08-30)
|
* Asynchronous expression evaluation with TensorAsyncDevice (Eugene Zhulenev, 2019-08-30)
|
* Const correctness in TensorMap<const Tensor<T, ...>> expressions (Eugene Zhulenev, 2019-08-28)
|
* Remove XSMM support from Tensor module (Eugene Zhulenev, 2019-08-19)
|
* Disable tests for contraction with output kernels when using libxsmm, which does not support this. (Rasmus Munk Larsen, 2019-08-07)
|
* Merge with Eigen head (Eugene Zhulenev, 2019-06-28)
|\
* | Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred (Eugene Zhulenev, 2019-06-28)
| |
| * [SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL. (Mehdi Goli, 2019-06-28)
|/
|   * Abstracting the pointer type so that both SYCL memory and pointer can be captured.
|   * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
|   * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
|   * Adding SYCL macro for controlling loop unrolling.
|   * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
|
* Minor build improvements (tra, 2019-05-31)
|   * Allow specifying multiple GPU architectures. E.g.:
|     cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70"
|   * Pass CUDA SDK path to clang. Without it it will default to /usr/local/cuda which may not be the right location, if cmake was invoked with -DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path
|
* Merged in rmlarsen/eigen_threadpool (pull request PR-640) (Rasmus Larsen, 2019-05-13)
|\
| |   Fix deadlocks in thread pool.
| |
| |   Approved-by: Eugene Zhulenev <ezhulenev@google.com>
| |
* | bug #1707: Fix deprecation warnings, or disable warnings when testing deprecated functions (Christoph Hertzberg, 2019-05-10)
| |
| * A) fix deadlocks in thread pool caused by EventCount (Rasmus Munk Larsen, 2019-05-08)
|/
|   This fixed 2 deadlocks caused by sloppiness in the EventCount logic. Both were most likely introduced by cl/236729920, which includes the new EventCount algorithm:
|   https://github.com/eigenteam/eigen-git-mirror/commit/01da8caf003990967e42a2b9dc3869f154569538
|
|   bug #1 (Prewait):
|   Prewait must not consume existing signals. Consider the following scenario. There are 2 thread pool threads (1 and 2) and 1 external thread (3). RunQueue is empty. Thread 1 checks the queue, calls Prewait, checks RunQueue again and now is going to call CommitWait. Thread 2 checks the queue and now is going to call Prewait. Thread 3 submits 2 tasks; the EventCount signal count is set to 1 because only 1 waiter is registered (the second signal is discarded). Now thread 2 resumes, calls Prewait and takes away the signal. Thread 1 resumes and calls CommitWait; there are no pending signals anymore, so it blocks. As a result we have 2 tasks, but only 1 thread is running.
|
|   bug #2 (CancelWait):
|   CancelWait must not take away a signal if it's not sure that the signal was meant for this thread. When one thread blocks and another submits a new task concurrently, the EventCount protocol guarantees only the following properties (similar to Dekker's algorithm):
|   (a) the registered waiter notices the presence of the new task and does not block
|   (b) the signaler notices the presence of the waiter and wakes it
|   (c) both the waiter notices the presence of the new task and the signaler notices the presence of the waiter
|   [the only outcome that must not be possible is that neither notices the other, because that would lead to a deadlock]
|   CancelWait is called for cases (a) and (c). For case (c) it is OK to take the notification signal away, but it's not OK for (a), because nobody queued a signal for us and we take away a signal meant for somebody else.
|   Consider: Thread 1 calls Prewait, checks RunQueue, it's empty, now it's going to call CommitWait. Thread 3 submits 2 tasks; the EventCount signal count is set to 1 because only 1 waiter is registered (the second signal is discarded). Thread 2 calls Prewait, checks RunQueue, discovers the tasks, calls CancelWait and consumes the pending signal (meant for thread 1). Now thread 1 resumes and calls CommitWait; since there are no signals, it blocks. As a result we have 2 tasks, but only 1 thread is running.
|
|   Both deadlocks are only a problem if the tasks require parallelism. Most computational tasks do not require parallelism, i.e. a single thread will run task 1, finish it and then dequeue and run task 2.
|
|   This fix undoes some of the sloppiness in the EventCount that was meant to reduce CPU consumption by idle threads, because we now have more threads running in these corner cases. But we still don't have pthread_yield's and maybe the strictness introduced by this change will actually help to reduce tail latency because we will have threads running when we actually need them running.
|
|   B) fix deadlock in thread pool caused by RunQueue
|
|   This fixed a deadlock caused by sloppiness in the RunQueue logic. Most likely this was introduced with the non-blocking thread pool. The deadlock only affects workloads that require parallelism. Most computational tasks don't require parallelism.
|
|   PopBack must not fail spuriously. If it does, it can effectively lead to a single thread consuming several wake-up signals. Consider 2 worker threads that are blocked. An external thread submits a task. One of the threads is woken. It tries to steal the task, but fails due to a spurious failure in PopBack (the external thread submits another task and holds the lock). The thread executes the blocking protocol again (it won't block because NonEmptyQueueIndex is precise and the thread will discover pending work, but it has called PrepareWait). Now the external thread submits another task and signals EventCount again. The signal is consumed by the first thread again. But now we have 2 tasks pending but only 1 worker thread running.
|
|   It may be possible to fix this in a different way: make EventCount::CancelWait forward the wakeup signal to a blocked thread rather than consuming it. But this looks more complex and I am not 100% sure that it will fix the bug. It's also possible to have 2 versions of PopBack: one will do try_to_lock and another won't. Then worker threads could first opportunistically check all queues with try_to_lock, and only use the blocking version before blocking. But let's first fix the bug with the simpler change.
|
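The bug #1 interleaving from part A can be replayed in a toy, single-threaded model. The sketch below is not Eigen's EventCount: `ToyEventCount`, its plain `int` counters, the `prewait_consumes_signal` flag and `Thread1BlocksInScenario` are hypothetical names introduced for illustration, and no atomics or real threads are involved. It only assumes what the message states: a signal is recorded only while some registered waiter is still unsignalled, the old Prewait could take a pending signal, and CancelWait should take a signal only when it can be sure the signal is its own.

```cpp
#include <cassert>  // only needed if you assert on the results, as in the note below

// Toy model of the waiter/signal counters (assumption: not Eigen's implementation).
struct ToyEventCount {
  int waiters = 0;  // threads that called Prewait and have not committed or cancelled
  int signals = 0;  // pending wake-up signals, never more than `waiters`
  bool prewait_consumes_signal;  // true models the behaviour described as bug #1

  explicit ToyEventCount(bool buggy) : prewait_consumes_signal(buggy) {}

  // Returns true if the caller is now registered and must commit or cancel.
  bool Prewait() {
    if (prewait_consumes_signal && signals > 0) {
      --signals;  // takes a signal that was queued for an earlier waiter
      return false;
    }
    ++waiters;
    return true;
  }

  // A signal is recorded only if some registered waiter is still unsignalled.
  void Notify() {
    if (signals < waiters) ++signals;
  }

  // Returns true if the caller proceeds, false if it would block.
  bool CommitWait() {
    if (signals > 0) {
      --signals;
      --waiters;
      return true;
    }
    return false;
  }

  // Deregister; take a signal only when every waiter has one, so one must be ours.
  void CancelWait() {
    if (signals == waiters) --signals;
    --waiters;
  }
};

// Replays the interleaving from bug #1; returns true if thread 1 ends up blocked.
bool Thread1BlocksInScenario(bool buggy) {
  ToyEventCount ec(buggy);
  ec.Prewait();         // thread 1 registers, then re-checks the empty queue
  ec.Notify();          // thread 3 submits task 1: one signal is recorded
  ec.Notify();          // thread 3 submits task 2: signal discarded, 1 waiter only
  if (ec.Prewait()) {   // thread 2 resumes and calls Prewait
    ec.CancelWait();    // it sees the tasks on its re-check and cancels
  }
  return !ec.CommitWait();  // thread 1 resumes and tries to commit
}
```

Under these assumptions `Thread1BlocksInScenario(true)` returns `true` (thread 1 blocks while a task is still pending) and `Thread1BlocksInScenario(false)` returns `false`, which matches the outcome the message describes before and after the fix.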
* Block evaluation for TensorGeneratorOp (Eugene Zhulenev, 2019-03-05)
|
* Do not create Tensor<const T> in cxx11_tensor_forced_eval test (Eugene Zhulenev, 2019-03-05)
|
* Add tiled evaluation for TensorForcedEvalOp (Eugene Zhulenev, 2019-03-04)
|
* Improve EventCount used by the non-blocking threadpool. (Rasmus Munk Larsen, 2019-02-22)
|   The current algorithm requires threads to commit/cancel waiting in order they called Prewait. Spinning caused by that serialization can consume lots of CPU time on some workloads. Restructure the algorithm to not require that serialization and remove spin waits from Commit/CancelWait.
|
|   Note: this reduces max number of threads from 2^16 to 2^14 to leave more space for ABA counter (which is now 22 bits). Implementation details are explained in comments.
|
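The bit budget in the note above can be checked with a few lines of arithmetic. The sketch assumes a single 64-bit state word split into three 14-bit counters plus the ABA counter. The commit message gives only the 2^14 thread limit and the 22-bit counter, so the three-counter split and every constant and function name below are assumptions made for illustration, not taken from the source.

```cpp
#include <cassert>  // only needed if you assert on the results at run time
#include <cstdint>

// Assumed layout: [ epoch : 22 | counter2 : 14 | counter1 : 14 | counter0 : 14 ]
constexpr unsigned kStateBits = 64;
constexpr unsigned kCounterBits = 14;  // from the message: at most 2^14 threads
constexpr unsigned kNumCounters = 3;   // assumption: three per-thread counters
constexpr unsigned kEpochShift = kNumCounters * kCounterBits;
constexpr unsigned kEpochBits = kStateBits - kEpochShift;

// 64 - 3 * 14 leaves exactly the 22 bits quoted for the ABA counter.
static_assert(kEpochBits == 22, "ABA counter should get 22 bits");

constexpr std::uint64_t kCounterMask = (std::uint64_t{1} << kCounterBits) - 1;
constexpr std::uint64_t kMaxThreads = std::uint64_t{1} << kCounterBits;

// Packs three counters and an epoch into one state word.
constexpr std::uint64_t Pack(std::uint64_t c0, std::uint64_t c1,
                             std::uint64_t c2, std::uint64_t epoch) {
  return (c0 & kCounterMask) |
         ((c1 & kCounterMask) << kCounterBits) |
         ((c2 & kCounterMask) << (2 * kCounterBits)) |
         (epoch << kEpochShift);
}

// Extracts counter `index` (0, 1 or 2) from a state word.
constexpr std::uint64_t Counter(std::uint64_t state, unsigned index) {
  return (state >> (index * kCounterBits)) & kCounterMask;
}

// Extracts the ABA (epoch) counter from a state word.
constexpr std::uint64_t Epoch(std::uint64_t state) {
  return state >> kEpochShift;
}
```

With 16-bit counters the same arithmetic would leave only 64 - 3 * 16 = 16 bits for the epoch, which is consistent with the trade-off the note describes.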
* Avoid `I` as an identifier, since it may clash with the C-header complex.h (Christoph Hertzberg, 2019-01-25)
|
* Fix flaky test for tensor fft. (Rasmus Munk Larsen, 2019-01-16)
|
* bug #1654: fix compilation with cuda and no c++11 (Gael Guennebaud, 2019-01-09)
|
* Various fixes in polynomial solver and its unit tests: (Gael Guennebaud, 2018-12-09)
|   - cleanup noise in imaginary part of real roots
|   - take into account the magnitude of the derivative to check roots.
|   - use <= instead of < at appropriate places
|
* Fixed most conversion warnings in MatrixFunctions module (Christoph Hertzberg, 2018-11-20)
|
* [PATCH 1/2] Misc. typos (luz.paz, 2018-09-18)
|   From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001
|
|   Found via `codespell -q 3 -I ../eigen-word-whitelist.txt`
|   where the whitelist consists of:
|   ```
|   als ans cas dum lastr lowd nd overfl pres preverse substraction te uint whch
|   ```
|   ---
|    CMakeLists.txt                                | 26 +++++++++----------
|    Eigen/src/Core/GenericPacketMath.h            |  2 +-
|    Eigen/src/SparseLU/SparseLU.h                 |  2 +-
|    bench/bench_norm.cpp                          |  2 +-
|    doc/HiPerformance.dox                         |  2 +-
|    doc/QuickStartGuide.dox                       |  2 +-
|    .../Eigen/CXX11/src/Tensor/TensorChipping.h   |  6 ++---
|    .../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h  |  2 +-
|    .../src/Tensor/TensorForwardDeclarations.h    |  4 +--
|    .../src/Tensor/TensorGpuHipCudaDefines.h      |  2 +-
|    .../Eigen/CXX11/src/Tensor/TensorReduction.h  |  2 +-
|    .../CXX11/src/Tensor/TensorReductionGpu.h     |  2 +-
|    .../test/cxx11_tensor_concatenation.cpp       |  2 +-
|    unsupported/test/cxx11_tensor_executor.cpp    |  2 +-
|    14 files changed, 29 insertions(+), 29 deletions(-)
|
* Merged in ezhulenev/eigen-02 (pull request PR-534) (Rasmus Munk Larsen, 2018-10-25)
|\
| |   Fix cxx11_tensor_{block_access, reduction} tests
| |
| * Fix cxx11_tensor_{block_access, reduction} tests (Eugene Zhulenev, 2018-10-25)
| |
* | bug #1606: Explicitly set the standard before find_package(StandardMathLibrary). (Christoph Hertzberg, 2018-10-19)
|/
|   Also replace EIGEN_COMPILER_SUPPORT_CXX11 in favor of EIGEN_COMPILER_SUPPORT_CPP11.
|
|   Grafted manually from a4afa90d161faab385a77f0e2764fb13ff3b9484