Commit message (Author, Age)
...
* Merge with Eigen head (Eugene Zhulenev, 2019-06-28)
|\
* | Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred (Eugene Zhulenev, 2019-06-28)
| |
| * [SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL. (Rasmus Munk Larsen, 2019-06-28)
|/|     * Abstracting the pointer type so that both SYCL memory and pointer can be captured.
| |     * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
| |     * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
| |     * Adding SYCL macro for controlling loop unrolling.
| |     * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
| * [SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL. (Mehdi Goli, 2019-06-28)
|/      * Abstracting the pointer type so that both SYCL memory and pointer can be captured.
|       * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
|       * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
|       * Adding SYCL macro for controlling loop unrolling.
|       * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
* [SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL. (Mehdi Goli, 2019-06-27)
|       * Adding SYCL memory model
|       * Enabling/Disabling SYCL backend in Core
|       * Supporting Vectorization
* Remove extra comma (causes warnings in C++03) (Christoph Hertzberg, 2019-06-26)
|
* Optimize evaluation strategy for TensorSlicingOp and TensorChippingOp (Eugene Zhulenev, 2019-06-25)
|
* fix for a ROCm/HIP specific compile error introduced by a recent commit. (Deven Desai, 2019-06-22)
|
* Remove extra "one" in comment. (Rasmus Munk Larsen, 2019-06-20)
|
* Update comment as suggested by tra@google.com. (Rasmus Munk Larsen, 2019-06-20)
|
* Fix grammar. (Rasmus Munk Larsen, 2019-06-20)
|
* Added comment explaining the surprising EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC clause. (Rasmus Munk Larsen, 2019-06-20)
|
* Fix CUDA build on Mac. (Rasmus Munk Larsen, 2019-06-20)
|
* Various fixes for packet ops. (Rasmus Munk Larsen, 2019-06-20)
|     1. Fix buggy pcmp_eq and unit test for half types.
|     2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types.
|     3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.
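Item 3 above mentions negating half-precision values by XOR'ing with a sign bit mask. The idea can be sketched on a single scalar as below. This is a minimal illustration that assumes the IEEE 754 binary16 layout (sign in bit 15); the function name negate_half_bits is hypothetical and this is not Eigen's packet implementation.

```cpp
#include <cassert>
#include <cstdint>

// Negate a half-precision value given as its raw 16-bit pattern.
// Assumes IEEE 754 binary16: bit 15 is the sign, so flipping it negates
// the value without touching exponent or mantissa.
// Hypothetical helper for illustration only.
inline std::uint16_t negate_half_bits(std::uint16_t bits) {
  const std::uint16_t kSignMask = 0x8000u;
  return static_cast<std::uint16_t>(bits ^ kSignMask);
}
```

For example, 0x3C00 is the bit pattern of 1.0 in binary16, and flipping the sign bit yields 0xBC00, the pattern of -1.0. A vectorized version would apply the same mask to every lane of a packet.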
* bug #1724: Mask buggy warnings with g++-7 (Christoph Hertzberg, 2019-06-14)
|     (grafted from 427f2f66d69ae9b124c2f8bcd927fb6e19e07e91)
* Make is_valid_index_type return false for float and double when EIGEN_HAS_TYPE_TRAITS is off. (Rasmus Munk Larsen, 2019-06-05)
|
* Add workaround for choosing the right include files with FP16C support with clang. (Rasmus Munk Larsen, 2019-06-05)
|
* Merged in Artem-B/eigen (pull request PR-654) (Rasmus Larsen, 2019-05-31)
|\      Minor build improvements
| |
| |     Approved-by: Rasmus Larsen <rmlarsen@google.com>
* | Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures. (Rasmus Munk Larsen, 2019-05-31)
| |
| * Minor build improvements (tra, 2019-05-31)
|/      * Allow specifying multiple GPU architectures. E.g.: cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70"
|       * Pass CUDA SDK path to clang. Without it it will default to /usr/local/cuda which may not be the right location, if cmake was invoked with -DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path
* digits10() needs to return an integer (Christoph Hertzberg, 2019-05-31)
|     Problem reported on https://stackoverflow.com/questions/56395899
* Merged in deven-amd/eigen-hip-fix-190524 (pull request PR-649) (Rasmus Larsen, 2019-05-24)
|\      fix for HIP build errors that were introduced by a commit earlier this week
| * fix for HIP build errors that were introduced by a commit earlier this week (Deven Desai, 2019-05-24)
| |
* | Use pade for matrix exponential also for complex values. (Michael Tesch, 2019-05-08)
|/
* GEMV: remove double declaration of constant. (Gustavo Lima Chaves, 2019-05-23)
|     That was hurting users with compilers that would object to proceed with that:
|
|     """
|     ./Eigen/src/Core/products/GeneralMatrixVector.h:356:10: error: declaration shadows a static data member of 'general_matrix_vector_product<type-parameter-0-0, type-parameter-0-1, type-parameter-0-2, 1, ConjugateLhs, type-parameter-0-4, type-parameter-0-5, ConjugateRhs, Version>' [-Werror,-Wshadow]
|       LhsPacketSize = Traits::LhsPacketSize,
|       ^
|     ./Eigen/src/Core/products/GeneralMatrixVector.h:307:22: note: previous declaration is here
|       static const Index LhsPacketSize = Traits::LhsPacketSize;
|     """
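The diagnostic quoted above comes from a local enumerator that reuses the name of a static data member. The pattern can be sketched in isolation as follows. All type names here (DemoTraits, ProductBefore, ProductAfter) are hypothetical stand-ins, not the code from GeneralMatrixVector.h; the sketch only assumes what the quoted error shows, namely an enumerator declared where a static const member of the same name already exists.

```cpp
#include <cassert>

// Hypothetical traits class standing in for the real one.
struct DemoTraits { enum { LhsPacketSize = 4 }; };

// Before: the same constant is declared twice. The local enumerator
// shadows the static data member; per the quoted log, clang reports this
// under -Wshadow, and -Werror turns it into a hard error.
struct ProductBefore {
  static const int LhsPacketSize = DemoTraits::LhsPacketSize;
  static int run() {
    enum { LhsPacketSize = DemoTraits::LhsPacketSize };  // second declaration
    return LhsPacketSize;
  }
};

// After: only the static member remains, so nothing is shadowed.
struct ProductAfter {
  static const int LhsPacketSize = DemoTraits::LhsPacketSize;
  static int run() { return LhsPacketSize; }
};
```

Both variants return the same value; the second simply avoids the duplicate name, which is what "remove double declaration of constant" suggests the commit does.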
* Cast Index to RealScalar (Christoph Hertzberg, 2019-05-23)
|     This fixes compilation issues with RealScalar types that are not implicitly castable from Index (e.g. ceres Jet types).
|     Reported by Peter Anderson-Sprecher via eMail
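The kind of failure described above can be reproduced with any scalar type that has no implicit conversion from an integer. The sketch below uses a hypothetical StrictScalar with an explicit constructor as a stand-in (it is not the ceres Jet type), and it assumes Index is a signed integer typedef; scale_by_size is likewise an invented helper.

```cpp
#include <cassert>
#include <cstddef>

typedef std::ptrdiff_t Index;  // assumption: a signed integer index type

// Hypothetical scalar that cannot be built implicitly from an integer.
struct StrictScalar {
  double value;
  explicit StrictScalar(double v) : value(v) {}
};

inline StrictScalar operator*(const StrictScalar& a, const StrictScalar& b) {
  return StrictScalar(a.value * b.value);
}

template <typename RealScalar>
RealScalar scale_by_size(const RealScalar& x, Index n) {
  // return x * n;           // would not compile for StrictScalar
  return x * RealScalar(n);  // explicit cast works for both types
}
```

With the explicit cast, the same generic code compiles for built-in floating-point types and for stricter user-defined scalars.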
* Enable support for F16C with Clang. The required intrinsics were added here: https://reviews.llvm.org/D16177 and are part of LLVM 3.8.0. (Rasmus Munk Larsen, 2019-05-20)
|
* Merged in rmlarsen/eigen (pull request PR-643) (Rasmus Larsen, 2019-05-20)
|\      Make Eigen build with cuda 10 and clang.
| |
| |     Approved-by: Justin Lebar <justin.lebar@gmail.com>
| * Merge (Rasmus Munk Larsen, 2019-05-20)
| |\
* | \ Merged in scramsby/eigen (pull request PR-646) (Gael Guennebaud, 2019-05-20)
|\ \ \      Eigen: Fix MSVC C++17 language standard detection logic
* | | | Prevent potential division by zero in TensorExecutor (Eugene Zhulenev, 2019-05-17)
| | | |
* | | | Merged in ezhulenev/eigen-01 (pull request PR-644) (Rasmus Larsen, 2019-05-17)
|\ \ \ \      Always evaluate Tensor expressions with broadcasting via tiled evaluation code path
* \ \ \ \ Merged in glchaves/eigen (pull request PR-635) (Rasmus Larsen, 2019-05-17)
|\ \ \ \ \      Speed up GEMV on AVX-512 builds, just as done for GEBP previously.
| | | | | |
| | | | | |     Approved-by: Rasmus Larsen <rmlarsen@google.com>
| | * | | | Always evaluate Tensor expressions with broadcasting via tiled evaluation code path (Eugene Zhulenev, 2019-05-16)
| |/ / / /
|/| | | |
| | | * | Make Eigen build with cuda 10 and clang. (Rasmus Munk Larsen, 2019-05-15)
| |_|/ /
|/| | |
| | | * Make Eigen build with cuda 10 and clang. (Rasmus Munk Larsen, 2019-05-15)
| |_|/
|/| |
* | | Merged in rmlarsen/eigen_threadpool (pull request PR-640) (Rasmus Larsen, 2019-05-13)
|\ \ \      Fix deadlocks in thread pool.
| | | |
| | | |     Approved-by: Eugene Zhulenev <ezhulenev@google.com>
* | | | Collapsed revision from PR-641 (Christoph Hertzberg, 2019-05-13)
| | | |     * SparseLU.h - corrected example, it didn't compile
| | | |     * Changed encoding back to UTF8
* | | | Removing unused API to fix compile error in TensorFlow due to (Anuj Rawat, 2019-05-12)
| | | |     AVX512VL, AVX512BW usage
* | | | bug #1707: Fix deprecation warnings, or disable warnings when testing deprecated functions (Christoph Hertzberg, 2019-05-10)
| | | |
* | | | Fix build with clang on Windows. (Rasmus Munk Larsen, 2019-05-09)
| | | |
| * | | A) fix deadlocks in thread pool caused by EventCount (Rasmus Munk Larsen, 2019-05-08)
|/ / /
| | |     This fixed 2 deadlocks caused by sloppiness in the EventCount logic. Both most likely were introduced by cl/236729920 which includes the new EventCount algorithm:
| | |     https://github.com/eigenteam/eigen-git-mirror/commit/01da8caf003990967e42a2b9dc3869f154569538
| | |
| | |     bug #1 (Prewait):
| | |     Prewait must not consume existing signals. Consider the following scenario. There are 2 thread pool threads (1 and 2) and 1 external thread (3). RunQueue is empty.
| | |     Thread 1 checks the queue, calls Prewait, checks RunQueue again and now is going to call CommitWait.
| | |     Thread 2 checks the queue and now is going to call Prewait.
| | |     Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered (the second signal is discarded).
| | |     Now thread 2 resumes and calls Prewait and takes away the signal.
| | |     Thread 1 resumes and calls CommitWait, there are no pending signals anymore, so it blocks.
| | |     As the result we have 2 tasks, but only 1 thread is running.
| | |
| | |     bug #2 (CancelWait):
| | |     CancelWait must not take away a signal if it's not sure that the signal was meant for this thread. When one thread blocks and another submits a new task concurrently, the EventCount protocol guarantees only the following properties (similar to Dekker's algorithm):
| | |     (a) the registered waiter notices presence of the new task and does not block
| | |     (b) the signaler notices presence of the waiter and wakes it
| | |     (c) both the waiter notices presence of the new task and the signaler notices presence of the waiter
| | |     [the only outcome that must not be possible is that neither of them notices the other, because it would lead to a deadlock]
| | |     CancelWait is called for cases (a) and (c). For case (c) it is OK to take the notification signal away, but it's not OK for (a) because nobody queued a signal for us and we take away a signal meant for somebody else.
| | |     Consider:
| | |     Thread 1 calls Prewait, checks RunQueue, it's empty, now it's going to call CommitWait.
| | |     Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered (the second signal is discarded).
| | |     Thread 2 calls Prewait, checks RunQueue, discovers the tasks, calls CancelWait and consumes the pending signal (meant for thread 1).
| | |     Now Thread 1 resumes and calls CommitWait, since there are no signals it blocks.
| | |     As the result we have 2 tasks, but only 1 thread is running.
| | |
| | |     Both deadlocks are only a problem if the tasks require parallelism. Most computational tasks do not require parallelism, i.e. a single thread will run task 1, finish it and then dequeue and run task 2.
| | |
| | |     This fix undoes some of the sloppiness in the EventCount that was meant to reduce CPU consumption by idle threads, because we now have more threads running in these corner cases. But we still don't have pthread_yield's and maybe the strictness introduced by this change will actually help to reduce tail latency because we will have threads running when we actually need them running.
| | |
| | |     B) fix deadlock in thread pool caused by RunQueue
| | |
| | |     This fixed a deadlock caused by sloppiness in the RunQueue logic. Most likely this was introduced with the non-blocking thread pool. The deadlock only affects workloads that require parallelism. Most computational tasks don't require parallelism.
| | |
| | |     PopBack must not fail spuriously. If it does, it can effectively lead to a single thread consuming several wake up signals.
| | |     Consider 2 worker threads are blocked.
| | |     External thread submits a task. One of the threads is woken.
| | |     It tries to steal the task, but fails due to a spurious failure in PopBack (external thread submits another task and holds the lock).
| | |     The thread executes blocking protocol again (it won't block because NonEmptyQueueIndex is precise and the thread will discover pending work, but it has called PrepareWait).
| | |     Now external thread submits another task and signals EventCount again.
| | |     The signal is consumed by the first thread again.
| | |     But now we have 2 tasks pending but only 1 worker thread running.
| | |
| | |     It may be possible to fix this in a different way: make EventCount::CancelWait forward the wakeup signal to a blocked thread rather than consuming it. But this looks more complex and I am not 100% sure that it will fix the bug.
| | |     It's also possible to have 2 versions of PopBack: one will do try_to_lock and another won't. Then worker threads could first opportunistically check all queues with try_to_lock, and only use the blocking version before blocking. But let's first fix the bug with the simpler change.
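The bug #1 interleaving described above can be replayed with a toy, single-threaded model of the two counters involved. This is only a sketch for following the scenario step by step: ToyEventCount, its fields, and the exact rule used in CancelWait are assumptions made for illustration, and it is not Eigen's EventCount, which is lock-free and considerably more involved.

```cpp
#include <cassert>

// Toy model: counts registered waiters and pending wake-up signals.
// Hypothetical type; NOT Eigen's EventCount.
struct ToyEventCount {
  bool prewait_steals_signal;  // true models the pre-fix behaviour
  int waiters;                 // threads that called Prewait and not yet left
  int signals;                 // pending wake-up signals

  explicit ToyEventCount(bool buggy)
      : prewait_steals_signal(buggy), waiters(0), signals(0) {}

  // Returns true if the caller was registered as a waiter.
  bool Prewait() {
    if (prewait_steals_signal && signals > 0) {
      --signals;  // bug #1: consumes a signal meant for another waiter
      return false;
    }
    ++waiters;
    return true;
  }

  // A registered waiter found work and leaves without blocking. It drops a
  // signal only if more signals than remaining waiters would be left.
  void CancelWait() {
    --waiters;
    if (signals > waiters) signals = waiters;
  }

  // Returns true if the caller would block.
  bool CommitWait() {
    if (signals > 0) { --signals; --waiters; return false; }
    return true;
  }

  // At most one pending signal per registered waiter; extras are discarded.
  void Notify() { if (signals < waiters) ++signals; }
};

// Replays the bug #1 interleaving; returns how many pool threads keep running.
inline int RunningThreadsAfterPrewaitRace(bool buggy) {
  ToyEventCount ec(buggy);
  int running = 0;
  ec.Prewait();                               // thread 1: queue empty, registers
  ec.Notify();                                // thread 3: first task
  ec.Notify();                                // thread 3: second task, discarded
  const bool t2_registered = ec.Prewait();    // thread 2 resumes
  if (t2_registered) ec.CancelWait();         // thread 2 sees tasks, cancels
  ++running;                                  // thread 2 runs a task either way
  if (!ec.CommitWait()) ++running;            // thread 1 runs unless it blocks
  return running;
}
```

In this model the buggy variant leaves one thread running for two tasks, while the variant in which Prewait never consumes a signal leaves both threads running, matching the outcome the commit message describes.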
* | | Fix AVX512 & GCC 6.3 compilation (Eugene Zhulenev, 2019-05-07)
| | |
* | | Fix stupid shadow-warnings (with old clang versions) (Christoph Hertzberg, 2019-05-07)
| | |
* | | Restore C++03 compatibility (Christoph Hertzberg, 2019-05-07)
| | |
* | | Restore C++03 compatibility (Christoph Hertzberg, 2019-05-06)
| | |
* | | Fix traits for scalar_logistic_op. (Rasmus Munk Larsen, 2019-05-03)
| | |
| | * Eigen: Fix MSVC C++17 language standard detection logic (Scott Ramsby, 2019-05-03)
| |/
|/|     To detect C++17 support, use _MSVC_LANG macro instead of _MSC_VER. _MSC_VER can indicate whether the current compiler version could support the C++17 language standard, but not whether that standard is actually selected (i.e. via /std:c++17).
| |
| |     See these web pages for more details:
| |     https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/
| |     https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros
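The detection approach described above can be sketched with a pair of preprocessor checks. The macro names DEMO_CPLUSPLUS and DEMO_HAS_CXX17 and the helper has_cxx17 are hypothetical, since the log entry does not show Eigen's own macros; only _MSVC_LANG, _MSC_VER and __cplusplus are real predefined macros.

```cpp
#include <cassert>

// On MSVC, _MSVC_LANG reflects the language standard actually selected
// (e.g. via /std:c++17), whereas _MSC_VER only identifies the compiler
// version. Other compilers report the standard through __cplusplus.
#if defined(_MSVC_LANG)
#define DEMO_CPLUSPLUS _MSVC_LANG
#else
#define DEMO_CPLUSPLUS __cplusplus
#endif

// Hypothetical feature macro: 1 when C++17 or later is in effect.
#if DEMO_CPLUSPLUS >= 201703L
#define DEMO_HAS_CXX17 1
#else
#define DEMO_HAS_CXX17 0
#endif

inline bool has_cxx17() { return DEMO_HAS_CXX17 != 0; }
```

Checking the selected standard rather than the compiler version avoids enabling C++17-only code paths when a capable compiler is run in an older language mode.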
* | Add masked_store_available to unpacket_traits (Eugene Zhulenev, 2019-05-02)
| |
* | Add masked pstoreu for Packet16h (Eugene Zhulenev, 2019-05-02)
| |