path: root/Eigen/src/Core
Commit message (Author, Age)
...
| * | Adding new arch/SYCL headers, used for SYCL vectorization. (Mehdi Goli, 2018-08-01)
| | |
| | * variadic version of assert which can take a parameter pack as its input. (Mehdi Goli, 2018-08-01)
| |/
| * bug #1578: Improve prefetching in matrix multiplication on MIPS. (Alexey Frunze, 2018-07-24)
| |
| * Re-enable FMA for fast sqrt functions (Mark D Ryan, 2018-07-30)
| |
| * Fix AVX512 implementations of psqrt (Mark D Ryan, 2018-06-25)
| |     This commit fixes the AVX512 implementations of psqrt in the same way that 3ed67cb0bb4af65fbf243df598604a8c7630bf7d fixed the AVX2 version of this function. The AVX512 versions of psqrt incorrectly return -0.0 for negative values, instead of NaN.
| |     Fixing the issues requires adding some additional instructions that slow down the algorithms. A similar test to the one used in 3ed67cb0bb4af65fbf243df598604a8c7630bf7d shows that the corrected Packet16f code runs at 73% of the speed of the existing code, while the corrected Packet8d function runs at 68% of the original.
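
The -0.0 versus NaN behaviour described in this message can be reproduced with a scalar model. The sketch below is not Eigen's packet code: `rsqrt_estimate`, `fast_sqrt_buggy` and `fast_sqrt_fixed` are hypothetical names, and the assumption is that the fast path computes `x * rsqrt(x)` while zeroing the reciprocal estimate for inputs too small to handle. If that mask is written as `x < min`, it also catches negative inputs, which is one way a result of -0.0 can arise. The sketch covers finite inputs only.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scalar stand-in for a hardware reciprocal-square-root estimate.
inline float rsqrt_estimate(float x) { return 1.0f / std::sqrt(x); }

// Assumed shape of the faulty logic: the flush-to-zero mask "x < min" is also
// true for every negative x, so the estimate is zeroed and x * 0 gives -0.0.
inline float fast_sqrt_buggy(float x) {
  const bool flush = x < std::numeric_limits<float>::min();
  const float r = flush ? 0.0f : rsqrt_estimate(x);
  return x * r;
}

// One possible correction: flush only when |x| is tiny, so negative inputs
// reach rsqrt_estimate, which yields NaN, and NaN propagates to the result.
inline float fast_sqrt_fixed(float x) {
  const bool flush = std::fabs(x) < std::numeric_limits<float>::min();
  const float r = flush ? 0.0f : rsqrt_estimate(x);
  return x * r;
}
```

The extra absolute-value step in the corrected version is the kind of additional work that the message says slows the vectorized code down.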
| * Add pcast packet op for NEON. (Rasmus Munk Larsen, 2018-07-26)
| |
| * Fixed issue which made documentation not getting built anymore (Christoph Hertzberg, 2018-07-24)
| |
| * fix typo (Gael Guennebaud, 2018-07-23)
| |
| * Add lastN shortcuts to seq/seqN. (Gael Guennebaud, 2018-07-23)
| |
| * Disable type traits for stdlibc++ <= 4.9.3 (Eugene Zhulenev, 2018-07-20)
| |
| * Fix IsRelocatable without C++11 (Gael Guennebaud, 2018-07-19)
| |
| * Fix determination of EIGEN_HAS_TYPE_TRAITS (Gael Guennebaud, 2018-07-19)
| |
| * Add MIPS changes missing from previous merge. (Alexey Frunze, 2018-07-18)
| |
| * Disable type traits for GCC < 5.1.0 (Eugene Zhulenev, 2018-07-18)
| |
| * bug #1432: fix conservativeResize for non-relocatable scalar types. (Gael Guennebaud, 2018-07-18)
| |     For those we need to by-pass realloc routines and fall-back to allocate as new - copy - delete. The remaining problem is that we don't have any mechanism to accurately determine whether a type is relocatable or not, so currently let's be super conservative using either RequireInitialization or std::is_trivially_copyable
| * applying EIGEN_DECLARE_TEST to *gpu* tests (Deven Desai, 2018-07-17)
| |     Also, a few minor fixes for GPU tests running in HIP mode.
| |     1. Adding an include for hip/hip_runtime.h in the Macros.h file
| |        For HIP __host__ and __device__ are macros which are defined in hip headers. Their definitions need to be included before their use in the file.
| |     2. Fixing the compile failure in TensorContractionGpu introduced by the commit to "Fuse computations into the Tensor contractions using output kernel"
| |     3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit
| * bug #1572: use c++11 atomic instead of volatile if c++11 is available, and disable multi-threaded GEMM on non-x86 without c++11. (Gael Guennebaud, 2018-07-17)
| |
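
As an illustration of why atomics are preferred over `volatile` for cross-thread flags, here is a minimal sketch of a publish/consume handshake. The struct `WorkerInfo` and the functions `publish` and `try_consume` are hypothetical and do not reproduce Eigen's GEMM bookkeeping; the point is only that `std::atomic` gives defined ordering between the write of the flag and the reads that depend on it, which `volatile` does not guarantee.

```cpp
#include <atomic>

// Hypothetical per-worker state shared between threads.
struct WorkerInfo {
  std::atomic<int> sync;   // index of the last block published, -1 if none
  std::atomic<int> users;  // number of threads that still have to read it
  WorkerInfo() : sync(-1), users(0) {}
};

// Producer side: record how many readers are expected, then publish the block.
// The release store makes the earlier write visible to any acquiring reader.
inline void publish(WorkerInfo& w, int block, int readers) {
  w.users.store(readers, std::memory_order_relaxed);
  w.sync.store(block, std::memory_order_release);
}

// Consumer side: returns false if the block is not published yet; otherwise
// marks one reader as done and returns true.
inline bool try_consume(WorkerInfo& w, int block) {
  if (w.sync.load(std::memory_order_acquire) != block) return false;
  w.users.fetch_sub(1, std::memory_order_acq_rel);
  return true;
}
```

In a real worker loop, a consumer would typically retry `try_consume` until it succeeds.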
| * Relax the condition to not only work on Android. (Rasmus Munk Larsen, 2018-07-13)
| |
| * Clang produces incorrect Thumb2 assembler when using alloca. (Rasmus Munk Larsen, 2018-07-13)
| |     Don't define EIGEN_ALLOCA when generating Thumb with clang.
| * bug #1571: fix is_convertible<from,to> with "from" a reference. (Gael Guennebaud, 2018-07-13)
| |
| * Forward declaring std::array does not work with all std libs, so let's just include <array> (Gael Guennebaud, 2018-07-13)
| |
| * Add support for MIPS SIMD (MSA) (Alexey Frunze, 2018-07-06)
| |
| * Fix compilation regarding std::array (Gael Guennebaud, 2018-07-12)
| |
| * fix unused warning (Gael Guennebaud, 2018-07-12)
| |
| * Cleanup the mess in Eigen/Core by moving CUDA/HIP stuff at more appropriate places (Macros.h), and alignment/vectorization logic is now in util/ConfigureVectorization.h (Gael Guennebaud, 2018-07-12)
| |
| * remove double ;; (Gael Guennebaud, 2018-07-12)
| |
| * bug #1570: fix warning (Gael Guennebaud, 2018-07-12)
| |
| * Merged in deven-amd/eigen (pull request PR-402) (Gael Guennebaud, 2018-07-12)
| |\
| | |     Adding support for using Eigen in HIP kernels.
| * | Remove useless specialization thanks to is_convertible being more robust. (Gael Guennebaud, 2018-07-12)
| | |
| * | spellcheck (Gael Guennebaud, 2018-07-12)
| | |
| * | Make is_convertible more robust and conformant to std::is_convertible (Gael Guennebaud, 2018-07-12)
| | |
| * | Fix regression in 9357838f94d2907996adadc7e5200376f3561ed4 (Gael Guennebaud, 2018-07-11)
| | |
| * | Fix double ;; (Gael Guennebaud, 2018-07-11)
| | |
| | * Updates corresponding to the latest round of PR feedback (Deven Desai, 2018-07-11)
| | |     The major changes are
| | |     1. Moving CUDA/PacketMath.h to GPU/PacketMath.h
| | |     2. Moving CUDA/MathFunctions.h to GPU/MathFunction.h
| | |     3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h
| | |        The above three changes effectively enable the Eigen "Packet" layer for the HIP platform
| | |     4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic")
| | |     5. Updating the "EIGEN_DEVICE_FUNC" marking in some places
| | |     The change has been tested on the HIP and CUDA platforms.
| | * renaming CUDA* to GPU* for some header files (Deven Desai, 2018-07-11)
| | |
| | * merging updates from upstream (Deven Desai, 2018-07-11)
| | |\
| | |/
| |/|
| * | Add internal::is_identity compile-time helper (Gael Guennebaud, 2018-07-11)
| | |
| * | Fix conversion warning (Gael Guennebaud, 2018-07-10)
| | |
| * | bug #1543: improve linear indexing for general block expressions (Gael Guennebaud, 2018-07-10)
| | |
| * | Introduce the macro ei_declare_local_nested_eval to help allocating on the stack local temporaries via alloca, and let outer-products make good use of it. (Gael Guennebaud, 2018-07-09)
| | |     If successful, we should use it everywhere nested_eval is used to declare local dense temporaries.
| * | Skip null numerators in triangular-vector-solve (as in BLAS TRSV). (Gael Guennebaud, 2018-07-09)
| | |
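
To make "skip null numerators" concrete, the following sketch shows a column-oriented forward substitution for a lower-triangular system in which a column is skipped when the current right-hand-side entry is exactly zero. It is an illustration written for this log, not Eigen's kernel: `lower_solve_in_place` is a hypothetical name and column-major storage is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Solves L * x = b in place: on entry x holds b, on exit it holds the solution.
// L is an n-by-n lower-triangular matrix stored column-major in a flat vector.
inline void lower_solve_in_place(const std::vector<double>& L,
                                 std::vector<double>& x) {
  const std::size_t n = x.size();
  for (std::size_t j = 0; j < n; ++j) {
    // Null numerator: the division and the column update would change nothing,
    // and skipping also avoids 0/0 when the diagonal entry is zero as well.
    if (x[j] == 0.0) continue;
    x[j] /= L[j + j * n];
    for (std::size_t i = j + 1; i < n; ++i) x[i] -= x[j] * L[i + j * n];
  }
}
```

Besides saving work on sparse right-hand sides, the skip keeps a zero entry at zero where an unconditional division by a zero diagonal would produce NaN.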
| * | Fix legitimate "declaration shadows a typedef" warning (Gael Guennebaud, 2018-07-09)
| | |
| * | Fix the Packet16h version of ptranspose (Mark D Ryan, 2018-06-16)
| | |     The AVX512 version of ptranspose for PacketBlock<Packet16h,16> was reordering the PacketBlock argument incorrectly. This led to errors in the multiplication of matrices composed of 16 bit floats on AVX512 machines, if at least one of the matrices was using RowMajor order.
| | |     This error is responsible for one tensorflow unit test failure on AVX512 machines: //tensorflow/python/kernel_tests:batch_matmul_op_test
| * | Fix a few issues with Packet16h (Gael Guennebaud, 2018-07-07)
| | |
| * | complete implementation of Packet16h (AVX512) (Gael Guennebaud, 2018-07-06)
| | |
| * | Complete Packet8h implementation and test it in packetmath unit test (Gael Guennebaud, 2018-07-06)
| | |
| | * updates based on PR feedback (Deven Desai, 2018-06-14)
| | |     There are two major changes (and a few minor ones which are not listed here...see PR discussion for details)
| | |     1. Eigen::half implementations for HIP and CUDA have been merged. This means that
| | |        - `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
| | |        - `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
| | |        - `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`
| | |        After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install.
| | |     2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
| | |        - `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
| | |        - `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
| | |        - `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`
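
The macro equivalences listed in this message follow a common pattern: fold two vendor-specific flags into one backend-neutral flag and test only the latter in shared code. The sketch below shows that pattern with hypothetical `SKETCH_*` names so that it cannot be mistaken for the contents of Eigen's Macros.h. It assumes that `__CUDACC__` and `__HIPCC__` identify the CUDA and HIP compilers and that `__CUDA_ARCH__` and `__HIP_DEVICE_COMPILE__` mark the device compilation pass. Note that the message names the second unified macro both EIGEN_GPU_COMPILE_PHASE and EIGEN_GPU_DEVICE_COMPILE; the sketch does not settle which is correct.

```cpp
// Plays the role of EIGEN_GPUCC: some GPU compiler drives this translation unit.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define SKETCH_GPUCC 1
#endif

// Plays the role of the unified "device compile pass" macro from the message.
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
#define SKETCH_GPU_DEVICE_PASS 1
#endif

// Shared code tests only the unified flag when decorating functions.
#if defined(SKETCH_GPUCC)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

// A function usable from host code and, under a GPU compiler, from kernels.
SKETCH_DEVICE_FUNC inline float twice(float x) { return 2.0f * x; }

// Reports at run time which branch the preprocessor selected.
inline bool built_with_gpu_compiler() {
#if defined(SKETCH_GPUCC)
  return true;
#else
  return false;
#endif
}
```

With a plain host compiler neither vendor macro is defined, so `SKETCH_DEVICE_FUNC` expands to nothing and the same source still builds.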
| | * moving Half headers from CUDA dir to GPU dir, removing the HIP versions (Deven Desai, 2018-06-13)
| | |
| | * syncing this fork with upstream (Deven Desai, 2018-06-13)
| | |\
| * | | Extend CUDA support to matrix inversion and selfadjointeigensolver (Andrea Bocci, 2018-06-11)
| | | |