eigen - C++ library for linear algebra

	Commit message	Author	Age
*	applying EIGEN_DECLARE_TEST to gpu tests	Deven Desai	2018-07-17
\| \| \| \| \| \| \| \| \| \| \| \| \|	Also, a few minor fixes for GPU tests running in HIP mode. 1. Adding an include for hip/hip_runtime.h in the Macros.h file. For HIP, __host__ and __device__ are macros which are defined in the HIP headers; their definitions need to be included before their use in the file. 2. Fixing the compile failure in TensorContractionGpu introduced by the commit to "Fuse computations into the Tensor contractions using output kernel". 3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit.
*	Relax the condition to not only work on Android.	Rasmus Munk Larsen	2018-07-13
\|
*	Clang produces incorrect Thumb2 assembler when using alloca.	Rasmus Munk Larsen	2018-07-13
\| \| \| \|	Don't define EIGEN_ALLOCA when generating Thumb with clang.
*	Merged in deven-amd/eigen (pull request PR-402)	Gael Guennebaud	2018-07-12
\|\ \| \| \| \| \| \|	Adding support for using Eigen in HIP kernels.
* \|	Fix double ;;	Gael Guennebaud	2018-07-11
\| \|
\| *	merging updates from upstream	Deven Desai	2018-07-11
\| \|\ \| \|/ \|/\|
* \|	Introduce the macro ei_declare_local_nested_eval to help allocating on the ↵	Gael Guennebaud	2018-07-09
\| \| \| \| \| \| \| \| \| \| \| \|	stack local temporaries via alloca, and let outer products make good use of it. If successful, we should use it everywhere nested_eval is used to declare local dense temporaries.
\| *	updates based on PR feedback	Deven Desai	2018-06-14
\| \|	There are two major changes (and a few minor ones which are not listed here; see the PR discussion for details).
\| \|	1. Eigen::half implementations for HIP and CUDA have been merged. This means that
\| \|	   - `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
\| \|	   - `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
\| \|	   - `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`
\| \|	   After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install.
\| \|	2. New macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
\| \|	   - `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC \|\| EIGEN_HIPCC)`
\| \|	   - `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH \|\| EIGEN_HIP_DEVICE_COMPILE)`
\| \|	   - `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`
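The unified macros described in this commit can be pictured as thin aliases over the backend-specific ones. The following is only a sketch under assumptions: the `SKETCH_` names and the helper function are hypothetical and are not Eigen's definitions, and the sketch keys directly on the compiler-provided `__CUDACC__`, `__HIPCC__`, `__CUDA_ARCH__` and `__HIP_DEVICE_COMPILE__` macros rather than on Eigen's own wrappers.

```cpp
// Hypothetical illustration only; SKETCH_* names are not Eigen macros.

// Defined when the translation unit is processed by a CUDA or HIP compiler.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define SKETCH_GPUCC 1
#endif

// Defined only during the device compilation pass of either backend.
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
#define SKETCH_GPU_DEVICE_COMPILE 1
#endif

// Lets ordinary code query the outcome without repeating the preprocessor test.
inline bool sketch_compiled_by_gpu_compiler() {
#if defined(SKETCH_GPUCC)
  return true;
#else
  return false;
#endif
}
```

With a plain host compiler, neither macro is defined and the helper returns `false`, so code guarded this way needs only one test instead of one per backend.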
* \|	Extend CUDA support to matrix inversion and selfadjointeigensolver	Andrea Bocci	2018-06-11
\| \|
\| *	Adding support for using Eigen in HIP kernels.	Deven Desai	2018-06-06
\|/ \| \| \| \| \| \| \| \|	This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs. Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor). Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
*	commit 45e9c9996da790b55ed9c4b0dfeae49492ac5c46 (HEAD -> memory_fix)	Gael Guennebaud	2018-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Author: George Burgess IV <gbiv@google.com> Date: Thu Mar 1 11:20:24 2018 -0800 Prefer `::operator new` to `new` The C++ standard allows compilers much flexibility with `new` expressions, including eliding them entirely (https://godbolt.org/g/yS6i91). However, calls to `operator new` are required to be treated like opaque function calls. Since we're calling `new` for side-effects other than allocating heap memory, we should prefer the less flexible version. Signed-off-by: George Burgess IV <gbiv@google.com>
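The distinction drawn in this commit message is between a `new` expression and a direct call to the allocation function `::operator new`. A minimal sketch of the latter, with hypothetical helper names that are not taken from Eigen:

```cpp
#include <cstddef>
#include <new>

// Hypothetical helpers, not Eigen's code. Calling the allocation function
// directly requests raw storage and constructs no object; per the commit
// message above, such a call is treated as an opaque function call.
inline void* sketch_raw_allocate(std::size_t bytes) {
  return ::operator new(bytes);  // throws std::bad_alloc on failure
}

// Storage obtained from ::operator new must go back through ::operator delete.
inline void sketch_raw_free(void* p) {
  ::operator delete(p);
}
```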
*	Misc. source and comment typos	luz.paz	2018-03-11
\| \| \| \|	Found using `codespell` and `grep` from downstream FreeCAD
*	bug #1468 (1/2) : add missing std:: to memcpy	Gael Guennebaud	2017-09-22
\|
*	Update documentation for aligned_allocator	Gael Guennebaud	2017-09-20
\|
*	Revert PR-292. After further investigation, the memcpy->memmove change was ↵	Rasmus Munk Larsen	2017-01-26
\| \| \| \| \| \|	only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy. This PR also removes a couple of unnecessary semicolons in Eigen/src/Core/AssignEvaluator.h that caused compiler warnings everywhere.
*	Update copy helper to use fast_memcpy.	Rasmus Munk Larsen	2017-01-24
\|
*	Adds a fast memcpy function to Eigen. This takes advantage of the following:	Rasmus Munk Larsen	2017-01-24
\|	1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
\|	2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml
\|	This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux
\|
\|	Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.
\|
\|	The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.
\|
\|	Measured improvements in wall clock time:
\|
\|	Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
\|	CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
\|	Benchmark                      Base (ns)   New (ns)  Improvement
\|	----------------------------------------------------------------
\|	BM_memcpy_1T/2                      3.48       2.39       +31.3%
\|	BM_memcpy_1T/8                      12.3       6.51       +47.0%
\|	BM_memcpy_1T/64                      371        383        -3.2%
\|	BM_memcpy_1T/512                   66922      66720        +0.3%
\|	BM_memcpy_1T/4k                  9892867    6849682       +30.8%
\|	BM_memcpy_1T/5k                 14951099   10332856       +30.9%
\|	BM_memcpy_2T/2                      3.50       2.46       +29.7%
\|	BM_memcpy_2T/8                      12.3       7.66       +37.7%
\|	BM_memcpy_2T/64                      371        376        -1.3%
\|	BM_memcpy_2T/512                   66652      66788        -0.2%
\|	BM_memcpy_2T/4k                  6145012    6117776        +0.4%
\|	BM_memcpy_2T/5k                  9181478    9010942        +1.9%
\|	BM_memcpy_4T/2                      3.47       2.47       +31.0%
\|	BM_memcpy_4T/8                      12.3       6.67       +45.8%
\|	BM_memcpy_4T/64                      374        376        -0.5%
\|	BM_memcpy_4T/512                   67833      68019        -0.3%
\|	BM_memcpy_4T/4k                  5057425    5188253        -2.6%
\|	BM_memcpy_4T/5k                  7555638    7779468        -3.0%
\|	BM_memcpy_6T/2                      3.51       2.50       +28.8%
\|	BM_memcpy_6T/8                      12.3       7.61       +38.1%
\|	BM_memcpy_6T/64                      373        378        -1.3%
\|	BM_memcpy_6T/512                   66871      66774        +0.1%
\|	BM_memcpy_6T/4k                  5112975    5233502        -2.4%
\|	BM_memcpy_6T/5k                  7614180    7772246        -2.1%
\|	BM_memcpy_8T/2                      3.47       2.41       +30.5%
\|	BM_memcpy_8T/8                      12.4       10.5       +15.3%
\|	BM_memcpy_8T/64                      372        388        -4.3%
\|	BM_memcpy_8T/512                   67373      66588        +1.2%
\|	BM_memcpy_8T/4k                  5148462    5254897        -2.1%
\|	BM_memcpy_8T/5k                  7660989    7799058        -1.8%
\|	BM_memcpy_12T/2                     3.50       2.40       +31.4%
\|	BM_memcpy_12T/8                     12.4       7.55       +39.1%
\|	BM_memcpy_12T/64                     374        378        -1.1%
\|	BM_memcpy_12T/512                  67132      66683        +0.7%
\|	BM_memcpy_12T/4k                 5185125    5292920        -2.1%
\|	BM_memcpy_12T/5k                 7717284    7942684        -2.9%
\|	BM_slicingSmallPieces_1T/2          47.3       47.5        +0.4%
\|	BM_slicingSmallPieces_1T/8          53.6       52.3        +2.4%
\|	BM_slicingSmallPieces_1T/64          491        476        +3.1%
\|	BM_slicingSmallPieces_1T/512       21734      18814       +13.4%
\|	BM_slicingSmallPieces_1T/4k       394660     396760        -0.5%
\|	BM_slicingSmallPieces_1T/5k       218722     209244        +4.3%
\|	BM_slicingSmallPieces_2T/2          80.7       79.9        +1.0%
\|	BM_slicingSmallPieces_2T/8          54.2       53.1        +2.0%
\|	BM_slicingSmallPieces_2T/64          497        477        +4.0%
\|	BM_slicingSmallPieces_2T/512       21732      18822       +13.4%
\|	BM_slicingSmallPieces_2T/4k       392885     390490        +0.6%
\|	BM_slicingSmallPieces_2T/5k       221988     208678        +6.0%
\|	BM_slicingSmallPieces_4T/2          80.8       80.1        +0.9%
\|	BM_slicingSmallPieces_4T/8          54.1       53.2        +1.7%
\|	BM_slicingSmallPieces_4T/64          493        476        +3.4%
\|	BM_slicingSmallPieces_4T/512       21702      18758       +13.6%
\|	BM_slicingSmallPieces_4T/4k       393962     404023        -2.6%
\|	BM_slicingSmallPieces_4T/5k       249667     211732       +15.2%
\|	BM_slicingSmallPieces_6T/2          80.5       80.1        +0.5%
\|	BM_slicingSmallPieces_6T/8          54.4       53.4        +1.8%
\|	BM_slicingSmallPieces_6T/64          488        478        +2.0%
\|	BM_slicingSmallPieces_6T/512       21719      18841       +13.3%
\|	BM_slicingSmallPieces_6T/4k       394950     397583        -0.7%
\|	BM_slicingSmallPieces_6T/5k       223080     210148        +5.8%
\|	BM_slicingSmallPieces_8T/2          81.2       80.4        +1.0%
\|	BM_slicingSmallPieces_8T/8          58.1       53.5        +7.9%
\|	BM_slicingSmallPieces_8T/64          489        480        +1.8%
\|	BM_slicingSmallPieces_8T/512       21586      18798       +12.9%
\|	BM_slicingSmallPieces_8T/4k       394592     400165        -1.4%
\|	BM_slicingSmallPieces_8T/5k       219688     208301        +5.2%
\|	BM_slicingSmallPieces_12T/2         80.2       79.8        +0.7%
\|	BM_slicingSmallPieces_12T/8         54.4       53.4        +1.8%
\|	BM_slicingSmallPieces_12T/64         488        476        +2.5%
\|	BM_slicingSmallPieces_12T/512      21931      18831       +14.1%
\|	BM_slicingSmallPieces_12T/4k      393962     396541        -0.7%
\|	BM_slicingSmallPieces_12T/5k      218803     207965        +5.0%
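The two observations in this commit message (inline expansion for small fixed sizes, memmove for everything else) suggest a dispatch along the following lines. This is a hedged sketch, not the function that was committed: the name `sketch_fast_memcpy` and the particular sizes handled are assumptions, and the history above records that the change was later reverted.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not Eigen's code. Each case passes a compile-time
// constant size to std::memcpy, which lets the compiler expand the copy
// inline; all other sizes fall through to std::memmove.
// Assumes dst and src are valid for n bytes whenever n > 0.
inline void sketch_fast_memcpy(void* dst, const void* src, std::size_t n) {
  switch (n) {
    case 0:  return;  // nothing to copy, pointers are never touched
    case 1:  std::memcpy(dst, src, 1);  return;
    case 2:  std::memcpy(dst, src, 2);  return;
    case 4:  std::memcpy(dst, src, 4);  return;
    case 8:  std::memcpy(dst, src, 8);  return;
    case 16: std::memcpy(dst, src, 16); return;
    default: std::memmove(dst, src, n); return;
  }
}
```

The switch itself costs a branch on every call, which is the overhead for larger sizes that the revert commit cites.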
*	Add std:: namespace prefix to all (hopefully) instances of size_t/ptrdiff_t	Gael Guennebaud	2017-01-23
\|
*	typo UIntPtr	Angelos Mantzaflaris	2016-12-01
\| \| \| \| \|	(grafted from b6f04a2dd4d68fe1858524709813a5df5b9a085b )
*	fix two warnings(unused typedef, unused variable) and a typo	Angelos Mantzaflaris	2016-12-01
\| \| \| \| \|	(grafted from a9aa3bcf50d55b63c8adb493a06c903ec34251c6 )
*	Fixed compilation warnings generated by nvcc 6.5 (and below) when compiling ↵	Benoit Steiner	2016-09-14
\| \| \| \|	the EIGEN_THROW macro
*	Introduce internal's UIntPtr and IntPtr types for pointer to integer ↵	Gael Guennebaud	2016-05-26
\| \| \| \| \| \| \| \|	conversions. This fixes "conversion from pointer to same-sized integral type" warnings by ICC. Ideally, we would use the std::[u]intptr_t types all the time, but since they are C99/C++11 only, let's be safe.
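A pointer-to-integer conversion of the kind this commit addresses might look like the sketch below. The typedef names follow the commit subject, but mapping them to `std::ptrdiff_t` and `std::size_t` is an assumption made for illustration, and the `sketch` namespace and `is_aligned` helper are hypothetical.

```cpp
#include <cstddef>

namespace sketch {

// Assumed mapping: on common platforms these types have the width of a
// pointer, which the static_assert below verifies at compile time.
typedef std::ptrdiff_t IntPtr;
typedef std::size_t UIntPtr;

static_assert(sizeof(UIntPtr) == sizeof(void*),
              "sketch assumes size_t is pointer-sized on this platform");

// Converting through a same-sized unsigned integer type makes address
// arithmetic such as alignment checks explicit.
inline bool is_aligned(const void* p, std::size_t alignment) {
  return (reinterpret_cast<UIntPtr>(p) % alignment) == 0;
}

}  // namespace sketch
```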
*	bug #1170: skip calls to memcpy/memmove for empty input.	Gael Guennebaud	2016-02-19
\|
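The guard described by bug #1170 can be sketched as an early return before the copy. The helper name `sketch_copy` is hypothetical; the point is only that an empty range, which may be represented by null pointers, never reaches `std::memcpy`.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper, not Eigen's code: copy [start, end) to target.
template <typename T>
void sketch_copy(const T* start, const T* end, T* target) {
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy is only valid for trivially copyable types");
  const std::size_t count = static_cast<std::size_t>(end - start);
  if (count == 0) return;  // empty input: skip the call entirely
  std::memcpy(target, start, count * sizeof(T));
}
```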
*	Add an explicit assertion on the alignment of the pointer returned by ↵	Gael Guennebaud	2016-02-05
\| \| \| \|	std::malloc
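An alignment assertion on the result of `std::malloc` could be written as follows. This is a sketch under assumptions: the helper name is hypothetical, and the threshold used here is `alignof(std::max_align_t)`, which may differ from the alignment the actual commit checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper, not Eigen's code: allocate with std::malloc and
// assert that a non-null result meets the assumed alignment.
inline void* sketch_checked_malloc(std::size_t size) {
  void* p = std::malloc(size);
  assert((p == nullptr ||
          reinterpret_cast<std::uintptr_t>(p) % alignof(std::max_align_t) == 0) &&
         "std::malloc returned a pointer that is less aligned than expected");
  return p;
}
```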
*	Remove posix_memalign, _mm_malloc, and _aligned_malloc special paths.	Gael Guennebaud	2016-02-05
\|
*	Fixed some compilation problems with nvcc + clang	Benoit Steiner	2016-01-27
\|
*	Made the blas utils usable from within a cuda kernel	Benoit Steiner	2016-01-11
\|
*	Workaround "empty paragraph" warning with clang -Wdocumentation	Gael Guennebaud	2015-12-30
\|
*	bug #1109: use noexcept instead of throw for C++11 compilers	Gael Guennebaud	2015-12-10
\|
*	Let unpacket_traits<> expose the required alignment and make use of it ↵	Gael Guennebaud	2015-08-07
\| \| \| \|	everywhere
*	Generalize first_aligned to take the requested alignment as a template ↵	Gael Guennebaud	2015-08-06
\| \| \| \|	parameter, and add a first_default_aligned variant calling first_aligned with the requirement of the largest packet for the given scalar type.
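Taking the requested alignment as a template parameter could look like the sketch below, which returns the index of the first array element whose address meets that alignment. The function name and signature are hypothetical and the logic is an independent illustration, not Eigen's first_aligned.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: index of the first element of `array` aligned on
// `Alignment` bytes, or `size` if no element in range qualifies.
template <std::size_t Alignment, typename Scalar>
std::size_t sketch_first_aligned(const Scalar* array, std::size_t size) {
  static_assert(Alignment > 0 && (Alignment & (Alignment - 1)) == 0,
                "Alignment must be a power of two");
  const std::size_t scalar_size = sizeof(Scalar);
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(array);

  // A packet of this alignment holds at most one scalar: nothing to skip.
  if (Alignment <= scalar_size) return 0;
  // If the array is not even scalar-aligned, or scalars do not tile the
  // alignment, no element boundary ever lands on an aligned address.
  if (addr % scalar_size != 0 || Alignment % scalar_size != 0) return size;

  const std::size_t misalign = static_cast<std::size_t>(addr % Alignment);
  const std::size_t first = ((Alignment - misalign) % Alignment) / scalar_size;
  return first < size ? first : size;
}
```

For a 16-byte-aligned `float` buffer `buf`, `sketch_first_aligned<16>(buf + 1, 7)` skips three elements, assuming `sizeof(float)` is 4.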
*	Add a EIGEN_DEFAULT_ALIGN_BYTES macro defining default alignment for alloca ↵	Gael Guennebaud	2015-08-06
\| \| \| \| \| \|	and aligned_malloc. It is defined as the max of EIGEN_IDEAL_MAX_ALIGN_BYTES and EIGEN_MAX_ALIGN_BYTES
*	bug #973: update macro-level control of alignment by introducing ↵	Gael Guennebaud	2015-07-29
\| \| \| \|	user-controllable EIGEN_MAX_ALIGN_BYTES and EIGEN_MAX_STATIC_ALIGN_BYTES macros. This changeset also removes EIGEN_ALIGN (replaced by EIGEN_MAX_ALIGN_BYTES>0), EIGEN_ALIGN_STATICALLY (replaced by EIGEN_MAX_STATIC_ALIGN_BYTES>0), EIGEN_USER_ALIGN*, EIGEN_ALIGN_DEFAULT (replaced by EIGEN_ALIGN_MAX).
*	Fixed some compiler bugs in NVCC, now compiles with CUDA.	Jonas Adler	2015-07-22
\| \| \| \|	(chtz: Manually joined several commits to keep the history clean)
*	Disable posix_memalign on Solaris and SunOS, and allow bypassing built-in ↵	Gael Guennebaud	2015-04-24
\| \| \| \|	posix_memalign detection rules.
*	Marked a few functions as EIGEN_DEVICE_FUNC to enable the use of tensors in ↵	Benoit Steiner	2015-02-10
\| \| \| \|	cuda kernels.
*	bug #921: fix utilization of bitwise operation on enums in first_aligned	Gael Guennebaud	2014-12-19
\|
*	Introduce unified macros to identify compiler, OS, and architecture. They ↵	Gael Guennebaud	2014-11-04
\| \| \| \|	are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively.
*	Add lapack interface to JacobiSVD and BDCSVD	Gael Guennebaud	2014-10-17
\|
*	Fix indentation	Gael Guennebaud	2014-10-09
\|
*	Add a scoped_array helper class to handle locally allocated/used arrays	Gael Guennebaud	2014-10-09
\|
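A scoped helper for locally allocated arrays is an instance of RAII: the array is released when the helper leaves scope, including on early return or exception. The class below is a hypothetical sketch of that idea and does not reproduce the interface of Eigen's scoped_array.

```cpp
#include <cstddef>

// Hypothetical sketch, not Eigen's class: owns a heap array for the
// lifetime of the enclosing scope.
template <typename T>
class sketch_scoped_array {
 public:
  // Value-initializes the elements (zeros for arithmetic types).
  explicit sketch_scoped_array(std::size_t size)
      : ptr_(new T[size]()), size_(size) {}
  ~sketch_scoped_array() { delete[] ptr_; }

  // Copying would lead to a double delete, so it is disabled.
  sketch_scoped_array(const sketch_scoped_array&) = delete;
  sketch_scoped_array& operator=(const sketch_scoped_array&) = delete;

  T& operator[](std::size_t i) { return ptr_[i]; }
  const T& operator[](std::size_t i) const { return ptr_[i]; }
  T* ptr() { return ptr_; }
  std::size_t size() const { return size_; }

 private:
  T* ptr_;
  std::size_t size_;
};
```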
*	Fix bug #884: No malloc for zero-sized matrices or for Ref without temporaries	Christoph Hertzberg	2014-09-25
\|
*	bug #861: enable posix_memalign with PGI	Gael Guennebaud	2014-08-26
\|
*	add missing delete operator overloads	Gael Guennebaud	2014-07-30
\|
*	Define EIGEN_TRY, EIGEN_CATCH, EIGEN_THROW as suggested by Moritz Klammler.	Christoph Hertzberg	2014-07-22
\| \| \| \| \|	Make it possible to run unit-tests with exceptions disabled via EIGEN_TEST_NO_EXCEPTIONS flag. Enhanced ctorleak unit-test
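Macros of this kind let the same source compile with or without exception support. The sketch below shows one way to do it; the `SKETCH_` names and the `SKETCH_NO_EXCEPTIONS` switch are hypothetical, and the exact expansions Eigen uses may differ.

```cpp
#include <cstdlib>
#include <stdexcept>

// Hypothetical macros, not Eigen's definitions.
#ifndef SKETCH_NO_EXCEPTIONS
#define SKETCH_TRY try
#define SKETCH_CATCH(X) catch (X)
#define SKETCH_THROW_X(X) throw X
#else
// Without exceptions, the "try" block always runs, the "catch" block
// becomes a dead else branch, and a throw terminates the program.
#define SKETCH_TRY if (true)
#define SKETCH_CATCH(X) else
#define SKETCH_THROW_X(X) std::abort()
#endif

// Example use: returns 0 on success and 1 if the failure was caught.
inline int sketch_guarded(bool fail) {
  SKETCH_TRY {
    if (fail) SKETCH_THROW_X(std::runtime_error("boom"));
    return 0;
  }
  SKETCH_CATCH(const std::runtime_error&) {
    return 1;
  }
  return -1;  // not reached when exceptions are enabled
}
```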
*	Applied changes suggested by Christoph Hertzberg to c'tor leak fix.	Moritz Klammler	2014-07-18
\| \| \| \| \|	- Enclose exception handling in '#ifdef EIGEN_EXCEPTIONS'. - Use an object counter to demonstrate the bug more readily.
*	Avoid memory leak when constructor of user-defined type throws exception.	Moritz Klammler	2014-07-06
\| \| \| \| \| \| \| \| \| \| \| \|	The added check `ctorleak.cpp` demonstrates how the leak can be reproduced. The test appears to pass but it is leaking the storage of the (not created) matrix. I don't know how to make this test fail in the existing test suite but you can run it through Valgrind (or another debugger) to verify the leak. $ ./check.sh ctorleak && valgrind --leak-check=full ./test/ctorleak This patch fixes this leak by adding some try-catch-delete-rethrow blocks to `Eigen/src/Core/util/Memory.h`.
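The try-catch-delete-rethrow pattern mentioned in this commit message can be sketched as follows. All names are hypothetical and the code is an independent illustration rather than the contents of `Memory.h`; the counting type mirrors the object-counter idea from the follow-up commit.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Hypothetical test type: counts live objects and can be told to throw.
struct SketchCounted {
  static int live;     // number of objects currently alive
  static int fail_at;  // constructor throws when live == fail_at (-1: never)
  SketchCounted() {
    if (live == fail_at) throw std::runtime_error("ctor failed");
    ++live;
  }
  ~SketchCounted() { --live; }
};
int SketchCounted::live = 0;
int SketchCounted::fail_at = -1;

// Allocate raw storage and default-construct n objects in it. If any
// constructor throws, destroy what was built, free the storage, rethrow.
template <typename T>
T* sketch_new_array(std::size_t n) {
  if (n == 0) return nullptr;
  if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
  void* raw = std::malloc(n * sizeof(T));
  if (raw == nullptr) throw std::bad_alloc();
  T* data = static_cast<T*>(raw);
  std::size_t built = 0;
  try {
    for (; built < n; ++built) ::new (static_cast<void*>(data + built)) T();
  } catch (...) {
    while (built > 0) data[--built].~T();
    std::free(raw);  // without this, the storage would leak
    throw;
  }
  return data;
}

// Destroy the n objects in reverse order and release the storage.
template <typename T>
void sketch_delete_array(T* data, std::size_t n) {
  if (data == nullptr) return;
  while (n > 0) data[--n].~T();
  std::free(data);
}
```

The live-object counter only shows that constructed elements are destroyed again; confirming that the storage itself is released still needs a tool such as Valgrind, as the commit message notes.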
*	bug #837: Always re-align the result of EIGEN_ALLOCA.	Christoph Hertzberg	2014-07-08
\|
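Re-aligning a stack allocation generally means over-allocating by `alignment - 1` bytes and rounding the returned address up. The rounding step is sketched below with a hypothetical helper; the sketch operates on an ordinary pointer and leaves the alloca call itself out.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not Eigen's macro: round p up to the next multiple
// of `alignment`, which must be a power of two. The caller is assumed to
// have reserved at least alignment - 1 extra bytes after p.
inline void* sketch_align_up(void* p, std::size_t alignment) {
  const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  return reinterpret_cast<void*>((addr + mask) & ~mask);
}
```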
*	Fix bug #729: Use alloca if it is defined	Christoph Hertzberg	2014-06-23
\|
*	Fix bug #803: avoid char* to int* conversion	Gael Guennebaud	2014-05-01
\|