aboutsummaryrefslogtreecommitdiffhomepage
path: root/Eigen/src/Core/util/Memory.h
Commit message (Collapse)AuthorAge
* Support manually disabling exceptionsHEADmasterGravatar Benjamin Barenblat2021-07-07
| | | | | Rename EIGEN_EXCEPTIONS to EIGEN_USE_EXCEPTIONS, and allow disabling exceptions with -DEIGEN_USE_EXCEPTIONS=0.
* DenseStorage safely copy/swap.Gravatar Antonio Sanchez2021-04-22
| | | | | | | | Fixes #2229. For dynamic matrices with fixed-sized storage, only copy/swap elements that have been set. Otherwise, this leads to inefficient copying, and potential UB for non-initialized elements.
* Fix CUDA device new and delete, and add test.Gravatar Antonio Sanchez2021-02-24
| | | | HIP does not support new/delete on device, so test is skipped.
* Revert "add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros ↵Gravatar Antonio Sánchez2021-02-19
| | | | | (only if not HIPCC)." This reverts commit 12fd3dd655e37ba26e7ab236d32163e0aa35da39
* add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if ↵Gravatar Masaki Murooka2021-02-17
| | | | not HIPCC).
* Fix signed-unsigned comparison.Gravatar Antonio Sanchez2021-01-20
| | | | | | | | Hex literals are interpreted as unsigned, leading to a comparison between signed max supported function `abcd[0]` (which was negative) to the unsigned literal `0x80000006`. Should not change result since signed is implicitly converted to unsigned for the comparison, but eliminates the warning.
* Proper CPUIDGravatar Ivan Popivanov2021-01-18
|
* Add EIGEN_UNUSED_VARIABLE to unused variable in Memory.hGravatar Rasmus Munk Larsen2020-09-15
|
* Fix issue #1968. Don't discard return value from "new" in C++17.Gravatar Rasmus Munk Larsen2020-09-13
|
* remove semi triggering -Wextra-semi-stmtGravatar Alexander Neumann2020-09-07
|
* Rollback or PR-746 and partial rollback of ↵Gravatar Rasmus Munk Larsen2019-11-05
| | | | | | | | https://bitbucket.org/eigen/eigen/commits/668ab3fc474e54c7919eda4fbaf11f3a99246494 . std::array is still not supported in CUDA device code on Windows.
* Remove internal::smart_copy and replace with std::copyGravatar Eugene Zhulenev2019-10-29
|
* Merged eigen/eigen into defaultGravatar Deven Desai2019-03-19
|\
| * Fully qualify Eigen::internal::aligned_freeGravatar Sam Hasinoff2019-03-02
| | | | | | | | | | | | | | This helps avoids a conflict on certain Windows toolchains (potentially due to some ADL name resolution bug) in the case where aligned_free is defined in the global namespace. In any case, tightening this up is harmless.
| * bug #1409: make EIGEN_MAKE_ALIGNED_OPERATOR_NEW* macros empty in c++17 mode:Gravatar Gael Guennebaud2019-02-20
| | | | | | | | | | - this helps clang 5 and 6 to support alignas in STL's containers. - this makes the public API of our (and users) classes cleaner
* | ROCm/HIP specfic fixes + updatesGravatar Deven Desai2018-11-19
|/ | | | | | | | | | | | | | | | | | | | | | | | | | 1. Eigen/src/Core/arch/GPU/Half.h Updating the HIPCC implementation half so that it can declared as a __shared__ variable 2. Eigen/src/Core/util/Macros.h, Eigen/src/Core/util/Memory.h introducing a EIGEN_USE_STD(func) macro that calls - std::func be default - ::func when eigen is being compiled with HIPCC This change was requested in the previous HIP PR (https://bitbucket.org/eigen/eigen/pull-requests/518/pr-with-hip-specific-fixes-for-the-eigen/diff) 3. unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h Removing EIGEN_DEVICE_FUNC attribute from pure virtual methods as it is not supported by HIPCC 4. unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h Disabling the template specializations of InnerMostDimReducer as they run into HIPCC link errors
* bug #1618: Use different power-of-2 check to avoid MSVC warningGravatar Christoph Hertzberg2018-11-01
|
* Implement a better workaround for GCC's bug #87544Gravatar Gael Guennebaud2018-10-07
|
* Workaround gcc bug making it trigger an invalid warningGravatar Gael Guennebaud2018-10-07
|
* This commit contains the following (HIP specific) updates:Gravatar Deven Desai2018-10-01
| | | | | | | | | | | | | | | | | | | | | | | | | | - unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h Changing "pass-by-reference" argument to be "pass-by-value" instead (in a __global__ function decl). "pass-by-reference" arguments to __global__ functions are unwise, and will be explicitly flagged as errors by the newer versions of HIP. - Eigen/src/Core/util/Memory.h - unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h Changes introduced in recent commits breaks the HIP compile. Adding EIGEN_DEVICE_FUNC attribute to some functions and calling ::malloc/free instead of the corresponding std:: versions to get the HIP compile working again - unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h Change introduced a recent commit breaks the HIP compile (link stage errors out due to failure to inline a function). Disabling the recently introduced code (only for HIP compile), to get the eigen nightly testing going again. Will submit another PR once we have te proper fix. - Eigen/src/Core/util/ConfigureVectorization.h Enabling GPU VECTOR support when HIP compiler is in use (for both the host and device compile phases)
* Fix EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE for AVX512 or ↵Gravatar Gael Guennebaud2018-09-21
| | | | | | AVX with malloc aligned on 8 bytes only. This change also make it future proof for AVX1024
* Provide EIGEN_ALIGNOF macro, and give handmade_aligned_malloc the ↵Gravatar Christoph Hertzberg2018-09-14
| | | | possibility for alignments larger than the standard alignment.
* Fix doxy and misc. typosGravatar luz.paz"2018-08-01
| | | | | | | | | | | | | | | | Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` --- Eigen/src/Core/ProductEvaluators.h | 4 ++-- Eigen/src/Core/arch/GPU/Half.h | 2 +- Eigen/src/Core/util/Memory.h | 2 +- Eigen/src/Geometry/Hyperplane.h | 2 +- Eigen/src/Geometry/Transform.h | 2 +- Eigen/src/Geometry/Translation.h | 12 ++++++------ doc/PreprocessorDirectives.dox | 2 +- doc/TutorialGeometry.dox | 2 +- test/boostmultiprec.cpp | 2 +- test/triangular.cpp | 2 +- 10 files changed, 16 insertions(+), 16 deletions(-)
* applying EIGEN_DECLARE_TEST to *gpu* testsGravatar Deven Desai2018-07-17
| | | | | | | | | | | | | Also, a few minor fixes for GPU tests running in HIP mode. 1. Adding an include for hip/hip_runtime.h in the Macros.h file For HIP __host__ and __device__ are macros which are defined in hip headers. Their definitions need to be included before their use in the file. 2. Fixing the compile failure in TensorContractionGpu introduced by the commit to "Fuse computations into the Tensor contractions using output kernel" 3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit
* Relax the condition to not only work on Android.Gravatar Rasmus Munk Larsen2018-07-13
|
* Clang produces incorrect Thumb2 assembler when using alloca.Gravatar Rasmus Munk Larsen2018-07-13
| | | | Don't define EIGEN_ALLOCA when generating Thumb with clang.
* Merged in deven-amd/eigen (pull request PR-402)Gravatar Gael Guennebaud2018-07-12
|\ | | | | | | Adding support for using Eigen in HIP kernels.
* | Fix double ;;Gravatar Gael Guennebaud2018-07-11
| |
| * merging updates from upstreamGravatar Deven Desai2018-07-11
| |\ | |/ |/|
* | Introduce the macro ei_declare_local_nested_eval to help allocating on the ↵Gravatar Gael Guennebaud2018-07-09
| | | | | | | | | | | | stack local temporaries via alloca, and let outer-products makes a good use of it. If successful, we should use it everywhere nested_eval is used to declare local dense temporaries.
| * updates based on PR feedbackGravatar Deven Desai2018-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two major changes (and a few minor ones which are not listed here...see PR discussion for details) 1. Eigen::half implementations for HIP and CUDA have been merged. This means that - `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h` - `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h` - `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h` After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install. 2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate. - `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)` - `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)` - `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`
* | Extend CUDA support to matrix inversion and selfadjointeigensolverGravatar Andrea Bocci2018-06-11
| |
| * Adding support for using Eigen in HIP kernels.Gravatar Deven Desai2018-06-06
|/ | | | | | | | | This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs. Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, for e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor) Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
* commit 45e9c9996da790b55ed9c4b0dfeae49492ac5c46 (HEAD -> memory_fix)Gravatar Gael Guennebaud2018-04-03
| | | | | | | | | | | | | | | | | Author: George Burgess IV <gbiv@google.com> Date: Thu Mar 1 11:20:24 2018 -0800 Prefer `::operator new` to `new` The C++ standard allows compilers much flexibility with `new` expressions, including eliding them entirely (https://godbolt.org/g/yS6i91). However, calls to `operator new` are required to be treated like opaque function calls. Since we're calling `new` for side-effects other than allocating heap memory, we should prefer the less flexible version. Signed-off-by: George Burgess IV <gbiv@google.com>
* MIsc. source and comment typosGravatar luz.paz2018-03-11
| | | | Found using `codespell` and `grep` from downstream FreeCAD
* bug #1468 (1/2) : add missing std:: to memcpyGravatar Gael Guennebaud2017-09-22
|
* Update documentation for aligned_allocatorGravatar Gael Guennebaud2017-09-20
|
* Revert PR-292. After further investigation, the memcpy->memmove change was ↵Gravatar Rasmus Munk Larsen2017-01-26
| | | | | | only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy. This PR also removes a couple of unnecessary semi-colons in Eigen/src/Core/AssignEvaluator.h that caused compiler warning everywhere.
* Update copy helper to use fast_memcpy.Gravatar Rasmus Munk Larsen2017-01-24
|
* Adds a fast memcpy function to Eigen. This takes advantage of the following:Gravatar Rasmus Munk Larsen2017-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster. 2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation. The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}. Measured improvements in wall clock time: Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00 CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB Benchmark Base (ns) New (ns) Improvement ------------------------------------------------------------------ BM_memcpy_1T/2 3.48 2.39 +31.3% BM_memcpy_1T/8 12.3 6.51 +47.0% BM_memcpy_1T/64 371 383 -3.2% BM_memcpy_1T/512 66922 66720 +0.3% BM_memcpy_1T/4k 9892867 6849682 +30.8% BM_memcpy_1T/5k 14951099 10332856 +30.9% BM_memcpy_2T/2 3.50 2.46 +29.7% BM_memcpy_2T/8 12.3 7.66 +37.7% BM_memcpy_2T/64 371 376 -1.3% BM_memcpy_2T/512 66652 66788 -0.2% BM_memcpy_2T/4k 6145012 6117776 +0.4% BM_memcpy_2T/5k 9181478 9010942 +1.9% BM_memcpy_4T/2 3.47 2.47 +31.0% BM_memcpy_4T/8 12.3 6.67 +45.8 BM_memcpy_4T/64 374 376 -0.5% BM_memcpy_4T/512 67833 68019 -0.3% BM_memcpy_4T/4k 5057425 5188253 -2.6% BM_memcpy_4T/5k 7555638 7779468 -3.0% BM_memcpy_6T/2 3.51 2.50 +28.8% BM_memcpy_6T/8 12.3 7.61 +38.1% BM_memcpy_6T/64 373 378 -1.3% BM_memcpy_6T/512 66871 66774 +0.1% BM_memcpy_6T/4k 5112975 5233502 -2.4% BM_memcpy_6T/5k 7614180 7772246 -2.1% BM_memcpy_8T/2 3.47 2.41 +30.5% BM_memcpy_8T/8 12.4 10.5 +15.3% BM_memcpy_8T/64 372 388 -4.3% BM_memcpy_8T/512 67373 66588 +1.2% BM_memcpy_8T/4k 5148462 5254897 -2.1% BM_memcpy_8T/5k 7660989 7799058 -1.8% BM_memcpy_12T/2 3.50 2.40 +31.4% BM_memcpy_12T/8 12.4 7.55 +39.1 BM_memcpy_12T/64 374 378 -1.1% BM_memcpy_12T/512 67132 66683 +0.7% BM_memcpy_12T/4k 5185125 5292920 -2.1% BM_memcpy_12T/5k 7717284 7942684 -2.9% BM_slicingSmallPieces_1T/2 47.3 47.5 +0.4% BM_slicingSmallPieces_1T/8 53.6 52.3 +2.4% BM_slicingSmallPieces_1T/64 491 476 +3.1% BM_slicingSmallPieces_1T/512 21734 18814 +13.4% BM_slicingSmallPieces_1T/4k 394660 396760 -0.5% BM_slicingSmallPieces_1T/5k 218722 209244 +4.3% BM_slicingSmallPieces_2T/2 80.7 79.9 +1.0% BM_slicingSmallPieces_2T/8 54.2 53.1 +2.0 BM_slicingSmallPieces_2T/64 497 477 +4.0% BM_slicingSmallPieces_2T/512 21732 18822 +13.4% BM_slicingSmallPieces_2T/4k 392885 390490 +0.6% BM_slicingSmallPieces_2T/5k 221988 208678 +6.0% BM_slicingSmallPieces_4T/2 80.8 80.1 +0.9% BM_slicingSmallPieces_4T/8 54.1 53.2 +1.7% BM_slicingSmallPieces_4T/64 493 476 +3.4% BM_slicingSmallPieces_4T/512 21702 18758 +13.6% BM_slicingSmallPieces_4T/4k 393962 404023 -2.6% BM_slicingSmallPieces_4T/5k 249667 211732 +15.2% BM_slicingSmallPieces_6T/2 80.5 80.1 +0.5% BM_slicingSmallPieces_6T/8 54.4 53.4 +1.8% BM_slicingSmallPieces_6T/64 488 478 +2.0% BM_slicingSmallPieces_6T/512 21719 18841 +13.3% BM_slicingSmallPieces_6T/4k 394950 397583 -0.7% BM_slicingSmallPieces_6T/5k 223080 210148 +5.8% BM_slicingSmallPieces_8T/2 81.2 80.4 +1.0% BM_slicingSmallPieces_8T/8 58.1 53.5 +7.9% BM_slicingSmallPieces_8T/64 489 480 +1.8% BM_slicingSmallPieces_8T/512 21586 18798 +12.9% BM_slicingSmallPieces_8T/4k 394592 400165 -1.4% BM_slicingSmallPieces_8T/5k 219688 208301 +5.2% BM_slicingSmallPieces_12T/2 80.2 79.8 +0.7% BM_slicingSmallPieces_12T/8 54.4 53.4 +1.8 BM_slicingSmallPieces_12T/64 488 476 +2.5% BM_slicingSmallPieces_12T/512 21931 18831 +14.1% BM_slicingSmallPieces_12T/4k 393962 396541 -0.7% BM_slicingSmallPieces_12T/5k 218803 207965 +5.0%
* Add std:: namespace prefix to all (hopefully) instances if size_t/ptrdfiff_tGravatar Gael Guennebaud2017-01-23
|
* typo UIntPtrGravatar Angelos Mantzaflaris2016-12-01
| | | | | (grafted from b6f04a2dd4d68fe1858524709813a5df5b9a085b )
* fix two warnings(unused typedef, unused variable) and a typoGravatar Angelos Mantzaflaris2016-12-01
| | | | | (grafted from a9aa3bcf50d55b63c8adb493a06c903ec34251c6 )
* Fixed compilation warnings generated by nvcc 6.5 (and below) when compiling ↵Gravatar Benoit Steiner2016-09-14
| | | | the EIGEN_THROW macro
* Introduce internal's UIntPtr and IntPtr types for pointer to integer ↵Gravatar Gael Guennebaud2016-05-26
| | | | | | | | conversions. This fixes "conversion from pointer to same-sized integral type" warnings by ICC. Ideally, we would use the std::[u]intptr_t types all the time, but since they are C99/C++11 only, let's be safe.
* bug #1170: skip calls to memcpy/memmove for empty imput.Gravatar Gael Guennebaud2016-02-19
|
* Add an explicit assersion on the alignment of the pointer returned by ↵Gravatar Gael Guennebaud2016-02-05
| | | | std::malloc
* Remove posix_memalign, _mm_malloc, and _aligned_malloc special paths.Gravatar Gael Guennebaud2016-02-05
|
* Fixed some compilation problems with nvcc + clangGravatar Benoit Steiner2016-01-27
|
* Made the blas utils usable from within a cuda kernelGravatar Benoit Steiner2016-01-11
|