Commit message | Author | Age
* Fixing a CUDA / P100 regression introduced by PR 181 | Deven Desai | 2020-08-20
  PR 181 (https://gitlab.com/libeigen/eigen/-/merge_requests/181) adds a `__launch_bounds__(1024)` attribute to GPU kernels that did not have that attribute explicitly specified. That PR seems to cause regressions on the CUDA platform. This commit restricts the changes in PR 181 so that they apply to HIP only.
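Restricting the attribute to HIP can be pictured as a macro that expands to `__launch_bounds__(1024)` under the HIP compiler and to nothing otherwise, together with the block-size rule the bound enforces. A minimal sketch, assuming `__HIPCC__` as the HIP detection macro; `SKETCH_HIP_LAUNCH_BOUNDS_1024`, `block_size_fits` and the two constants are hypothetical names, not Eigen's actual macros:

```cpp
// Hypothetical macro: explicit 1024-thread launch bound on HIP only.
// CUDA and plain host builds get an empty expansion, so they keep the
// compiler's own default and are unaffected.
#if defined(__HIPCC__)
#define SKETCH_HIP_LAUNCH_BOUNDS_1024 __launch_bounds__(1024)
#else
#define SKETCH_HIP_LAUNCH_BOUNDS_1024
#endif

// Default launch bounds described in the commit messages.
constexpr int kHccDefaultBound = 1024;      // HIP with ROCm 3.3 and older (HCC)
constexpr int kHipClangDefaultBound = 256;  // HIP with ROCm 3.5 (hip-clang)

// Hypothetical host-side rule: a launch is only valid if the requested
// threads per block does not exceed the kernel's launch bound.
constexpr bool block_size_fits(int threads_per_block, int launch_bound) {
  return threads_per_block > 0 && threads_per_block <= launch_bound;
}

// Placement on a kernel (only meaningful when compiled by nvcc/hipcc):
//   template <typename T>
//   __global__ void SKETCH_HIP_LAUNCH_BOUNDS_1024 my_kernel(T* data);
```

With the hip-clang default of 256, a 512-thread launch of an unannotated kernel fails the rule, while the explicit 1024 bound accepts it.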
* Fix nightly CI configuration | David Tellenbach | 2020-08-19
* Add the possibility to split test suite build targets and improve CI configuration | David Tellenbach | 2020-08-19
  - Introduce CMake option `EIGEN_SPLIT_TESTSUITE` that allows dividing the single test build target into several subtargets
  - Add a CI pipeline for merge requests that can be run by GitLab's shared runners
  - Add a nightly CI pipeline
* Add missing inline keyword in Quaternion.h. | Rasmus Munk Larsen | 2020-08-14
* Disable min/max NaN propagation in test cxx11_tensor_expr | David Tellenbach | 2020-08-14
  The current pmin/pmax implementation for Arm Neon propagates NaNs differently than std::min/std::max. See issue https://gitlab.com/libeigen/eigen/-/issues/1937
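The scalar reference behavior is itself order-dependent: common standard library implementations compute `std::min(a, b)` as `(b < a) ? b : a`, and every comparison with NaN is false, so the first argument is returned. `std::fmin` instead returns the non-NaN operand. A small sketch of the variants a vectorized `pmin` could be compared against (all helper names are hypothetical):

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// std::min: with a NaN operand the comparison is false and the FIRST
// argument comes back, so the result depends on argument order.
inline float min_std(float a, float b) { return std::min(a, b); }

// std::fmin treats NaN as missing data and returns the other operand.
inline float min_fmin(float a, float b) { return std::fmin(a, b); }

// Hypothetical NaN-propagating min: NaN if either operand is NaN.
inline float min_propagate_nan(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) return std::numeric_limits<float>::quiet_NaN();
  return (b < a) ? b : a;
}
```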
* Fix compilation error in blasutil test | David Tellenbach | 2020-08-14
* Replace the use of int64_t in the blasutil test by explicit types | David Tellenbach | 2020-08-14
  Some platforms define int64_t to be long long even for C++03. In that case the definition of internal::make_unsigned for this type is missing. If we simply define the template for it, platforms that define int64_t as signed long under C++03 get duplicate definition errors. We need a way to distinguish both cases at compile time.
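The underlying C++ fact is that `long` and `long long` are always distinct types, even where both are 64 bits wide, while `int64_t` is an alias for exactly one of them. A sketch of a trait specialized on the explicit types instead of on `int64_t`; `sketch_make_unsigned` is a hypothetical stand-in for `internal::make_unsigned`, and the sketch assumes C++11 (`long long` and `<type_traits>`), so it does not cover the C++03 case itself:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a make_unsigned-style trait.
template <typename T> struct sketch_make_unsigned;

// Specialize on the explicit fundamental types. These never collide,
// because signed long and signed long long are different types.
template <> struct sketch_make_unsigned<signed long> { typedef unsigned long type; };
template <> struct sketch_make_unsigned<signed long long> { typedef unsigned long long type; };

// Adding `template <> struct sketch_make_unsigned<std::int64_t>` next to the
// two above would redefine one of them, since int64_t merely aliases a type.

// On common platforms int64_t is long or long long, so it is covered either way.
typedef sketch_make_unsigned<std::int64_t>::type sketch_uint64;
```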
* bfloat16 packetmath for Arm Neon backend | David Tellenbach | 2020-08-13
* Add support for Bfloat16 to use vector instructions on the Altivec architecture | Pedro Caldeira | 2020-08-10
* Adding an explicit launch_bounds(1024) attribute for GPU kernels. | Deven Desai | 2020-08-05
  Starting with ROCm 3.5, the HIP compiler will change from HCC to hip-clang. This compiler change introduces a change in the default value of the `__launch_bounds__` attribute associated with a GPU kernel. (The default value is the value the compiler assumes for the `__launch_bounds__` attribute when it is not explicitly specified by the user.)
  Currently (i.e. for HIP with ROCm 3.3 and older), the default value is 1024. That changes to 256 with ROCm 3.5 (i.e. the hip-clang compiler). As a consequence, if a GPU kernel with a `__launch_bounds__` attribute of 256 is launched at runtime with a threads_per_block value > 256, it leads to a runtime error. This is causing a couple of Eigen unit test failures with ROCm 3.5.
  This commit adds an explicit `__launch_bounds__(1024)` attribute to every GPU kernel that currently does not have it explicitly specified (and hence would end up getting the default value of 256 with the change to hip-clang).
* Temporarily turn off the NEON implementation of pfloor as it does not work for large values. | Zachary Garrett | 2020-08-04
  The NEON implementation mimics the SSE implementation, but didn't mention the caveat that, due to the signed/unsigned integer conversions, not all values representable in the original floating-point type are supported.
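A floor built on float-to-integer truncation is only valid while the value fits the integer type. One common workaround uses the fact that every float with magnitude of at least 2^23 has no fractional bits and is therefore already integral, so it can be returned unchanged. A scalar sketch of that idea (`floor_via_int_cast` is hypothetical and is not Eigen's `pfloor`):

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical scalar floor built on truncating float->int32 conversion.
inline float floor_via_int_cast(float x) {
  // |x| >= 2^23: already an integer, and converting values with |x| >= 2^31
  // to int32 would overflow. The negated comparison also passes NaN/inf through.
  if (!(std::fabs(x) < 8388608.0f)) return x;
  // Truncate toward zero; safe because |x| < 2^23 fits in int32_t.
  float t = static_cast<float>(static_cast<std::int32_t>(x));
  // Truncation rounds negative non-integers up, so step down by one.
  return (t > x) ? t - 1.0f : t;
}
```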
* Disable CI buildstage again | David Tellenbach | 2020-08-03
* add a banner to advertise the survey | Gael Guennebaud | 2020-07-29
* Fix StlDeque for GCC 10 | David Tellenbach | 2020-07-29
  StlDeque extends std::deque by accessing some of its internal members. Since GCC 10 these are not accessible anymore.
* Fix undefined BF16 union behavior in AVX512. | Teng Lu | 2020-07-29
* Inherit alignment trait from argument in TensorBroadcasting to avoid segfault when the argument is unaligned. | Rasmus Munk Larsen | 2020-07-28
* Fix clang-tidy warnings in generic bfloat16 implementation | David Tellenbach | 2020-07-27
  See !172 for related discussions.
* Fix CMake install command | qxxxb | 2020-07-25
* Don't allow failure for CI build stage anymore | David Tellenbach | 2020-07-24
* Improve CI configuration | David Tellenbach | 2020-07-24
  - Pin the Docker Fedora image to Fedora:31
  - Pin the gcc version to gcc-9.2.1
  - Use GitLab CI DAG
  - Fix usage of build cache
  - Introduce build artifacts
* Add missing footer declaration | Gael Guennebaud | 2020-07-24
* Fix bfloat16 casts | David Tellenbach | 2020-07-23
  If explicit conversion operators are available (C++11), we define explicit casts from bfloat16 to other types. If not (C++03), we don't define conversion operators but rely on implicit conversion chains from bfloat16 via float to other types.
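The C++11 branch can be illustrated with a toy 16-bit type that widens to `float` by placing its bits in the upper half of a binary32 value (which is also why the widening is lossless) and exposes that only through an `explicit` conversion operator. A sketch assuming IEEE binary32 `float`; `ToyBf16` is hypothetical and far simpler than `Eigen::bfloat16`:

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical minimal bfloat16-like type (storage only, no arithmetic).
struct ToyBf16 {
  std::uint16_t bits;

  static ToyBf16 from_bits(std::uint16_t b) {
    ToyBf16 h;
    h.bits = b;
    return h;
  }

  // C++11 explicit conversion operator. Widening is exact: a bfloat16 value
  // is the upper 16 bits of a binary32 float with the low 16 bits zeroed.
  explicit operator float() const {
    std::uint32_t widened = static_cast<std::uint32_t>(bits) << 16;
    float f;
    std::memcpy(&f, &widened, sizeof f);  // bit copy without union punning
    return f;
  }
};

// static_cast<float>(h) is allowed; implicit conversion (float f = h;) is not.
static_assert(std::is_constructible<float, ToyBf16>::value, "explicit cast works");
static_assert(!std::is_convertible<ToyBf16, float>::value, "no implicit conversion");
```

Note that an `explicit operator float()` alone does not enable `static_cast<double>(h)`; a direct cast to another type needs its own operator or an explicit intermediate cast to `float`.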
* remove piwik tracker | Gael Guennebaud | 2020-07-23
* Revert change that made conversion from bfloat16 to {float, double} implicit. | Rasmus Munk Larsen | 2020-07-22
  Add roundtrip tests for casting between bfloat16 and complex types.
* Fix cast of bfloat16 to std::complex<T> | David Tellenbach | 2020-07-22
  This fixes https://gitlab.com/libeigen/eigen/-/issues/1951
* Make sure we take the little-endian path if __BYTE_ORDER__ is not defined. | Rasmus Munk Larsen | 2020-07-22
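The trap here is that an undefined identifier evaluates to 0 inside `#if`, so a bare `__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__` test is true (0 == 0) on a compiler that defines neither macro. A sketch of a guard that falls back to little-endian instead; `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__` are the GCC/Clang predefined macros, while `SKETCH_BIG_ENDIAN` and `sketch_is_big_endian` are hypothetical:

```cpp
// Hypothetical byte-order detection that defaults to little-endian.
// Checking defined(...) first avoids the case where both macros are
// missing, both become 0 inside #if, and the equality test is true.
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define SKETCH_BIG_ENDIAN 1
#else
#define SKETCH_BIG_ENDIAN 0  // little-endian path, also when __BYTE_ORDER__ is undefined
#endif

constexpr bool sketch_is_big_endian() { return SKETCH_BIG_ENDIAN != 0; }
```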
* Faster conversion from integer types to bfloat16 | Niels Dekker | 2020-07-22
  Specialized `bfloat16_impl::float_to_bfloat16_rtne(float)` for normal floating point numbers, infinity and zero, in order to improve the performance of `bfloat16::bfloat16(const T&)` for integer argument types. A runtime reduction of more than 20% for conversion from int to bfloat16 was observed using Visual C++ 2019 on Windows 10.
* Avoid division by zero in nonZerosEstimate() for empty blocks. | Rasmus Munk Larsen | 2020-07-22
* Update tensor reduction test to avoid undefined division of bfloat16 by int. | Rasmus Munk Larsen | 2020-07-22
* Make numext::as_uint a device function. | Rasmus Munk Larsen | 2020-07-22
* user-defined copy operations removed in favor of compiler-generated ones | Alexander Turkin | 2020-07-20
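One practical effect of dropping hand-written copy operations that only copy members is that the type can become trivially copyable, which lets the compiler treat copies as plain memory copies. The commit does not name the motivation or the classes involved, so the two structs below are hypothetical and only illustrate the general rule:

```cpp
#include <type_traits>

// Hypothetical type with hand-written copy operations that only copy members.
struct WithUserCopy {
  double x, y;
  WithUserCopy() : x(0), y(0) {}
  WithUserCopy(const WithUserCopy& other) : x(other.x), y(other.y) {}
  WithUserCopy& operator=(const WithUserCopy& other) {
    x = other.x;
    y = other.y;
    return *this;
  }
};

// Same data, copy operations left to the compiler.
struct CompilerCopy {
  double x, y;
  CompilerCopy() : x(0), y(0) {}
};

// A user-provided copy constructor makes a type non-trivially copyable;
// the implicitly generated one keeps it trivially copyable.
static_assert(!std::is_trivially_copyable<WithUserCopy>::value, "user-provided copy");
static_assert(std::is_trivially_copyable<CompilerCopy>::value, "compiler-generated copy");
```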
* Avoid undefined behavior by union type punning in float_to_bfloat16_rtne | Niels Dekker | 2020-07-14
  Use `numext::as_uint`, instead of union-based type punning, to avoid undefined behavior. See also C++ Core Guidelines: "Don't use a union for type punning" https://github.com/isocpp/CppCoreGuidelines/blob/v0.8/CppCoreGuidelines.md#c183-dont-use-a-union-for-type-punning
  `numext::as_uint` was suggested by David Tellenbach.
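For reference, a common round-to-nearest-even float-to-bfloat16 conversion that reads the float's bits with `std::memcpy` rather than through a union. A sketch assuming IEEE binary32 `float`; `float_to_bf16_bits_rtne` is a hypothetical helper, not Eigen's `float_to_bfloat16_rtne`, and `memcpy` stands in for `numext::as_uint`:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: round a float to the nearest bfloat16 (ties to even)
// and return its 16-bit pattern.
inline std::uint16_t float_to_bf16_bits_rtne(float f) {
  // Canonical quiet NaN; adding the rounding bias could turn a NaN into inf.
  if (std::isnan(f)) return 0x7fc0;
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);  // well-defined bit copy, no union punning
  // Bias = 0x7fff plus the lowest bit that survives truncation. Low halves
  // above 0x8000 always carry into bit 16; an exact tie (0x8000) carries only
  // when that bit is 1, which rounds to the even neighbor.
  std::uint32_t lsb = (bits >> 16) & 1u;
  bits += 0x7fffu + lsb;
  return static_cast<std::uint16_t>(bits >> 16);
}
```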
* AVX path for BF16 | Sheng Yang | 2020-07-14
* Allow implicit conversion from bfloat16 to float and double | Niels Dekker | 2020-07-11
  Conversion from `bfloat16` to `float` and `double` is lossless. It seems natural to allow the conversion to be implicit, as the C++ language also supports implicit conversion from a smaller to a larger floating-point type. Intel's oneDNN bfloat16 implementation also has an implicit `operator float()`: https://github.com/oneapi-src/oneDNN/blob/v1.5/src/common/bfloat16.hpp
* Guard operator<< test by EIGEN_NO_IO. | Rasmus Munk Larsen | 2020-07-09
* Guard operator<< by EIGEN_NO_IO. | Rasmus Munk Larsen | 2020-07-09
* Add operator<< to print a quaternion. | Rasmus Munk Larsen | 2020-07-09
* Fix test basic stuff | David Tellenbach | 2020-07-09
  - Guard fundamental types that are not available pre-C++11
  - Separate consecutive angle brackets `>>` with spaces
  - Allow casting of Eigen::half and Eigen::bfloat16 to complex types
* Add operator==/operator!= to Quaternion. Fixes #1876. | Forrest Voight | 2020-07-07
* Change the sign operator in Eigen to return NaN for NaN arguments, not zero. | Rasmus Munk Larsen | 2020-07-07
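A zero result for NaN falls out naturally from the common branch-free sign formula `(x > 0) - (x < 0)`, because both comparisons are false for NaN. A scalar sketch contrasting that formula with a NaN-preserving version (both helper names are hypothetical; this is not Eigen's implementation):

```cpp
#include <cmath>

// Branch-free sign: every ordered comparison with NaN is false, so NaN -> 0.
inline float sign_zero_for_nan(float x) {
  return static_cast<float>((x > 0.0f) - (x < 0.0f));
}

// Hypothetical NaN-propagating sign: NaN in, NaN out; otherwise -1, 0 or +1.
inline float sign_propagate_nan(float x) {
  if (std::isnan(x)) return x;
  return static_cast<float>((x > 0.0f) - (x < 0.0f));
}
```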
* Make test packetmath C++98 compliant | David Tellenbach | 2020-07-01
* BF16 for scalar_cmp_with_cast_op | Sheng Yang | 2020-07-01
* Delete duplicate test cases in vectorization_logic.cpp | Kan Chen | 2020-07-01
* Fix tensor casts for large packets and casts to/from std::complex | Antonio Sanchez | 2020-06-30
  The original tensor casts were only defined for `SrcCoeffRatio`:`TgtCoeffRatio` ratios of 1:1, 1:2, 2:1 and 4:1. Here we add the missing 1:N and 8:1. We also add casting `Eigen::half` to/from `std::complex<T>`, which was missing, for consistency with `Eigen::bfloat16`, and generalize the overload to work for any complex type. Tests were added to `basicstuff`, `packetmath`, and `cxx11_tensor_casts` to test all cast configurations.
* Fix denormal check pre-C++11. | Antonio Sanchez | 2020-06-30
  `float_denorm_style` is an old-style `enum`, so prior to C++11 the `denorm_present` symbol only exists in the `std` namespace.
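Concretely, C++03 only allows naming the enumerator as `std::denorm_present`, whereas C++11 additionally accepts `std::float_denorm_style::denorm_present`. A check that compiles under both standards therefore uses the first spelling. A small sketch (`has_float_denormals` is a hypothetical name; `has_denorm` is deprecated as of C++23 but still available):

```cpp
#include <limits>

// Portable spelling: std::denorm_present is valid in C++03 and later,
// while std::float_denorm_style::denorm_present requires C++11.
inline bool has_float_denormals() {
  return std::numeric_limits<float>::has_denorm == std::denorm_present;
}
```

This reports what the type supports; a flush-to-zero mode enabled at runtime can still turn denormal results into zero.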
* Report custom C++ flags in CMake testing summary | David Tellenbach | 2020-06-30
* Remove CI tags to enable shared runners | David Tellenbach | 2020-06-29
* Pass CMAKE_MAKE_PROGRAM to Fortran language support test | Christoph Grüninger | 2020-06-27
  Otherwise the system-wide installed Make (or Ninja) program is used.
* Add initial CI configuration file. | David Tellenbach | 2020-06-27
  The initial CI configuration consists of jobs to build and run tests and to build docs.
* Fix packetmath_1 float tests for arm/aarch64. | Antonio Sanchez | 2020-06-24
  Added missing `pmadd<Packet2f>` for NEON. This gives significantly better precision than the previous `pmul`+`padd`, which was causing the `pcos` tests to fail. Also added an approx test with `std::sin`/`std::cos`, since otherwise returning any pair with `a^2+b^2=1` would pass.
  Modified `log(denorm)` tests. Denorms are not supported by all systems (returns `::min`), are always flushed to zero on 32-bit arm, and are configurably flushed to zero on sse/avx/aarch64. This leads to inconsistent results across different systems (e.g. `-inf` vs `nan`). Added a check for existence and excluded ARM.
  Removed logistic exactness test, since scalar and vectorized versions follow different code paths due to differences in `pexp` and `pmadd`, which result in slightly different values. For example, exactness always fails on arm, aarch64, and altivec.
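The precision difference between a fused multiply-add and a separate multiply plus add comes from rounding: the fused form rounds once, the separate form rounds the product to float first. A scalar sketch with `std::fma` standing in for a fused `pmadd`; the inputs are chosen for illustration, and the `volatile` store is there to keep the compiler from contracting the separate version into an FMA on its own:

```cpp
#include <cmath>

// Separate multiply and add: the product is rounded to float before the add.
// Storing it in a volatile prevents the compiler from fusing the two steps.
inline float mul_then_add(float a, float b, float c) {
  volatile float product = a * b;
  return product + c;
}

// Fused multiply-add: a * b + c is computed exactly and rounded once.
inline float fused_mul_add(float a, float b, float c) {
  return std::fma(a, b, c);
}
```

With `a = 1 + 2^-12` and `c = -(1 + 2^-11)`, the exact value of `a * a + c` is `2^-24`. The separate version loses it entirely because `a * a` rounds (tie to even) to `1 + 2^-11`, while the fused version keeps it.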