| Commit message (Collapse) | Author | Age |
| |
Fixed Visual Studio 2019 Code Analysis (C++ Core Guidelines) warning
C26450 from inside `half_impl::float_to_half_rtne(float)`:
> Arithmetic overflow: '<<' operation causes overflow at compile time.
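A common way to avoid this class of warning is to perform the shift on an unsigned operand. The sketch below is only an illustration and not the actual change made in `half_impl::float_to_half_rtne(float)`; the helper name `f32_infinity_bits` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Eigen: bit pattern of +infinity for an
// IEEE 754 binary32 value (exponent field all ones, mantissa zero).
// The operand is unsigned, so the shift is evaluated in unsigned arithmetic
// and cannot overflow a signed int.
inline std::uint32_t f32_infinity_bits() {
  return static_cast<std::uint32_t>(255u) << 23;
}
```

Writing the constant with a `u` suffix keeps the whole expression in unsigned arithmetic, which is well defined for shifts into the top bit.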
| |
Including new tests for bfloat16 Packets.
Fix prsqrt on GenericPacketMath.
| |
Implemented fast size-4 matrix inverse (mimicking Inverse_SSE.h) using NEON intrinsics.
```
Benchmark       Time       CPU   Time Old   Time New   CPU Old   CPU New
------------------------------------------------------------------------
BM_float     -0.1285   -0.1275        568        495       572       499
BM_double    -0.2265   -0.2254        638        494       641       496
```
| |
- Changes to Altivec/MatrixProduct:
  - Adapting code to gcc 10.
  - Generic code style and performance enhancements.
  - Adding PanelMode support.
  - Adding stride/offset support.
  - Enabling float64, std::complex<float> and std::complex<double>.
  - Fixing lack of symm_pack.
  - Enabling mixedtypes.
- Adding std::complex tests to blasutil.
- Adding an implementation of storePacketBlock when Incr != 1.
| |
it.
Implementing pcmp_eq to Packet8 and Packet16.
| |
https://cmake.org/cmake/help/v3.5/command/project.html
Note: Call the cmake_minimum_required() command at the beginning of the
top-level CMakeLists.txt file even before calling the project() command.
It is important to establish version and policy settings before invoking
other commands whose behavior they may affect. See also policy CMP0000.
| |
pmul and psub.
| |
PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds the `__launch_bounds__(1024)` attribute to GPU kernels that did not have that attribute explicitly specified.
That PR seems to cause regressions on the CUDA platform. This PR/commit restricts the changes made in PR 181 so that they apply to HIP only.
| |
- Introduce CMake option `EIGEN_SPLIT_TESTSUITE` that allows dividing the single test build target into several subtargets
- Add CI pipeline for merge requests that can be run by GitLab's shared runners
- Add nightly CI pipeline
| |
The current pmin/pmax implementation for Arm Neon propagates NaNs
differently than std::min/std::max.
See issue https://gitlab.com/libeigen/eigen/-/issues/1937
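For reference, the `std::min` behavior that the comparison is made against can be sketched as follows. The function names are hypothetical and only serve the illustration; the snippet assumes IEEE 754 semantics (no fast-math style flags).

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// std::min(a, b) is specified as (b < a) ? b : a. Every ordered comparison
// with NaN is false, so the first argument is returned whenever either
// operand is NaN.
inline float std_min_nan_first() {
  const float nan = std::numeric_limits<float>::quiet_NaN();
  return std::min(nan, 1.0f);  // first argument is NaN, so NaN is returned
}

inline float std_min_nan_second() {
  const float nan = std::numeric_limits<float>::quiet_NaN();
  return std::min(1.0f, nan);  // first argument is 1.0f, so 1.0f is returned
}
```

The result therefore depends on argument order, which is the property a vectorized pmin/pmax has to reproduce if it is meant to match the scalar functions.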
| |
Some platforms define int64_t to be long long even for C++03. If this is
the case, we miss the definition of internal::make_unsigned for this
type. If we just define the template, we get duplicate definition
errors on platforms that define int64_t as signed long for C++03.
We need to find a way to distinguish both cases at compile time.
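One possible way to tell the two cases apart is a small type-equality trait that works without C++11. This is a minimal sketch under that assumption, not the solution adopted in Eigen; `is_same_sketch` and `unsigned_for_int64` are hypothetical names.

```cpp
#include <stdint.h>

// Hypothetical C++03-compatible type-equality trait.
template <typename A, typename B> struct is_same_sketch { enum { value = 0 }; };
template <typename A> struct is_same_sketch<A, A> { enum { value = 1 }; };

// Selected when int64_t is NOT an alias for long (for example long long).
template <bool Int64IsLong> struct unsigned_for_int64 { typedef uint64_t type; };

// Selected when int64_t is an alias for long; reuses the mapping for long,
// so no second definition for the same type is needed.
template <> struct unsigned_for_int64<true> { typedef unsigned long type; };

typedef unsigned_for_int64<is_same_sketch<int64_t, long>::value != 0>::type
    unsigned_int64_sketch;
```

Because the choice is made through a boolean template parameter, only one mapping is ever instantiated for a given platform, which avoids the duplicate definition.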
| |
architecture
| |
Starting with ROCm 3.5, the HIP compiler will change from HCC to hip-clang.
This compiler change introduces a change in the default value of the `__launch_bounds__` attribute associated with a GPU kernel. (The default value is the value the compiler assumes for the `__launch_bounds__` attribute when the user does not specify it explicitly.)
Currently (i.e. for HIP with ROCm 3.3 and older), the default value is 1024. That changes to 256 with ROCm 3.5 (i.e. the hip-clang compiler). As a consequence of this change, if a GPU kernel with a `__launch_bounds__` attribute of 256 is launched at runtime with a threads_per_block value > 256, it leads to a runtime error. This is leading to a couple of Eigen unit test failures with ROCm 3.5.
This commit adds an explicit `__launch_bounds__(1024)` attribute to every GPU kernel that currently does not have it explicitly specified (and hence would end up getting the default value of 256 with the change to hip-clang).
| |
for large values.
The NEON implementation mimics the SSE implementation, but didn't mention the caveat that, due to the unsigned to signed integer conversions, not all values of the original floating point representation are supported.
| |
StlDeque extends std::deque by accessing some of its internal members.
Since GCC 10 these are not accessible anymore.
| |
segfault when the argument is unaligned.
| |
See !172 for related discussions.
| |
- Fix docker Fedora image to Fedora:31
- Fix gcc version to gcc-9.2.1
- Use GitLab CI dag
- Fix usage of build cache
- Introduce build artifacts
| |
If we have explicit conversion operators available (C++11) we define
explicit casts from bfloat16 to other types. If not (C++03), we don't
define conversion operators but rely on implicit conversion chains from
bfloat16 over float to other types.
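The C++11 branch of this idea can be sketched with a stand-in type. `bf16_sketch` is hypothetical and is not Eigen's `bfloat16`; in particular, its constructor truncates instead of rounding to keep the example short.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a bfloat16 type; not Eigen's implementation.
struct bf16_sketch {
  std::uint16_t bits;

  // Truncating float -> bfloat16: keep the upper 16 bits of the float.
  // (Rounding is omitted here for brevity.)
  explicit bf16_sketch(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    bits = static_cast<std::uint16_t>(u >> 16);
  }

  // C++11 explicit conversion operator; widening to float is exact.
  explicit operator float() const {
    const std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
  }

  // Conversions to other types go through float.
  explicit operator int() const {
    return static_cast<int>(static_cast<float>(*this));
  }
};
```

With explicit operators, a call site has to spell out the cast, for example `static_cast<int>(bf16_sketch(3.0f))`, so no conversion to another type happens silently.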
| |
Add roundtrip tests for casting between bfloat16 and complex types.
| |
This fixes https://gitlab.com/libeigen/eigen/-/issues/1951
| |
Specialized `bfloat16_impl::float_to_bfloat16_rtne(float)` for normal floating point numbers, infinity and zero, in order to improve the performance of `bfloat16::bfloat16(const T&)` for integer argument types.
The runtime of conversion from int to bfloat16 was observed to drop by more than 20%, using Visual C++ 2019 on Windows 10.
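A round-to-nearest-even conversion restricted to non-NaN inputs can be sketched as below. This is an illustration under stated assumptions rather than the code in `bfloat16_impl`; the function name is hypothetical, and NaN inputs are deliberately not handled.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: float -> bfloat16 bit pattern with round-to-nearest-even.
// Assumes the input is zero, a normal number, or infinity. NaN would need a
// separate branch and is NOT handled here.
inline std::uint16_t float_to_bf16_bits_rtne_sketch(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  const std::uint32_t lsb = (u >> 16) & 1u;  // lowest bit that survives
  u += 0x7FFFu + lsb;                        // bias so that ties round to even
  return static_cast<std::uint16_t>(u >> 16);
}
```

Integer inputs converted through float can never be NaN, which is why a path without the NaN check is sufficient for integer argument types.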