| Commit message (Collapse) | Author | Age |
| |
|
| |
|
| |
|
|
|
|
|
|
| |
- Guard fundamental types that are not available pre C++11
- Separate subsequent angle brackets >> by spaces
- Allow casting of Eigen::half and Eigen::bfloat16 to complex types
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original tensor casts were only defined for
`SrcCoeffRatio`:`TgtCoeffRatio` 1:1, 1:2, 2:1, 4:1. Here we add the
missing 1:N and 8:1.
We also add casting `Eigen::half` to/from `std::complex<T>`, which
was missing to make it consistent with `Eigen:bfloat16`, and
generalize the overload to work for any complex type.
Tests were added to `basicstuff`, `packetmath`, and
`cxx11_tensor_casts` to test all cast configurations.
|
|
|
|
|
| |
`float_denorm_style` is an old-style `enum`, so the `denorm_present`
symbol only exists in the `std` namespace prior to c++11.
|
| |
|
| |
|
|
|
|
|
| |
Otherwise the Make (or Ninja) program is used, which is
installed system wide.
|
|
|
|
|
| |
The initial CI configuration consists of jobs to build and run tests and
to build docs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added missing `pmadd<Packet2f>` for NEON. This leads to significant
improvement in precision than previous `pmul+padd`, which was causing
the `pcos` tests to fail. Also added an approx test with
`std::sin`/`std::cos` since otherwise returning any `a^2+b^2=1` would
pass.
Modified `log(denorm)` tests. Denorms are not always supported by all
systems (returns `::min`), are always flushed to zero on 32-bit arm,
and configurably flush to zero on sse/avx/aarch64. This leads to
inconsistent results across different systems (i.e. `-inf` vs `nan`).
Added a check for existence and exclude ARM.
Removed logistic exactness test, since scalar and vectorized versions
follow different code-paths due to differences in `pexp` and `pmadd`,
which result in slightly different values. For example, exactness always
fails on arm, aarch64, and altivec.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current multiply (`pmul`) and comparison operators (`pcmp_lt`,
`pcmp_le`, `pcmp_eq`) are missing for packets `Packet2l` and
`Packet2ul`. This leads to compile errors for the `packetmath.cpp` tests
in clang. Here we add and test the missing ops.
Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
$ arm-linux-gnueabihf-g++ -mfpu=neon -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
$ clang++ -target aarch64-linux-android21 -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
$ clang++ -target armv7-linux-android21 -static -mfpu=neon -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NEON `pcast` operators are all implemented and tested for existing
packets. This requires adding a `pcast(a,b,c,d,e,f,g,h)` for casting
between `int64_t` and `int8_t` in `GenericPacketMath.h`.
Removed incorrect `HasHalfPacket` definition for NEON's
`Packet2l`/`Packet2ul`.
Adjustments were also made to the `packetmath` tests. These include
- minor bug fixes for cast tests (i.e. 4:1 casts, only casting for
packets that are vectorizable)
- added 8:1 cast tests
- random number generation
- original had uninteresting 0 to 0 casts for many casts between
floating-point and integers, and exhibited signed overflow
undefined behavior
Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_ALL=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
|
| |
|
| |
|
|
|
|
| |
Print cmake commands instead of make commands, which should work for any generator.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Running two chains exposes more instruction level parallelism,
by allowing to execute both chains at the same time.
Results are a bit noisy, but for medium length we almost hit
theoretical upper bound of 2x.
BM_fullReduction_16T/3 [using 16 threads] 17.3ns ±11% 17.4ns ± 9% ~ (p=0.178 n=18+19)
BM_fullReduction_16T/4 [using 16 threads] 17.6ns ±17% 17.0ns ±18% ~ (p=0.835 n=20+19)
BM_fullReduction_16T/7 [using 16 threads] 18.9ns ±12% 18.2ns ±10% ~ (p=0.756 n=20+18)
BM_fullReduction_16T/8 [using 16 threads] 19.8ns ±13% 19.4ns ±21% ~ (p=0.512 n=20+20)
BM_fullReduction_16T/10 [using 16 threads] 23.5ns ±15% 20.8ns ±24% -11.37% (p=0.000 n=20+19)
BM_fullReduction_16T/15 [using 16 threads] 35.8ns ±21% 26.9ns ±17% -24.76% (p=0.000 n=20+19)
BM_fullReduction_16T/16 [using 16 threads] 38.7ns ±22% 27.7ns ±18% -28.40% (p=0.000 n=20+19)
BM_fullReduction_16T/31 [using 16 threads] 146ns ±17% 74ns ±11% -49.05% (p=0.000 n=20+18)
BM_fullReduction_16T/32 [using 16 threads] 154ns ±19% 84ns ±30% -45.79% (p=0.000 n=20+19)
BM_fullReduction_16T/64 [using 16 threads] 603ns ± 8% 308ns ±12% -48.94% (p=0.000 n=17+17)
BM_fullReduction_16T/128 [using 16 threads] 2.44µs ±13% 1.22µs ± 1% -50.29% (p=0.000 n=17+17)
BM_fullReduction_16T/256 [using 16 threads] 9.84µs ±14% 5.13µs ±30% -47.82% (p=0.000 n=19+19)
BM_fullReduction_16T/512 [using 16 threads] 78.0µs ± 9% 56.1µs ±17% -28.02% (p=0.000 n=18+20)
BM_fullReduction_16T/1k [using 16 threads] 325µs ± 5% 263µs ± 4% -19.00% (p=0.000 n=20+16)
BM_fullReduction_16T/2k [using 16 threads] 1.09ms ± 3% 0.99ms ± 1% -9.04% (p=0.000 n=20+20)
BM_fullReduction_16T/4k [using 16 threads] 7.66ms ± 3% 7.57ms ± 3% -1.24% (p=0.017 n=20+20)
BM_fullReduction_16T/10k [using 16 threads] 65.3ms ± 4% 65.0ms ± 3% ~ (p=0.718 n=20+20)
|
| |
|
| |
|
|
|
|
|
|
| |
Now this compiles without errors:
$ clang++ -I ../../ test_sparseLU.cpp -std=c++03
|
|
|
|
|
|
|
|
|
|
|
| |
$ clang++ -O3 bench/bench_move_semantics.cpp -I. -std=c++11 \
-o bench_move_semantics
$ ./bench_move_semantics
float copy semantics: 1755.97 ms
float move semantics: 55.063 ms
double copy semantics: 2457.65 ms
double move semantics: 55.034 ms
|
|
|
|
|
|
|
|
|
|
|
| |
The use of the `packet_traits<>::HasCast` field is currently inconsistent with
`type_casting_traits<>`, and is unused apart from within
`test/packetmath.cpp`. In addition, those packetmath cast tests do not
currently reflect how casts are performed in practice: they ignore the
`SrcCoeffRatio` and `TgtCoeffRatio` fields, assuming a 1:1 ratio.
Here we remove the unsed `HasCast`, and modify the packet cast tests to
better reflect their usage.
|
| |
|
| |
|
|
|
|
| |
Fix compiler warnings in GeneralBlockPanelKernel.h.
|
|
|
|
|
| |
- Use standard types in SYCL/PacketMath.h to avoid compilation problems on Windows
- Add EIGEN_HAS_CONSTEXPR to cxx11_tensor_argmax_sycl.cpp to fix build problems on Windows
|
|
|
|
|
| |
This reverts commit 95177362edc9c814a102c8a2236695c632892232 to
disable GitLab CI temporarily.
|
| |
|
| |
|
|
|
|
| |
numext::not_equal_strict to avoid breaking builds that compile with -Werror=float-equal.
|
| |
|
| |
|
|
|
|
| |
function + respective unit test
|
| |
|
| |
|
| |
|
|
|
|
| |
sparse matrix
|
| |
|
|
|
|
| |
ptranspose on NEON
|
|
|
|
|
|
|
| |
Both i386 and 32-bit ARM do not define __uint128_t. On most systems, if
__uint128_t is defined, then so is the macro __SIZEOF_INT128__.
https://stackoverflow.com/questions/18531782/how-to-know-if-uint128-t-is-defined1
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR tries to fix an incorrect usage of `if defined(EIGEN_ARCH_PPC)`
in `Eigen/Core` header.
In `Eigen/src/Core/util/Macros.h`, EIGEN_ARCH_PPC was explicitly defined
as either 0 or 1. As a result `if defined(EIGEN_ARCH_PPC)` will always be true.
This causes issues when building on non PPC platform and `MatrixProduct.h` is not
available.
This fix changes `if defined(EIGEN_ARCH_PPC)` => `if EIGEN_ARCH_PPC`.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
|
| |
|
| |
|
|
|
|
| |
This fixes https://gitlab.com/libeigen/eigen/-/issues/1897
|