| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
| |
As written, depending on multithreading/gpu, the returned index from
`argmin`/`argmax` is not currently stable. Here we modify the functors
to always keep the first occurence (i.e. if the value is equal to the
current min/max, then keep the one with the smallest index).
This is otherwise causing unpredictable results in some TF tests.
|
|
|
|
|
|
|
| |
The previous balancer overflowed for large row/column norms.
Modified to prevent that.
Fixes #2273.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
When using Eigen for gpu, these simplify portability. If
`EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then
we do not undefine them.
|
|
|
|
|
|
| |
Currently TF lite needs to hack around with the Tensor headers in order
to customize the contraction dispatch method. Here we add simple `#ifndef`
guards to allow them to provide their own dispatch prior to inclusion.
|
| |
|
|
|
|
|
|
|
|
| |
Made a class and singleton to encapsulate initialization and retrieval of
device properties.
Related to !481, which already changed the API to address a static
linkage issue.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Time-dependence prevents tests from being repeatable. This has long
been an issue with debugging the tensor tests. Removing this will allow
future tests to be repeatable in the usual way.
Also, the recently added macros in !476 are causing headaches across different
platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple
ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE`
are sometimes defined with values, sometimes defined as empty, and sometimes
not defined at all when they probably should be. This is leading to
multiple build breakages.
The simplest approach is to generate a seed via
`Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a
hash based on the current thread ID (since `rand()` isn't supported
on GPU).
Fixes #1602.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
m_deviceProperties and m_devicePropInitialized are defined as global
statics which will define multiple copies which can cause issues if
initializeDeviceProp() is called in one translation unit and then
m_deviceProperties is used in a different translation unit. Added
inline functions getDeviceProperties() and getDevicePropInitialized()
which defines those variables as static locals. As per the C++ standard
7.1.2/4, a static local declared in an inline function always refers
to the same object, so this should be safer. Credit to Sun Chenggen
for this fix.
This fixes issue #1475.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`TensorRandom` currently relies on BSD `random()`, which is not always
available. The [linux manpage](https://man7.org/linux/man-pages/man3/srandom.3.html)
gives the glibc condition:
```
_XOPEN_SOURCE >= 500
|| /* Glibc since 2.19: */ _DEFAULT_SOURCE
|| /* Glibc <= 2.19: */ _SVID_SOURCE || _BSD_SOURCE
```
In particular, this was failing to compile for MinGW via msys2. If not
available, we fall back to using `rand()`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The namespace declaration for googlehash is a configurable macro that
can be disabled. In particular, it is disabled within google, causing
compile errors since `dense_hash_map`/`sparse_hash_map` are then in
the global namespace instead of in `::google`.
Here we play a bit of gynastics to allow for both `google::*_hash_map`
and `*_hash_map`, while limiting namespace polution. Symbols within
the `::google` namespace are imported into `Eigen::google`.
We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is
fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be
defined.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
specialization of ScanLauncher.
The issue was discovered when the GPU scan unit test was run and resulted in a segmentation fault.
The segmantation fault occurred because the unit test allocated GPU memory and passed a pointer to that memory to the computation that it presumed would execute on the GPU.
But because of the issue, the computation was scheduled to execute on the CPU so a situation was constructed where the CPU attempted to access a GPU memory location.
The fix expands the GPU specific ScanLauncher specialization to handle cases where vectorization is enabled.
Previously, the GPU specialization is chosen only if Vectorization is not used.
|
|
|
|
|
|
| |
innerStride(), outerStride(), and size()""
This reverts commit 5f0b4a4010af4cbf6161a0d1a03a747addc44a5d.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original swap approach leads to potential undefined behavior (reading
uninitialized memory) and results in unnecessary copying of data for static
storage.
Here we pass down the move assignment to the underlying storage. Static
storage does a one-way copy, dynamic storage does a swap.
Modified the tests to no longer read from the moved-from matrix/tensor,
since that can lead to UB. Added a test to ensure we do not access
uninitialized memory in a move.
Fixes: #2119
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The macro `__cplusplus` is not defined correctly in MSVC unless building
with the the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the
specified c++ standard version number.
Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version
number both for MSVC and otherwise. This simplifies checks for supported
features.
Also replaced most instances of standard version checking via `__cplusplus`
with the existing `EIGEN_COMP_CXXVER` macro for better clarity.
Fixes: #2170
|
|
|
|
|
|
|
| |
innerStride(), outerStride(), and size()"
This reverts commit 6cbb3038ac48cb5fe17eba4dfbf26e3e798041f1 because it
breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.
|
|
|
|
| |
outerStride(), and size()
|
| |
|
|
|
|
|
|
| |
warnings
(cherry picked from commit 9bbb7ea4b54b1f307863be4ed8d105c38cdefe50)
|
|
|
|
| |
(cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)
|
|
|
|
| |
(cherry picked from commit abbf95045009619f37bd92b45433eedbfcbe41cf)
|
|
|
|
| |
(cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)
|
| |
|
| |
|
|
|
|
|
|
| |
1.Only computing about half of the factors and use complex conjugate symmetry for the rest instead of all to save time.
2.All twiddles are calculated in double because that gives the maximum achievable precision when doing float transforms.
3.Reducing all angles to the range 0<angle<pi/4 which gives even more precision.
|
|
|
|
|
|
|
| |
Also modified cmake/FindAdolc.cmake to eliminate warnings, and added
search paths to match install layout.
Fixed: #2157
|
| |
|
|
|
|
| |
As discussed in #2143 we remove editor specific comments.
|
|
|
|
| |
test adjoint and transpose solves
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originating from
[this SO issue](https://stackoverflow.com/questions/65901014/how-to-solve-this-all-error-2-in-this-case),
some win32 compilers define `__int32` as a `long`, but MinGW defines
`std::int32_t` as an `int`, leading to a type conflict.
To avoid this, we remove the custom `typedef` definitions for win32. The
Tensor module requires C++11 anyways, so we are guaranteed to have
included `<cstdint>` already in `Eigen/Core`.
Also re-arranged the headers to only include `<cstdint>` in one place to
avoid this type of error again.
|
|
|
|
| |
This fixes #2123
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is to support scalar `sqrt` of complex numbers `std::complex<T>` on
device, requested by Tensorflow folks.
Technically `std::complex` is not supported by NVCC on device
(though it is by clang), so the default `sqrt(std::complex<T>)` function only
works on the host. Here we create an overload to add back the
functionality.
Also modified the CMake file to add `--relaxed-constexpr` (or
equivalent) flag for NVCC to allow calling constexpr functions from
device functions, and added support for specifying compute architecture for
NVCC (was already available for clang).
|
|
|
|
| |
FixedDimensions.
|
|
|
|
|
|
|
| |
Removed m_dimension as instance member of TensorStorage with
FixedDimensions and instead use the template parameter. This
means that the sizeof a pure fixed-size storage is exactly
equal to the data it is storing.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current implementations fail to consider half-float packets, only
half-float scalars. Added specializations for packets on AVX, AVX512 and
NEON. Added tests to `special_packetmath`.
The current `special_functions` tests would fail for half and bfloat16 due to
lack of precision. The NEON tests also fail with precision issues and
due to different handling of `sqrt(inf)`, so special functions bessel, ndtri
have been disabled.
Tested with AVX, AVX512.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The existing `TensorRandom.h` implementation makes the assumption that
`half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
always true. This currently fails on arm64, where `x` has type `__fp16`.
Added `bit_cast` specializations to allow casting to/from `uint16_t`
for both `half` and `bfloat16`. Also added tests in
`half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch
these errors in the future.
|
|
|
|
|
|
|
| |
Adds copy constructors to Tensor ops, inherits assignment operators from
`TensorBase`.
Addresses #1863
|
|
|
|
| |
causing issues in embeded systems
|
|
|
|
| |
broken by c6953f799b01d36f4236b64f351cc1446e0abe17.
|
|
|
|
| |
`predux_fmax_nan` that implement reductions with `PropagateNaN`, and `PropagateNumbers` semantics. Add (slow) generic implementations for most reductions.
|
| |
|
| |
|