| Commit message (Collapse) | Author | Age |
|
|
|
| |
Add a new EIGEN_HAS_INTRINSIC_INT128 macro, and use this instead of __SIZEOF_INT128__. This fixes related issues with TensorIntDiv.h when building with Clang for Windows, where support for 128-bit integer arithmetic is advertised but broken in practice.
|
|
|
|
|
|
|
|
|
|
| |
module required to run it on devices supporting SYCL.
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two major changes (and a few minor ones which are not listed here...see PR discussion for details)
1. Eigen::half implementations for HIP and CUDA have been merged.
This means that
- `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
- `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
- `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`
After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install.
2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
- `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
- `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
- `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`
|
| |
|
| |
|
|
|
|
| |
aliases
|
|
|
|
| |
Tensor Contractsycl to be located in any place in the expression tree.
|
|
|
|
| |
TensorIntDiv.h; disabling cashsize for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP ; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code.
|
|
|
|
| |
tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Also updated the code to silence bogux warnings generated by nvcc when compilining this function.
|
| |
|
|
|
|
| |
the divisor can be equal to one since it isn't supported.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
more than 4 billion elements.
|
|
|
|
| |
to be incorrectly handled
|
| |
|
| |
|
|
|
|
| |
integers. It saves 1 register on CPU and 2 on GPU.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
Sped up tensor slicing by a factor of 3 by using these fast integer divisions.
|