| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
| |
If no offset is given, them it should be zero.
Also passes full address to vec_vsx_ld/st builtins.
Removes userless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT.
Removes unnecessary casts.
|
|
|
|
| |
Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h
|
|
|
|
|
|
| |
each other.
Add specializations for complex types since std::log1p and std::exp1m do not support complex.
|
| |
|
|
|
|
| |
does not support this.
|
|
|
|
|
|
|
|
|
|
|
|
| |
blocks if the strides are known to be 1. Provides up to 20-25% speedup of the TF cross entropy op with AVX.
A few benchmark numbers:
name old time/op new time/op delta
BM_Xent_16_10000_cpu 448µs ± 3% 389µs ± 2% -13.21%
(p=0.008 n=5+5)
BM_Xent_32_10000_cpu 575µs ± 6% 454µs ± 3% -21.00% (p=0.008 n=5+5)
BM_Xent_64_10000_cpu 933µs ± 4% 712µs ± 1% -23.71% (p=0.008 n=5+5)
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://bitbucket.org/eigen/eigen/pull-requests/662.
The change caused the device struct to be copied for each expression evaluation, and caused, e.g., a 10% regression in the TensorFlow multinomial op on GPU:
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4 128173 231326 2922 1.610G items/s
VS
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4 146683 246914 2719 1.509G items/s
|
|
|
|
| |
intended to be part of the code.
|
| |
|
| |
|
|
|
|
| |
to make it actually appear in the generated documentation.
|
| |
|
|
|
|
| |
Also, document LinSpaced only where it is implemented
|
| |
|
| |
|
|
|
|
| |
runtime
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not having this attribute results in the following failures in the `--config=rocm` TF build.
```
In file included from tensorflow/core/kernels/cross_op_gpu.cu.cc:20:
In file included from ./tensorflow/core/framework/register_types.h:20:
In file included from ./tensorflow/core/framework/numeric_types.h:20:
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1:
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:140:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
typename Storage::Type result = constCast(m_impl.data());
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h:148:56: note: in instantiation of member function 'Eigen::TensorEvaluator<const Eigen::TensorChippingOp<1, Eigen::TensorMap<Eigen::Tensor<int, 2, 1, long>, 16, MakePointer> >, Eigen::Gpu\
Device>::data' requested here
return m_rightImpl.evalSubExprsIfNeeded(m_leftImpl.data());
```
Adding the EIGEN_DEVICE_FUNC attribute resolves those errors
|
|\
| |
| |
| |
| |
| |
| | |
[SYCL] :
Approved-by: Gael Guennebaud <g.gael@free.fr>
Approved-by: Rasmus Larsen <rmlarsen@google.com>
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
* Modifying TensorDeviceSYCL to use `EIGEN_THROW_X`.
* Modifying TensorMacro to use `EIGEN_TRY/CATCH(X)` macro.
* Modifying TensorReverse.h to use `EIGEN_DEVICE_REF` instead of `&`.
* Fixing the SYCL device macro in SpecialFunctionsImpl.h.
|
| | |
|
|/
|
|
|
| |
* an interface for SYCL buffers to behave as a non-dereferenceable pointer
* an interface for placeholder accessor to behave like a pointer on both host and device
|
| |
|
| |
|
|
|
|
| |
eigen::GpuDevice::synchronize() from device code, but not when calling from a non-GPU compilation unit.
|
| |
|
|\ |
|
| |
| |
| |
| | |
block access when preferred
|
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
module required to run it on devices supporting SYCL.
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
|
|/
|
|
|
|
|
|
|
|
| |
module required to run it on devices supporting SYCL.
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
|
|
|
|
|
|
|
|
| |
Eigen unsupported modules on devices supporting SYCL.
* Adding SYCL memory model
* Enabling/Disabling SYCL backend in Core
* Supporting Vectorization
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
clause.
|
| |
|
|
|
|
|
|
| |
1. Fix buggy pcmp_eq and unit test for half types.
2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types.
3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.
|
|
|
|
|
| |
(grafted from 427f2f66d69ae9b124c2f8bcd927fb6e19e07e91
)
|
|
|
|
| |
EIGEN_HAS_TYPE_TRAITS is off.
|
|
|
|
| |
clang.
|
|\
| |
| |
| |
| |
| | |
Minor build improvements
Approved-by: Rasmus Larsen <rmlarsen@google.com>
|
| |
| |
| |
| | |
CUDA build failures.
|
|/
|
|
|
|
|
|
| |
* Allow specifying multiple GPU architectures. E.g.:
cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70"
* Pass CUDA SDK path to clang. Without it it will default to /usr/local/cuda
which may not be the right location, if cmake was invoked with
-DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path
|
|
|
|
| |
Problem reported on https://stackoverflow.com/questions/56395899
|
|\
| |
| |
| | |
fix for HIP build errors that were introduced by a commit earlier this week
|