Commit message (Author, Age)
* Revert accidentally removed <memory> header from ThreadPool (Eugene Zhulenev, 2019-08-30)
|
* Asynchronous expression evaluation with TensorAsyncDevice (Eugene Zhulenev, 2019-08-30)
|
* Fix missing header inclusion and colliding definitions for half type casting, which broke build with -march=native on Haswell/Skylake. (Rasmus Munk Larsen, 2019-08-30)
|
* Const correctness in TensorMap<const Tensor<T, ...>> expressions (Eugene Zhulenev, 2019-08-28)
|
* Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf. (Rasmus Munk Larsen, 2019-08-28)
|
* Remove shadow warnings in TensorDeviceThreadPool (Eugene Zhulenev, 2019-08-28)
|
* Revert changes to std_fallback::log1p that broke handling of arguments less than -1. Fix packet op accordingly. (Rasmus Munk Larsen, 2019-08-27)
|
* Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories. (Rasmus Munk Larsen, 2019-08-27)
|
* Merged in ezhulenev/eigen-01 (pull request PR-683) (Rasmus Larsen, 2019-08-26)
|\
| |   Asynchronous parallelFor in Eigen ThreadPoolDevice
* | Fix get_random_seed on Native Client (maratek, 2019-08-23)
| |
| |   Newlib in Native Client SDK does not provide ::random function. Implement get_random_seed for NaCl using ::rand, similarly to Windows version.
| * Asynchronous parallelFor in Eigen ThreadPoolDevice (Eugene Zhulenev, 2019-08-22)
|/
* Merged in jaopaulolc/eigen (pull request PR-679) (Christoph Hertzberg, 2019-08-22)
|\
| |   Fixes for Altivec/VSX and compilation with clang on PowerPC
* \ Merged in rmlarsen/eigen (pull request PR-680) (Rasmus Larsen, 2019-08-22)
|\ \
| | |   Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
* | | Remove XSMM support from Tensor module (Eugene Zhulenev, 2019-08-19)
| | |
| | * Fix debug macros in p{load,store}u (João P. L. de Carvalho, 2019-08-14)
| | |
| | * Add missing pcmp_XX methods for double/Packet2d (João P. L. de Carvalho, 2019-08-14)
| | |
| | |   This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector.
| * | Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments. (Rasmus Munk Larsen, 2019-08-12)
|/ /
| |   Depending on instruction set, significant speedups are observed for the vectorized path:
| |     log1p wall time is reduced 60-93% (2.5x - 15x speedup)
| |     expm1 wall time is reduced 0-85% (1x - 7x speedup)
| |
| |   The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.
| |
| |   Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
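The Kahan-style evaluation named in this commit can be sketched for scalars roughly as follows. This is a minimal illustration under stated assumptions, not Eigen's implementation: the names `kahan_log1p` and `kahan_expm1` are hypothetical, only `double` is handled, and the infinity checks are one possible form of the "extra branch" the message mentions.

```cpp
#include <cmath>

// Hypothetical scalar sketch of Kahan's log1p formula (not Eigen's code).
// log1p(x) = x * log(u) / (u - 1) with u = 1 + x; the rounding error in u
// largely cancels between log(u) and (u - 1).
double kahan_log1p(double x) {
  const double u = 1.0 + x;
  if (u == 1.0) return x;          // |x| too small to change 1.0: log1p(x) ~ x
  const double log_u = std::log(u);
  if (u == log_u) return x;        // only possible for u == +inf: avoid inf/inf
  return x * (log_u / (u - 1.0));
}

// Hypothetical scalar sketch of Kahan's expm1 formula (not Eigen's code).
// expm1(x) = (u - 1) * x / log(u) with u = exp(x).
double kahan_expm1(double x) {
  const double u = std::exp(x);
  if (u == 1.0) return x;          // exp(x) rounded to 1: expm1(x) ~ x
  const double um1 = u - 1.0;
  if (um1 == -1.0) return -1.0;    // exp(x) is negligible next to 1
  const double log_u = std::log(u);
  if (u == log_u) return u;        // u == +inf: avoid inf/inf
  return um1 * (x / log_u);
}
```

Without the two `u == log_u` checks, an infinite argument would produce inf/inf and return NaN, which is the case the scalar path has to branch on.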
| * Fix packed load/store for PowerPC's VSX (João P. L. de Carvalho, 2019-08-09)
| |
| |   The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts.
| |
| |   For double/Packet2d vec_xl/vec_xst should be preferred over vec_ld/vec_st, although the latter works when cast to float/Packet4f.
| |
| |   Silencing some weird warning with throw by some GCC versions. Such warnings are not thrown by Clang.
| * Fix offset argument of ploadu/pstoreu for Altivec (João P. L. de Carvalho, 2019-08-09)
| |
| |   If no offset is given, then it should be zero. Also passes full address to vec_vsx_ld/st builtins.
| |
| |   Removes useless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT. Removes unnecessary casts.
| * bug #1718: Add cast to successfully compile with clang on PowerPC (João P. L. de Carvalho, 2019-08-09)
|/
|   Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h
* Fix bugs in log1p and expm1 where repeated using statements would clobber each other. Add specializations for complex types since std::log1p and std::expm1 do not support complex. (Rasmus Munk Larsen, 2019-08-08)
|
* Guard against repeated definition of EIGEN_MPL2_ONLY (Rasmus Munk Larsen, 2019-08-07)
|
* Disable tests for contraction with output kernels when using libxsmm, which does not support this. (Rasmus Munk Larsen, 2019-08-07)
|
* [Eigen] Vectorize evaluation of coefficient-wise functions over tensor blocks if the strides are known to be 1. (Rasmus Munk Larsen, 2019-08-07)
|
|   Provides up to 20-25% speedup of the TF cross entropy op with AVX.
|
|   A few benchmark numbers:
|
|   name                  old time/op  new time/op  delta
|   BM_Xent_16_10000_cpu  448µs ± 3%   389µs ± 2%   -13.21%  (p=0.008 n=5+5)
|   BM_Xent_32_10000_cpu  575µs ± 6%   454µs ± 3%   -21.00%  (p=0.008 n=5+5)
|   BM_Xent_64_10000_cpu  933µs ± 4%   712µs ± 1%   -23.71%  (p=0.008 n=5+5)
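The idea of taking a faster path when strides are known to be 1 can be illustrated with a small standalone sketch. Everything here is an assumption for illustration: `apply_unary` is a hypothetical helper, not part of Eigen's TensorBlock code, and it relies on the compiler (rather than explicit packet operations) to vectorize the contiguous loop.

```cpp
#include <cstddef>

// Hypothetical helper, not Eigen's API: applies `op` to `n` coefficients read
// from `src` with step `src_stride` and written to `dst` with step `dst_stride`.
template <typename T, typename UnaryOp>
void apply_unary(const T* src, std::ptrdiff_t src_stride,
                 T* dst, std::ptrdiff_t dst_stride,
                 std::ptrdiff_t n, UnaryOp op) {
  if (src_stride == 1 && dst_stride == 1) {
    // Unit strides: both buffers are contiguous, so this loop can be
    // processed several coefficients at a time (SIMD).
    for (std::ptrdiff_t i = 0; i < n; ++i) {
      dst[i] = op(src[i]);
    }
  } else {
    // General case: strided access, one coefficient at a time.
    for (std::ptrdiff_t i = 0; i < n; ++i) {
      dst[i * dst_stride] = op(src[i * src_stride]);
    }
  }
}
```

Both branches compute the same result; the check only selects the loop shape that is easier to vectorize when the data is contiguous.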
* Clean up unnecessary namespace specifiers in TensorBlock.h. (Rasmus Munk Larsen, 2019-08-07)
|
* Fix doc regarding alignment and c++17 (Gael Guennebaud, 2019-08-04)
|
* Fix performance regressions due to https://bitbucket.org/eigen/eigen/pull-requests/662. (Rasmus Munk Larsen, 2019-08-02)
|
|   The change caused the device struct to be copied for each expression evaluation, and caused, e.g., a 10% regression in the TensorFlow multinomial op on GPU:
|
|   Benchmark                       Time(ns)  CPU(ns)  Iterations
|   ----------------------------------------------------------------------
|   BM_Multinomial_gpu_1_100000_4   128173    231326   2922        1.610G items/s
|
|   VS
|
|   Benchmark                       Time(ns)  CPU(ns)  Iterations
|   ----------------------------------------------------------------------
|   BM_Multinomial_gpu_1_100000_4   146683    246914   2719        1.509G items/s
* Added leading asterisk for Doxygen to consume as it was removing asterisk intended to be part of the code. (Kyle Vedder, 2019-07-18)
|
* Fix typo in Umeyama method documentation (Michael Grupp, 2019-07-17)
|
* Remove {} accidentally added in previous commit (Christoph Hertzberg, 2019-07-18)
|
* Move variadic constructors outside `#ifndef EIGEN_PARSED_BY_DOXYGEN` block, to make them actually appear in the generated documentation. (Christoph Hertzberg, 2019-07-12)
|
* Escape \# inside doxygen docu (Christoph Hertzberg, 2019-07-12)
|
* Build deprecated snippets with -DEIGEN_NO_DEPRECATED_WARNING (Christoph Hertzberg, 2019-07-12)
|
|   Also, document LinSpaced only where it is implemented
* Fix expression evaluation heuristic for TensorSliceOp (Eugene Zhulenev, 2019-07-09)
|
* Fix compiler for unsigned integers. (Rasmus Munk Larsen, 2019-07-09)
|
* Add outer/inner chipping optimization for chipping dimension specified at runtime (Eugene Zhulenev, 2019-07-03)
|
* adding the EIGEN_DEVICE_FUNC attribute to the constCast routine. (Deven Desai, 2019-07-02)
|
|   Not having this attribute results in the following failures in the `--config=rocm` TF build.
|
|   ```
|   In file included from tensorflow/core/kernels/cross_op_gpu.cu.cc:20:
|   In file included from ./tensorflow/core/framework/register_types.h:20:
|   In file included from ./tensorflow/core/framework/numeric_types.h:20:
|   In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1:
|   In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:140:
|   external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
|       typename Storage::Type result = constCast(m_impl.data());
|                                       ^
|   external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
|   external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h:148:56: note: in instantiation of member function 'Eigen::TensorEvaluator<const Eigen::TensorChippingOp<1, Eigen::TensorMap<Eigen::Tensor<int, 2, 1, long>, 16, MakePointer> >, Eigen::GpuDevice>::data' requested here
|       return m_rightImpl.evalSubExprsIfNeeded(m_leftImpl.data());
|   ```
|
|   Adding the EIGEN_DEVICE_FUNC attribute resolves those errors
* Merged in codeplaysoftware/eigen (pull request PR-667) (Gael Guennebaud, 2019-07-02)
|\
| |   [SYCL] :
| |
| |   Approved-by: Gael Guennebaud <g.gael@free.fr>
| |   Approved-by: Rasmus Larsen <rmlarsen@google.com>
* | Allocate non-const scalar buffer for block evaluation with DefaultDevice (Eugene Zhulenev, 2019-07-01)
| |
| * [SYCL] : (Mehdi Goli, 2019-07-01)
| |
| |   * Modifying TensorDeviceSYCL to use `EIGEN_THROW_X`.
| |   * Modifying TensorMacro to use `EIGEN_TRY/CATCH(X)` macro.
| |   * Modifying TensorReverse.h to use `EIGEN_DEVICE_REF` instead of `&`.
| |   * Fixing the SYCL device macro in SpecialFunctionsImpl.h.
* | PR 655: Fix missing Eigen namespace in Macros (Justin Carpentier, 2019-06-05)
| |
* | [SYCL] Adding the SYCL memory model. The SYCL memory model provides : (Mehdi Goli, 2019-07-01)
|/
|   * an interface for SYCL buffers to behave as a non-dereferenceable pointer
|   * an interface for placeholder accessor to behave like a pointer on both host and device
* Fix TensorReverse on GPU with m_stride[i]==0 (Eugene Zhulenev, 2019-06-28)
|
* Fix CUDA compilation error for pselect<half>. (Rasmus Munk Larsen, 2019-06-28)
|
* Fix preprocessor condition to only generate a warning when calling eigen::GpuDevice::synchronize() from device code, but not when calling from a non-GPU compilation unit. (Rasmus Munk Larsen, 2019-06-28)
|
* Remove comma causing warning in c++03 mode. (Rasmus Munk Larsen, 2019-06-28)
|
* Merge with Eigen head (Eugene Zhulenev, 2019-06-28)
|\
* | Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred (Eugene Zhulenev, 2019-06-28)
| |
| * [SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL. (Rasmus Munk Larsen, 2019-06-28)
|/|
| |   * Abstracting the pointer type so that both SYCL memory and pointer can be captured.
| |   * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
| |   * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
| |   * Adding SYCL macro for controlling loop unrolling.
| |   * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
| * [SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL. (Mehdi Goli, 2019-06-28)
|/
|   * Abstracting the pointer type so that both SYCL memory and pointer can be captured.
|   * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
|   * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
|   * Adding SYCL macro for controlling loop unrolling.
|   * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.