path: root/Eigen/src/Core/arch
Commit message (Author, Age)
* Enable partial support for half floats on Kepler GPUs. (Benoit Steiner, 2016-03-03)
|
* Enable the conversion between floats and half floats on older GPUs that support it. (Benoit Steiner, 2016-03-03)
|
* Declare the half float type as arithmetic. (Benoit Steiner, 2016-02-22)
|
* Implemented the ptranspose function on half floats (Benoit Steiner, 2016-02-21)
|
* Added the ability to compute the absolute value of a half float (Benoit Steiner, 2016-02-21)
|
* Moved some of the fp16 operators outside the Eigen namespace to workaround some nvcc limitations. (Benoit Steiner, 2016-02-20)
|
* Added support for tensor reductions on half floats (Benoit Steiner, 2016-02-19)
|
* Implemented the scalar division of 2 half floats (Benoit Steiner, 2016-02-19)
|
* Added support for operators +=, -=, *= and /= on CUDA half floats (Benoit Steiner, 2016-02-19)
|
* Implemented protate() for CUDA (Benoit Steiner, 2016-02-19)
|
* Added support for simple coefficient wise tensor expression using half floats on CUDA devices (Benoit Steiner, 2016-02-19)
|
* FP16 on CUDA are only available starting with cuda 7.5. Disable them when using an older version of CUDA (Benoit Steiner, 2016-02-18)
|
* Added preliminary support for half floats on CUDA GPU. For now we can simply convert floats into half floats and vice versa (Benoit Steiner, 2016-02-19)
|
* Improved implementation of ptanh for SSE and AVX (Benoit Steiner, 2016-02-18)
|
* Avoid implicit cast from double to float. (Benoit Steiner, 2016-02-10)
|
* Optimized implementation of the tanh function for SSE (Benoit Steiner, 2016-02-10)
|
* Optimized implementation of the hyperbolic tangent function for AVX (Benoit Steiner, 2016-02-10)
|
* Make the GCC workaround for sqrt GCC-only; detect Emscripten as non-GCC (Benoit Jacob, 2016-02-10)
|
* Work around Emscripten bug - https://github.com/kripken/emscripten/issues/4088 (Benoit Jacob, 2016-02-10)
|
* Remove custom unaligned loads for SSE. They were only useful for core2 CPU. (Gael Guennebaud, 2016-02-08)
|
* merge (Gael Guennebaud, 2016-01-28)
|\
* | Fix compilation on old gcc+AVX (Gael Guennebaud, 2016-01-21)
| |
* | Add numext::sqrt function to enable custom optimized implementation. (Gael Guennebaud, 2016-01-21)
| |
| |   This changeset add two specializations for float/double on SSE. Those are mostly usefull with GCC for which std::sqrt add an extra and costly check on the result of _mm_sqrt_*. Clang does not add this burden. In this changeset, only DenseBase::norm() makes use of it.
| |
* | Workaround clang -Wdocumentation warning about "/*<" (Gael Guennebaud, 2015-12-30)
| |
| * Merged eigen/eigen into default (Eugene Brevdo, 2015-12-24)
| |\
| |/
|/|
| * Add digamma for CPU + CUDA. Includes tests. (Eugene Brevdo, 2015-12-24)
| |
* | Workaround compilers that do not even define _mm256_set_m128. (Gael Guennebaud, 2015-12-24)
|/
* Fixed a typo in previous change. (Benoit Steiner, 2015-12-21)
|
* Added support for CUDA architectures that don's support for 3.5 capabilities (Benoit Steiner, 2015-12-21)
|
* Fixed a typo. (Benoit Steiner, 2015-12-18)
|
* bug #1140: remove custom definition and use of _mm256_setr_m128 (Gael Guennebaud, 2015-12-18)
|
* Merged in ebrevdo/eigen (pull request PR-148) (Gael Guennebaud, 2015-12-11)
|\    Add special functions to eigen: lgamma, erf, erfc.
| |
* | bug #1103: fix neon vectorization of pmul(Packet1cd,Packet1cd) (Gael Guennebaud, 2015-12-10)
| |
| * Add special functions to Eigen: lgamma, erf, erfc. (Eugene Brevdo, 2015-12-07)
|/    Includes CUDA support and unit tests.
|
* Fix "," in non SSE4 mode (Gael Guennebaud, 2015-11-05)
|
* Fix AVX round/ceil/floor, and fix respective unit test (Gael Guennebaud, 2015-11-04)
|
* Merged in aavenel/eigen (pull request PR-142) (Gael Guennebaud, 2015-11-04)
|\    Add round, ceil and floor for SSE4.1/AVX (Bug #70)
| |
* | Made the CUDA implementation of ploadt_ro compatible with cuda implementations older than 3.5 (Benoit Steiner, 2015-11-03)
| |
| * Add round, ceil and floor for SSE4.1/AVX (Bug #70) (Alexandre Avenel, 2015-11-01)
|/
* bug #1085: workaround gcc default ABI issue (Gael Guennebaud, 2015-10-10)
|
* _mm_hadd_epi32 is for SSSE3 only (and not SSE3) (Gael Guennebaud, 2015-10-07)
|
* Handle various TODOs in SSE vectorization (remove splitted storeu, enable SSE3 integer vectorization, plus minor tweaks) (Gael Guennebaud, 2015-10-06)
|
* bug #1069: fix AVX support on MSVC (use of non portable C-style cast) (Gael Guennebaud, 2015-09-28)
|
* Added support for predux_mul for CUDA devices (Benoit Steiner, 2015-09-08)
|
* Implement plog and pexp for AltiVec. (Doug Kwan, 2015-07-30)
|
* Fix prototype of plset and generalize linspace functor. (Gael Guennebaud, 2015-08-07)
|
* Include SSE packetmath when AVX is enabled, and enable AVX's sine function only in fast-math mode (as SSE) (Gael Guennebaud, 2015-08-07)
|
* Let unpacket_traits<> exposes the required alignment and make use of it everywhere (Gael Guennebaud, 2015-08-07)
|
* Fix shadow warnings triggered by clang (Gael Guennebaud, 2015-06-09)
|
* Abandon blocking size lookup table approach. Not performing as well in real world as in microbenchmark. (Benoit Jacob, 2015-05-19)