Commit message | Author | Age | |
---|---|---|---|
* | Undo changes in AltiVec --- I don't have any way to test there. | 2016-06-28 | |
| | |||
* | Avoid global variables with static constructors in NEON/Complex.h | 2016-06-28 | |
| | |||
* | bug #1240: Remove any assumption on NEON vector types. | 2016-06-09 | |
| | |||
* | Fix compile errors initializing packets on ARM DS-5 5.20 | 2016-06-03 | |
| | | | | The ARM DS-5 5.20 compiler fails with the following errors: (1) "src/Core/arch/NEON/PacketMath.h", line 113: Error: #146: too many initializer values, at: Packet4f countdown = EIGEN_INIT_NEON_PACKET4(0, 1, 2, 3); (2) "src/Core/arch/NEON/PacketMath.h", line 118: Error: #146: too many initializer values, at: Packet4i countdown = EIGEN_INIT_NEON_PACKET4(0, 1, 2, 3); (3) "src/Core/arch/NEON/Complex.h", line 30: Error: #146: too many initializer values, at: static uint32x4_t p4ui_CONJ_XOR = EIGEN_INIT_NEON_PACKET4(0x00000000, 0x80000000, 0x00000000, 0x80000000); (4) "src/Core/arch/NEON/Complex.h", line 31: Error: #146: too many initializer values, at: static uint32x2_t p2ui_CONJ_XOR = EIGEN_INIT_NEON_PACKET2(0x00000000, 0x80000000); The vectors are implemented as two doubles, hence the "too many initializer values" error. Changed the code to use intrinsic load functions, which all compilers implementing NEON should have. | ||
* | Enable the vectorization of adds and mults of fp16 | 2016-06-07 | |
| | |||
* | Add TernaryFunctors and the betainc SpecialFunction. | 2016-06-02 | |
| | | | | TernaryFunctors and their executors allow operations on 3-tuples of inputs. The API is fully implemented for Arrays and Tensors based on binary functors. Ported the cephes betainc function (regularized incomplete beta integral) to Eigen, with support for CPU and GPU, and for float, double, and half types. Added unit tests in array.cpp and cxx11_tensor_cuda.cu. Collapsed revision: (1) Merged helper methods for betainc across floats and doubles. (2) Added TensorGlobalFunctions with betainc(); removed betainc() from TensorBase. (3) Cleaned up CwiseTernaryOp checks; changed igamma_helper to cephes_helper. (4) betainc: merged incbcf and incbd into incbeta_cfe, plus more cleanup. (5) Updated TernaryOp and SpecialFunctions (betainc) based on review comments. | ||
* | Improved support for CUDA 8.0 | 2016-05-31 | |
| | |||
* | Disable the use of MMX instructions since the code is broken on many platforms | 2016-05-27 | |
| | |||
* | Deleted extra namespace | 2016-05-26 | |
| | |||
* | Disable usage of MMX with msvc. | 2016-05-26 | |
| | |||
* | Add missing inclusion of mmintrin.h | 2016-05-26 | |
| | |||
* | Silenced a compilation warning | 2016-05-25 | |
| | |||
* | Specify the rounding mode in the correct location | 2016-05-25 | |
| | |||
* | Explicitly specify the rounding mode when converting floats to fp16 | 2016-05-25 | |
| | |||
* | Disable the use of MMX instructions on x86_64 since too many compilers only ↵ | 2016-05-25 | |
| | | | | support them in 32bit mode | ||
* | Fix compilation with ICC. | 2016-05-25 | |
| | |||
* | Cleaned up the fp16 code a little more | 2016-05-24 | |
| | |||
* | Cleaned up the fp16 code | 2016-05-24 | |
| | |||
* | Remove now-unused protate PacketMath func | 2016-05-24 | |
| | |||
* | Don't attempt to use MMX instructions with visualstudio since they're only ↵ | 2016-05-24 | |
| | | | | partially supported. | ||
* | Worked around missing clang intrinsic | 2016-05-24 | |
| | |||
* | Use the generic ploadquad intrinsic since it does the job | 2016-05-24 | |
| | |||
* | Worked around missing clang intrinsics | 2016-05-24 | |
| | |||
* | Added missing EIGEN_DEVICE_FUNC qualifier | 2016-05-23 | |
| | |||
* | Use the Index type instead of integers to specify the strides in ↵ | 2016-05-23 | |
| | | | | pgather/pscatter | ||
* | Added missing alignment in the fp16 packet traits | 2016-05-23 | |
| | |||
* | ptranspose is not a template. | 2016-05-23 | |
| | |||
* | Avoid unnecessary float to double conversion. | 2016-05-23 | |
| | |||
* | Started to vectorize the processing of 16bit floats on CPU. | 2016-05-23 | |
| | |||
* | Replace multiple constructors of half-type by a generic/templated ↵ | 2016-05-23 | |
| | | | | constructor. This fixes an incompatibility with long double, exposed by the previous commit. | ||
* | Make EIGEN_HAS_C99_MATH user configurable | 2016-05-20 | |
| | |||
* | Fixed a couple of bugs related to the Pascal family of GPUs | 2016-05-11 | |
| | |||
* | Added the ability to load fp16 using the texture path. | 2016-05-11 | |
| | | | | Improved the performance of some reductions on fp16 | ||
* | Misc fixes for fp16 | 2016-05-11 | |
| | |||
* | Made predux_min and predux_max on fp16 less noisy | 2016-05-11 | |
| | |||
* | __ldg is only available with cuda architectures >= 3.5 | 2016-05-11 | |
| | |||
* | Fixed a typo | 2016-05-11 | |
| | |||
* | Added missing EIGEN_DEVICE_FUNC | 2016-05-11 | |
|\ | |||
| * | Added missing EIGEN_DEVICE_FUNC qualifiers | 2016-05-11 | |
|/ | |||
* | Added packet primitives to compute exp, log, sqrt and rsqrt on fp16. This ↵ | 2016-05-10 | |
| | | | | improves the performance by 10 to 30%. | ||
* | Added support for packet processing of fp16 on kepler and maxwell gpus | 2016-05-06 | |
| | |||
* | Relaxed the dummy precision for fp16 | 2016-05-05 | |
| | |||
* | Fixed compilation error with cuda >= 7.5 | 2016-05-03 | |
| | |||
* | Made a cast explicit | 2016-05-02 | |
| | |||
* | Fixed compilation errors generated by clang | 2016-04-29 | |
| | |||
* | fpclassify isn't portable enough. In particular, the return values of the ↵ | 2016-04-27 | |
| | | | | function are not available on all the platforms Eigen supports: remove it from Eigen. | ||
* | Improved support for min and max on 16 bit floats when running on recent ↵ | 2016-04-27 | |
| | | | | cuda gpus | ||
* | Added support for fpclassify in Eigen::Numext | 2016-04-27 | |
| | |||
* | Force the inlining of the << operator on half floats | 2016-04-14 | |
| | |||
* | Inline the << operator on half floats | 2016-04-14 | |
| |
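
The 2016-06-02 entry above ports the cephes betainc function, the regularized incomplete beta integral I_x(a, b). As a rough illustration of what such a function computes, here is a minimal scalar sketch that evaluates the standard continued-fraction expansion with the modified Lentz method. It is not the cephes or Eigen code: the names `betainc_sketch` and `betainc_cf_sketch`, the iteration limit, and the tolerances are all assumptions made for this example, and it covers only `double` on the CPU.

```cpp
#include <cassert>
#include <cmath>

// Continued-fraction factor of the incomplete beta function, evaluated with
// the modified Lentz method. Hypothetical helper: this is NOT Eigen's or
// cephes' incbeta_cfe, only a textbook-style illustration.
static double betainc_cf_sketch(double a, double b, double x) {
  const int kMaxIter = 300;      // assumed iteration cap
  const double kEps = 1e-14;     // assumed relative convergence tolerance
  const double kTiny = 1e-300;   // guards against division by zero
  const double qab = a + b, qap = a + 1.0, qam = a - 1.0;
  double c = 1.0;
  double d = 1.0 - qab * x / qap;
  if (std::fabs(d) < kTiny) d = kTiny;
  d = 1.0 / d;
  double h = d;
  for (int m = 1; m <= kMaxIter; ++m) {
    const double m2 = 2.0 * m;
    // Even step of the recurrence.
    double aa = m * (b - m) * x / ((qam + m2) * (a + m2));
    d = 1.0 + aa * d;
    if (std::fabs(d) < kTiny) d = kTiny;
    c = 1.0 + aa / c;
    if (std::fabs(c) < kTiny) c = kTiny;
    d = 1.0 / d;
    h *= d * c;
    // Odd step of the recurrence.
    aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2));
    d = 1.0 + aa * d;
    if (std::fabs(d) < kTiny) d = kTiny;
    c = 1.0 + aa / c;
    if (std::fabs(c) < kTiny) c = kTiny;
    d = 1.0 / d;
    const double del = d * c;
    h *= del;
    if (std::fabs(del - 1.0) < kEps) break;  // converged
  }
  return h;
}

// Regularized incomplete beta integral I_x(a, b) for a > 0, b > 0 and
// 0 <= x <= 1. Hypothetical name chosen for this sketch.
double betainc_sketch(double a, double b, double x) {
  assert(a > 0.0 && b > 0.0);
  if (x <= 0.0) return 0.0;
  if (x >= 1.0) return 1.0;
  // Prefactor x^a (1-x)^b / B(a, b), computed in log space for stability.
  const double log_front = std::lgamma(a + b) - std::lgamma(a) -
                           std::lgamma(b) + a * std::log(x) +
                           b * std::log(1.0 - x);
  const double front = std::exp(log_front);
  // The continued fraction converges quickly for x < (a+1)/(a+b+2);
  // otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
  if (x < (a + 1.0) / (a + b + 2.0)) {
    return front * betainc_cf_sketch(a, b, x) / a;
  }
  return 1.0 - front * betainc_cf_sketch(b, a, 1.0 - x) / b;
}
```

As a sanity check, `betainc_sketch(1.0, 1.0, x)` should be close to `x`, and `betainc_sketch(2.0, 3.0, 0.5)` should be close to 11/16, the value of I_0.5(2, 3). The real port described in the log additionally handles float and half types and GPU execution, which this sketch does not attempt.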