path: root/unsupported/Eigen/CXX11
Commit message (author, date)
* Made the TensorIndexList usable on GPU without having to use the -relaxed-constexpr compilation flag (Benoit Steiner, 2016-02-26)
|
* Fixed a typo in the reduction code that could prevent large full reductions from running properly on old cuda devices. (Benoit Steiner, 2016-02-24)
|
* Marked the And and Or reducers as stateless. (Benoit Steiner, 2016-02-24)
|
* Updated the padding code to work with half floats (Benoit Steiner, 2016-02-23)
|
* Deleted the coordinate based evaluation of tensor expressions, since it's hardly ever used and started to cause some issues with some versions of xcode. (Benoit Steiner, 2016-02-22)
|
* include <iostream> in the tensor header since we now use it to better report cuda initialization errors (Benoit Steiner, 2016-02-22)
|
* Fixed compilation warning generated by clang (Benoit Steiner, 2016-02-21)
|
* Optimized casting of tensors in the case where the casting happens to be a no-op (Benoit Steiner, 2016-02-21)
|
* Prevent unnecessary Index to int conversions (Benoit Steiner, 2016-02-21)
|
* Get rid of duplicate code. (Rasmus Munk Larsen, 2016-02-19)
|
* Speed up tensor FFT by up to ~25-50%. (Rasmus Munk Larsen, 2016-02-19)
|
|   Benchmark                        Base (ns)    New (ns)  Improvement
|   ------------------------------------------------------------------
|   BM_tensor_fft_single_1D_cpu/8          132         134    -1.5%
|   BM_tensor_fft_single_1D_cpu/9         1162        1229    -5.8%
|   BM_tensor_fft_single_1D_cpu/16         199         195    +2.0%
|   BM_tensor_fft_single_1D_cpu/17        2587        2267   +12.4%
|   BM_tensor_fft_single_1D_cpu/32         373         341    +8.6%
|   BM_tensor_fft_single_1D_cpu/33        5922        4879   +17.6%
|   BM_tensor_fft_single_1D_cpu/64         797         675   +15.3%
|   BM_tensor_fft_single_1D_cpu/65       13580       10481   +22.8%
|   BM_tensor_fft_single_1D_cpu/128       1753        1375   +21.6%
|   BM_tensor_fft_single_1D_cpu/129      31426       22789   +27.5%
|   BM_tensor_fft_single_1D_cpu/256       4005        3008   +24.9%
|   BM_tensor_fft_single_1D_cpu/257      70910       49549   +30.1%
|   BM_tensor_fft_single_1D_cpu/512       8989        6524   +27.4%
|   BM_tensor_fft_single_1D_cpu/513     165402      107751   +34.9%
|   BM_tensor_fft_single_1D_cpu/999     198293      115909   +41.5%
|   BM_tensor_fft_single_1D_cpu/1ki      21289       14143   +33.6%
|   BM_tensor_fft_single_1D_cpu/1k      361980      233355   +35.5%
|   BM_tensor_fft_double_1D_cpu/8          138         131    +5.1%
|   BM_tensor_fft_double_1D_cpu/9         1253        1133    +9.6%
|   BM_tensor_fft_double_1D_cpu/16         218         200    +8.3%
|   BM_tensor_fft_double_1D_cpu/17        2770        2392   +13.6%
|   BM_tensor_fft_double_1D_cpu/32         406         368    +9.4%
|   BM_tensor_fft_double_1D_cpu/33        6418        5153   +19.7%
|   BM_tensor_fft_double_1D_cpu/64         856         728   +15.0%
|   BM_tensor_fft_double_1D_cpu/65       14666       11148   +24.0%
|   BM_tensor_fft_double_1D_cpu/128       1913        1502   +21.5%
|   BM_tensor_fft_double_1D_cpu/129      36414       24072   +33.9%
|   BM_tensor_fft_double_1D_cpu/256       4226        3216   +23.9%
|   BM_tensor_fft_double_1D_cpu/257      86638       52059   +39.9%
|   BM_tensor_fft_double_1D_cpu/512       9397        6939   +26.2%
|   BM_tensor_fft_double_1D_cpu/513     203208      114090   +43.9%
|   BM_tensor_fft_double_1D_cpu/999     237841      125583   +47.2%
|   BM_tensor_fft_double_1D_cpu/1ki      20921       15392   +26.4%
|   BM_tensor_fft_double_1D_cpu/1k      455183      250763   +44.9%
|   BM_tensor_fft_single_2D_cpu/8         1051        1005    +4.4%
|   BM_tensor_fft_single_2D_cpu/9        16784       14837   +11.6%
|   BM_tensor_fft_single_2D_cpu/16        4074        3772    +7.4%
|   BM_tensor_fft_single_2D_cpu/17       75802       63884   +15.7%
|   BM_tensor_fft_single_2D_cpu/32       20580       16931   +17.7%
|   BM_tensor_fft_single_2D_cpu/33      345798      278579   +19.4%
|   BM_tensor_fft_single_2D_cpu/64       97548       81237   +16.7%
|   BM_tensor_fft_single_2D_cpu/65     1592701     1227048   +23.0%
|   BM_tensor_fft_single_2D_cpu/128     472318      384303   +18.6%
|   BM_tensor_fft_single_2D_cpu/129    7038351     5445308   +22.6%
|   BM_tensor_fft_single_2D_cpu/256    2309474     1850969   +19.9%
|   BM_tensor_fft_single_2D_cpu/257   31849182    23797538   +25.3%
|   BM_tensor_fft_single_2D_cpu/512   10395194     8077499   +22.3%
|   BM_tensor_fft_single_2D_cpu/513  144053843   104242541   +27.6%
|   BM_tensor_fft_single_2D_cpu/999  279885833   208389718   +25.5%
|   BM_tensor_fft_single_2D_cpu/1ki   45967677    36070985   +21.5%
|   BM_tensor_fft_single_2D_cpu/1k   619727095   456489500   +26.3%
|   BM_tensor_fft_double_2D_cpu/8         1110        1016    +8.5%
|   BM_tensor_fft_double_2D_cpu/9        17957       15768   +12.2%
|   BM_tensor_fft_double_2D_cpu/16        4558        4000   +12.2%
|   BM_tensor_fft_double_2D_cpu/17       79237       66901   +15.6%
|   BM_tensor_fft_double_2D_cpu/32       21494       17699   +17.7%
|   BM_tensor_fft_double_2D_cpu/33      357962      290357   +18.9%
|   BM_tensor_fft_double_2D_cpu/64      105179       87435   +16.9%
|   BM_tensor_fft_double_2D_cpu/65     1617143     1288006   +20.4%
|   BM_tensor_fft_double_2D_cpu/128     512848      419397   +18.2%
|   BM_tensor_fft_double_2D_cpu/129    7271322     5636884   +22.5%
|   BM_tensor_fft_double_2D_cpu/256    2415529     1922032   +20.4%
|   BM_tensor_fft_double_2D_cpu/257   32517952    24462177   +24.8%
|   BM_tensor_fft_double_2D_cpu/512   10724898     8287617   +22.7%
|   BM_tensor_fft_double_2D_cpu/513  146007419   108603266   +25.6%
|   BM_tensor_fft_double_2D_cpu/999  296351330   221885776   +25.1%
|   BM_tensor_fft_double_2D_cpu/1ki   59334166    48357539   +18.5%
|   BM_tensor_fft_double_2D_cpu/1k   666660132   483840349   +27.4%
|
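The commit message does not say how the Improvement column in the benchmark table above was computed. The figures are consistent with the relative reduction in run time against the baseline, (Base - New) / Base, so a negative value means the new code was slower. The sketch below assumes that formula; the function name `improvement_percent` is hypothetical and is not part of Eigen or of the benchmark harness.

```python
def improvement_percent(base_ns: float, new_ns: float) -> float:
    """Relative reduction in run time, as a percentage of the baseline.

    Assumed formula (inferred from the table, not stated in the commit):
    (base - new) / base * 100. Negative values indicate a slowdown.
    """
    return (base_ns - new_ns) / base_ns * 100.0


# A few (benchmark, base ns, new ns) rows copied from the table.
rows = [
    ("BM_tensor_fft_single_1D_cpu/8", 132, 134),
    ("BM_tensor_fft_single_1D_cpu/17", 2587, 2267),
    ("BM_tensor_fft_double_1D_cpu/999", 237841, 125583),
]

for name, base, new in rows:
    print(f"{name}: {improvement_percent(base, new):+.1f}%")
```

Under this assumption the three sample rows reproduce the table's -1.5%, +12.4% and +47.2%.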
* Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues. (Benoit Steiner, 2016-02-19)
|
* Updated the contraction code to make it compatible with half floats. (Benoit Steiner, 2016-02-19)
|
* Added support for tensor reductions on half floats (Benoit Steiner, 2016-02-19)
|
* Added the ability to query the minor version of a cuda device (Benoit Steiner, 2016-02-19)
|
* Don't make the array constructors explicit (Benoit Steiner, 2016-02-19)
|
* Fixed a bug in the tensor type converter (Benoit Steiner, 2016-02-19)
|
* Added a method to conjugate the content of a tensor or the result of a tensor expression. (Benoit Steiner, 2016-02-11)
|
* Worked around a few clang compilation warnings (Benoit Steiner, 2016-02-10)
|
* Fixed clang compilation warnings (Benoit Steiner, 2016-02-10)
|
* Fixed some clang compilation warnings (Benoit Steiner, 2016-02-09)
|
* Updated the TensorIntDivisor code to work properly on LLP64 systems (Benoit Steiner, 2016-02-08)
|
* Avoid unnecessary type conversions (Benoit Steiner, 2016-02-05)
|
* Added support for vectorized type casting of int to char. (Benoit Steiner, 2016-02-03)
|
* Fixed the initialization of the dummy member of the array class to make it compatible with pairs of elements. (Benoit Steiner, 2016-02-03)
|
* Made sure the dummy element of size 0 array is always initialized to silence some compiler warnings (Benoit Steiner, 2016-02-03)
|
* Merged in ville-k/eigen/explicit_long_constructors (pull request PR-158) (Benoit Steiner, 2016-02-02)
|\
| |   Add constructor for long types.
| * Use EIGEN_STATIC_ASSERT for backward compatibility. (Ville Kallioniemi, 2016-02-02)
| |
* | Don't try to use direct offsets when computing a tensor product, since the required stride isn't available. (Benoit Steiner, 2016-02-02)
| |
| * Replace separate low word constructors with a single templated constructor. (Ville Kallioniemi, 2016-02-01)
| |
| * Rebase to latest. (Ville Kallioniemi, 2016-02-01)
| |\
| |/
|/|
* | Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations. (Benoit Steiner, 2016-02-01)
| |
* | Fixed a number of compilation warnings generated by the cuda tests (Benoit Steiner, 2016-01-31)
| |
* | Fixed a few compilation warnings (Benoit Steiner, 2016-01-31)
| |
* | Marked several methods EIGEN_DEVICE_FUNC (Benoit Steiner, 2016-01-28)
| |
* | Fixed a couple of compilation warnings. (Benoit Steiner, 2016-01-28)
| |
* | merge (Gael Guennebaud, 2016-01-28)
|\ \
* | | Deleted an invalid assertion that prevented the assignment of empty tensors. (Benoit Steiner, 2016-01-27)
| | |
* | | Fixed some compilation problems with nvcc + clang (Benoit Steiner, 2016-01-27)
| | |
| | * Add constructor for long types. (Ville Kallioniemi, 2016-01-26)
| | |
* | | Don't explicitly evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression (Benoit Steiner, 2016-01-24)
| | |
* | | Added missing EIGEN_DEVICE_FUNC qualifier (Benoit Steiner, 2016-01-24)
| | |
* | | Merged in ville-k/eigen/tensorflow_fix (pull request PR-153) (Benoit Steiner, 2016-01-22)
|\ \ \
| | | |   Add ctor for long
* | | | Leverage the new blocking code in the tensor contraction code. (Benoit Steiner, 2016-01-22)
| |_|/
|/| |
* | | Created a mechanism to enable contraction mappers to determine the best blocking strategy. (Benoit Steiner, 2016-01-22)
| | |
* | | Backout changeset 690bc950f70c61075d396671e63480bbd64bb297 (Gael Guennebaud, 2016-01-22)
| | |
| * | Update to latest default branch (Ville Kallioniemi, 2016-01-21)
| |\ \
| |/ /
|/| |
* | | Fixed a constness bug (Benoit Steiner, 2016-01-21)
| | |
* | | fix clang warnings (Jan Prach, 2016-01-20)
| | |   "braces around scalar initializer"
* | | Small cleanup and small fix to the contraction of row major tensors (Benoit Steiner, 2016-01-20)
| | |