Commit message | Author | Age | ||
---|---|---|---|---|
... | ||||
* | | | MSVC uses __uint128 while other compilers use __uint128_t to encode 128-bit ↵ | 2016-03-04 | ||
| | | | | | | | | | | | | unsigned integers. Make the cxx11_tensor_uint128.cpp test work in both cases. | |||
* | | | Fixed syntax error | 2016-03-04 | ||
| | | | ||||
* | | | Added missing include | 2016-03-04 | ||
| | | | ||||
* | | | Don't use implicit type conversions in initializer lists since not all ↵ | 2016-03-04 | ||
| | | | | | | | | | | | | compilers support them. | |||
* | | | Made the contraction test more portable | 2016-03-04 | ||
| | | | ||||
* | | | Fixed a typo | 2016-03-04 | ||
| | | | ||||
| | * | Initial implementation of igamma and igammac. | 2016-03-03 | ||
| |/ | ||||
* | | Added tests to cover the new rounding, flooring and ceiling tensor operations. | 2016-03-03 | ||
| | | ||||
* | | Added support for rounding, flooring, and ceiling to the tensor api | 2016-03-03 | ||
| | | ||||
* | | Added a test to validate the conversion of half floats into floats on Kepler ↵ | 2016-03-03 | ||
| | | | | | | | | | | | | GPUs. Restricted the testing of the random number generation code to GPU architecture greater than or equal to 3.5. | |||
* | | Enable partial support for half floats on Kepler GPUs. | 2016-03-03 | ||
| | | ||||
* | | Enable the conversion between floats and half floats on older GPUs that ↵ | 2016-03-03 | ||
| | | | | | | | | support it. | |||
* | | Merged in ebrevdo/eigen (pull request PR-167) | 2016-03-03 | ||
|\| | | | | | | | | | | | Add infinity() support to numext::numeric_limits, use it in lgamma. I tested the code on my gtx-titan-black gpu, and it appears to work as expected. | |||
| * | Small bugfix to numeric_limits for CUDA. | 2016-03-02 | ||
| | | ||||
| * | Add infinity() support to numext::numeric_limits, use it in lgamma. | 2016-03-02 | ||
| | | | | | | | | | | This makes the infinity access a __device__ function, removing nvcc warnings. | |||
* | | bug #537: fix compilation with Apple's compiler | 2016-03-02 | ||
| | | ||||
* | | Pulled latest updates from trunk | 2016-03-01 | ||
|\ \ | ||||
| * | | Compilation fix | 2016-03-01 | ||
| | | | ||||
| * | | Compilation fix | 2016-03-01 | ||
| | | | ||||
* | | | Improved the performance of large outer reductions on CUDA | 2016-02-29 | ||
|/ / | ||||
* | | Added benchmarks for full reduction | 2016-02-29 | ||
| | | ||||
* | | Made the signature of the inner and outer reducers consistent | 2016-02-29 | ||
| | | ||||
* | | Optimized the performance of narrow reductions on CUDA devices | 2016-02-29 | ||
| | | ||||
* | | Fix shortcoming in fixed-value deduction of startRow/startCol | 2016-02-29 | ||
| | | ||||
* | | Print some information to stderr when a CUDA kernel fails | 2016-02-27 | ||
| | | ||||
* | | Improved the README | 2016-02-27 | ||
| | | ||||
* | | bug #1172: make valuePtr and innerIndexPtr properly return null for empty ↵ | 2016-02-27 | ||
| | | | | | | | | matrices. | |||
* | | Properly vectorized the random number generators | 2016-02-26 | ||
| | | ||||
* | | Made the TensorIndexList usable on GPU without having to use the ↵ | 2016-02-26 | ||
| | | | | | | | | -relaxed-constexpr compilation flag | |||
* | | Added benchmarks for type casting of float16 | 2016-02-26 | ||
| | | ||||
* | | Added benchmarks for fp16 | 2016-02-26 | ||
| | | ||||
* | | Reverted previous commit since it caused more problems than it solved | 2016-02-26 | ||
| | | ||||
* | | Fixed handling of long doubles on aarch64 | 2016-02-26 | ||
| | | ||||
* | | Made the CUDA architecture level a build setting. | 2016-02-25 | ||
| | | ||||
* | | Fixed a typo in the reduction code that could prevent large full reductions ↵ | 2016-02-24 | ||
| | | | | | | | | from running properly on old CUDA devices. | |||
* | | Marked the And and Or reducers as stateless. | 2016-02-24 | ||
| | | ||||
* | | merge | 2016-02-23 | ||
|\ \ | ||||
* | | | Fix startRow()/startCol() for dense Block with direct access: | 2016-02-23 | ||
| | | | | | | | | | | | | the initial implementation failed for empty rows/columns, for which they are ambiguous. | |||
| * | | Updated the padding code to work with half floats | 2016-02-23 | ||
| | | | ||||
| * | | Extended the tensor benchmark suite to support types other than floats | 2016-02-23 | ||
| | | | ||||
| * | | Updated the tensor benchmarking code to work with compilers that don't ↵ | 2016-02-23 | ||
| | | | | | | | | | | | | support cxx11. | |||
| * | | Deleted the coordinate based evaluation of tensor expressions, since it's ↵ | 2016-02-22 | ||
| | | | | | | | | | | | | hardly ever used and started to cause some issues with some versions of Xcode. | |||
| * | | Declare the half float type as arithmetic. | 2016-02-22 | ||
| | | | ||||
| * | | include <iostream> in the tensor header since we now use it to better report ↵ | 2016-02-22 | ||
| | | | | | | | | | | | | CUDA initialization errors | |||
| * | | Fixed compilation warning generated by clang | 2016-02-21 | ||
| | | | ||||
| * | | Implemented the ptranspose function on half floats | 2016-02-21 | ||
| | | | ||||
| * | | Pulled latest updates from trunk | 2016-02-21 | ||
| |\ \ | ||||
| * | | | Added the ability to compute the absolute value of a half float | 2016-02-21 | ||
| | | | | ||||
| | * | | Added some debugging information to the test to figure out why it fails ↵ | 2016-02-21 | ||
| | | | | | | | | | | | | | | | | sometimes | |||
| | * | | Optimized casting of tensors in the case where the casting happens to be a no-op | 2016-02-21 | ||
| |/ / |