|
* init impl done
* improve doc, shape inference
* add and pass shape inference
* check bound
* small fix
* add all types of reduction
* add cpu kernels
* write python tests
* make it build independently of GPU
* fix problems and pass all tests
* add to py
* remove redundant code
* improve doc
* change ops signature
* add Cuda{2D,3D}LaunchConfig that maximizes occupancy
* support axis
* remove default val, check input<=0
* modify tests to net api
* fix some compilation err
* fix Const in flat_inner_outer_dims
* pass build
* fix shape test
* fix names in macros
* fix test
* specify reduceop by macro
* clean code
* further simplify code
* misc fixes
* add max size check
* fix typo
* fix typo
* tests, docs, and related changes
* build the test
* buildify
* partially support vec indices
* vec indices finish, not tested
* pass cpu tests
* fix gpu functor
* fix code style
* update doc
* fix code style
* fix code style
* sync cuda_kernel_config change
* buildify
* typename->class in template template
typename in a template template parameter (see N4051: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4051.html)
is supported in gcc starting with version 5 (see: https://gcc.gnu.org/projects/cxx-status.html), so using typename
instead of class causes compilation to fail on gcc-4.
* move to contrib
* some fixes
* some fixes
* some fixes
* buildify
* build cleanup
* add cmake
|
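For reference, a minimal sketch of the `typename->class` change noted above. The names `Holder` and `demo_size` are hypothetical and are not taken from this change; they only illustrate the syntax difference. Under N4051, a template template parameter may be declared with `typename`, but gcc-4 accepts only `class` in that position.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical example type, not part of this commit.
// Portable form: the template template parameter is introduced with `class`.
// The N4051 form would read `template <class, class> typename Container`,
// which gcc accepts only from version 5 onward.
template <typename T, template <class, class> class Container>
struct Holder {
  Container<T, std::allocator<T>> items;
  std::size_t size() const { return items.size(); }
};

// Hypothetical helper: instantiates Holder with std::vector and
// returns the number of stored elements.
inline std::size_t demo_size() {
  Holder<int, std::vector> h;
  h.items.push_back(1);
  h.items.push_back(2);
  return h.size();
}
```

Only the keyword introducing `Container` differs between the two forms; the ordinary type parameter `T` may use either `typename` or `class` on any compiler.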