eigen - C++ library for linear algebra

	Commit message	Author	Age
...
* \|	Add Bessel functions to SpecialFunctions.	Srinivas Vasudevan	2019-09-14
\| \|	- Split SpecialFunctions files into a separate BesselFunctions file.
\| \|	In particular, add:
\| \|	- Modified Bessel functions of the second kind: k0, k1, k0e, k1e
\| \|	- Bessel functions of the first kind: j0, j1
\| \|	- Bessel functions of the second kind: y0, y1
\| *	Fix maybe-uninitialized warnings in TensorContractionThreadPool	Eugene Zhulenev	2019-09-13
\| \|
\| *	Use ThreadLocal container in TensorContractionThreadPool	Eugene Zhulenev	2019-09-13
\|/
*	Add packetized versions of i0e and i1e special functions.	Srinivas Vasudevan	2019-09-11
\|	- In particular refactor the i0e and i1e code so scalar and vectorized path share code.
\|	- Move chebevl to GenericPacketMathFunctions.
\|
\|	A brief benchmark with building Eigen with FMA, AVX and AVX2 flags
\|
\|	Before:
\|	CPU: Intel Haswell with HyperThreading (6 cores)
\|	Benchmark                 Time(ns)   CPU(ns)  Iterations
\|	-----------------------------------------------------------------
\|	BM_eigen_i0e_double/1         57.3      57.3    10000000
\|	BM_eigen_i0e_double/8          398       398     1748554
\|	BM_eigen_i0e_double/64        3184      3184      218961
\|	BM_eigen_i0e_double/512      25579     25579       27330
\|	BM_eigen_i0e_double/4k      205043    205042        3418
\|	BM_eigen_i0e_double/32k    1646038   1646176         422
\|	BM_eigen_i0e_double/256k  13180959  13182613          53
\|	BM_eigen_i0e_double/1M    52684617  52706132          10
\|	BM_eigen_i0e_float/1          28.4      28.4    24636711
\|	BM_eigen_i0e_float/8          75.7      75.7     9207634
\|	BM_eigen_i0e_float/64          512       512     1000000
\|	BM_eigen_i0e_float/512        4194      4194      166359
\|	BM_eigen_i0e_float/4k        32756     32761       21373
\|	BM_eigen_i0e_float/32k      261133    261153        2678
\|	BM_eigen_i0e_float/256k    2087938   2088231         333
\|	BM_eigen_i0e_float/1M      8380409   8381234          84
\|	BM_eigen_i1e_double/1         56.3      56.3    10000000
\|	BM_eigen_i1e_double/8          397       397     1772376
\|	BM_eigen_i1e_double/64        3114      3115      223881
\|	BM_eigen_i1e_double/512      25358     25361       27761
\|	BM_eigen_i1e_double/4k      203543    203593        3462
\|	BM_eigen_i1e_double/32k    1613649   1613803         428
\|	BM_eigen_i1e_double/256k  12910625  12910374          54
\|	BM_eigen_i1e_double/1M    51723824  51723991          10
\|	BM_eigen_i1e_float/1          28.3      28.3    24683049
\|	BM_eigen_i1e_float/8          74.8      74.9     9366216
\|	BM_eigen_i1e_float/64          505       505     1000000
\|	BM_eigen_i1e_float/512        4068      4068      171690
\|	BM_eigen_i1e_float/4k        31803     31806       21948
\|	BM_eigen_i1e_float/32k      253637    253692        2763
\|	BM_eigen_i1e_float/256k    2019711   2019918         346
\|	BM_eigen_i1e_float/1M      8238681   8238713          86
\|
\|	After:
\|	CPU: Intel Haswell with HyperThreading (6 cores)
\|	Benchmark                 Time(ns)   CPU(ns)  Iterations
\|	-----------------------------------------------------------------
\|	BM_eigen_i0e_double/1         15.8      15.8    44097476
\|	BM_eigen_i0e_double/8         99.3      99.3     7014884
\|	BM_eigen_i0e_double/64         777       777      886612
\|	BM_eigen_i0e_double/512       6180      6181      100000
\|	BM_eigen_i0e_double/4k       48136     48140       14678
\|	BM_eigen_i0e_double/32k     385936    385943        1801
\|	BM_eigen_i0e_double/256k   3293324   3293551         228
\|	BM_eigen_i0e_double/1M    12423600  12424458          57
\|	BM_eigen_i0e_float/1          16.3      16.3    43038042
\|	BM_eigen_i0e_float/8          30.1      30.1    23456931
\|	BM_eigen_i0e_float/64          169       169     4132875
\|	BM_eigen_i0e_float/512        1338      1339      516860
\|	BM_eigen_i0e_float/4k        10191     10191       68513
\|	BM_eigen_i0e_float/32k       81338     81337        8531
\|	BM_eigen_i0e_float/256k     651807    651984        1000
\|	BM_eigen_i0e_float/1M      2633821   2634187         268
\|	BM_eigen_i1e_double/1         16.2      16.2    42352499
\|	BM_eigen_i1e_double/8          110       110     6316524
\|	BM_eigen_i1e_double/64         822       822      851065
\|	BM_eigen_i1e_double/512       6480      6481      100000
\|	BM_eigen_i1e_double/4k       51843     51843       10000
\|	BM_eigen_i1e_double/32k     414854    414852        1680
\|	BM_eigen_i1e_double/256k   3320001   3320568         212
\|	BM_eigen_i1e_double/1M    13442795  13442391          53
\|	BM_eigen_i1e_float/1          17.6      17.6    41025735
\|	BM_eigen_i1e_float/8          35.5      35.5    19597891
\|	BM_eigen_i1e_float/64          240       240     2924237
\|	BM_eigen_i1e_float/512        1424      1424      485953
\|	BM_eigen_i1e_float/4k        10722     10723       65162
\|	BM_eigen_i1e_float/32k       86286     86297        8048
\|	BM_eigen_i1e_float/256k     691821    691868        1000
\|	BM_eigen_i1e_float/1M      2777336   2777747         256
\|
\|	This shows anywhere from a 50% to 75% improvement on these operations.
\|	I've also benchmarked without any of these flags turned on, and got similar performance to before (if not better).
\|	Also tested packetmath.cpp + special_functions to ensure no regressions.
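The `chebevl` helper named in the commit above evaluates a Chebyshev series. As a rough scalar illustration only, and assuming the Cephes convention (coefficients stored highest order first, series evaluated at x/2, zero-order term weighted by 1/2), the evaluation can be sketched with a Clenshaw-style recurrence. The name `chebevl_sketch` is hypothetical; this is not Eigen's packet implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar sketch of a Cephes-style Chebyshev series evaluation.
// Computes sum_i c_i * T_i(x / 2), where coef[0] holds the highest-order
// coefficient and coef[n - 1] holds the zero-order term, which is counted
// with weight 1/2.
inline double chebevl_sketch(double x, const double* coef, std::size_t n) {
  assert(n >= 1);
  double b0 = coef[0];
  double b1 = 0.0;
  double b2 = 0.0;
  // Clenshaw recurrence: each step folds in the next lower-order coefficient.
  for (std::size_t i = 1; i < n; ++i) {
    b2 = b1;
    b1 = b0;
    b0 = x * b1 - b2 + coef[i];
  }
  return 0.5 * (b0 - b2);
}
```

Because the recurrence uses only multiplies, adds and subtracts, the same loop body can in principle be written once over a generic packet type, which is the kind of scalar/vector code sharing the commit message describes.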
*	Fix for the HIP build+test errors introduced by the ndtri support.	Deven Desai	2019-09-06
\|	The fixes needed are
\|	* adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs)
\|	* switching to using ::<math_func> instead of std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC
\|	* removing an errant "j" from a testcase (don't know how that made it in to begin with!)
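The first two fixes listed above follow a common pattern in code shared between host and GPU compilers. A minimal sketch, assuming a hypothetical macro `SKETCH_DEVICE_FUNC` standing in for `EIGEN_DEVICE_FUNC` (the expansion shown is an assumption about a typical definition, not copied from Eigen) and a hypothetical helper `sketch_sqrt`:

```cpp
#include <cassert>
#include <cmath>
#include <math.h>

// Hypothetical stand-in for a device-function attribute macro. On a GPU
// compiler it marks the function as callable from both host and device code;
// on a plain host compiler it expands to nothing.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

// Selects the global-namespace math function only when compiling with HIPCC,
// mirroring the "::<math_func> instead of std::<math_func>" fix.
template <typename T>
SKETCH_DEVICE_FUNC inline T sketch_sqrt(T x) {
#if defined(__HIPCC__)
  return ::sqrt(x);
#else
  return std::sqrt(x);
#endif
}

// A function that calls the helper; without the attribute on both, a device
// compiler would reject the call from device code.
SKETCH_DEVICE_FUNC inline float sketch_norm2(float a, float b) {
  return sketch_sqrt(a * a + b * b);
}
```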
*	Update ThreadLocal to use separate Initialize/Release callables	Eugene Zhulenev	2019-09-10
\|
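The two ThreadLocal entries here describe a per-thread container that avoids `thread_local` storage and takes separate initialize and release callables. A much simplified sketch of that idea, assuming a mutex-protected map keyed by `std::thread::id` (the class name `PerThreadStore` and its interface are hypothetical, and a production container would likely avoid taking a lock on every access):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: one value per calling thread, without thread_local.
template <typename T>
class PerThreadStore {
 public:
  using Callback = std::function<void(T&)>;

  PerThreadStore(Callback initialize, Callback release)
      : initialize_(std::move(initialize)), release_(std::move(release)) {}

  // Runs the release callable once for every value that was created.
  ~PerThreadStore() {
    for (auto& entry : values_) release_(*entry.second);
  }

  // Returns the calling thread's value, creating and initializing it on
  // first use.
  T& local() {
    const std::thread::id id = std::this_thread::get_id();
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = values_.find(id);
    if (it == values_.end()) {
      std::unique_ptr<T> value(new T());
      initialize_(*value);
      it = values_.emplace(id, std::move(value)).first;
    }
    return *it->second;
  }

  // Number of threads that have requested a value so far.
  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return values_.size();
  }

 private:
  Callback initialize_;
  Callback release_;
  mutable std::mutex mutex_;
  std::unordered_map<std::thread::id, std::unique_ptr<T>> values_;
};
```

One caveat of keying by thread id: an id may be reused after a thread exits, so a new thread could observe a value left behind by an earlier one.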
*	ThreadLocal container that does not rely on thread local storage	Eugene Zhulenev	2019-09-09
\|
*	PR 681: Add ndtri function, the inverse of the normal distribution function.	Srinivas Vasudevan	2019-08-12
\|
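For readers unfamiliar with `ndtri`: it returns the x for which the standard normal CDF equals a given probability p. The sketch below only illustrates that definition by bisecting on the CDF, which is written with `std::erfc`. It is not the approximation used by the commit, and the names `ndtr_sketch` and `ndtri_sketch` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Standard normal CDF expressed through the complementary error function.
inline double ndtr_sketch(double x) {
  return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Inverse of ndtr_sketch on (0, 1), found by bisection. Slow but simple;
// shown only to make the definition concrete.
inline double ndtri_sketch(double p) {
  assert(p > 0.0 && p < 1.0);
  double lo = -40.0;
  double hi = 40.0;
  for (int i = 0; i < 200; ++i) {
    const double mid = 0.5 * (lo + hi);
    if (ndtr_sketch(mid) < p) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  return 0.5 * (lo + hi);
}
```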
*	Allow move-only done callback in TensorAsyncDevice	Eugene Zhulenev	2019-09-03
\|
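A move-only callback is one that cannot be copied, for example a lambda that owns a `std::unique_ptr`. One way to accept such callbacks is to keep the callable's concrete type as a template parameter and only ever move it. The sketch below assumes C++14 and uses hypothetical names (`AsyncOpSketch`, `make_async_op`); it is not the TensorAsyncDevice code.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical holder that stores the done callback by value and never
// copies it, so move-only callables are accepted.
template <typename Done>
class AsyncOpSketch {
 public:
  explicit AsyncOpSketch(Done done) : done_(std::move(done)) {}
  void finish() { done_(); }

 private:
  Done done_;
};

template <typename Done>
AsyncOpSketch<Done> make_async_op(Done done) {
  return AsyncOpSketch<Done>(std::move(done));
}

// Usage: the lambda owns a unique_ptr, so it is movable but not copyable.
inline int run_with_move_only_callback() {
  int observed = 0;
  std::unique_ptr<int> payload(new int(5));
  auto op = make_async_op(
      [p = std::move(payload), &observed]() { observed = *p; });
  op.finish();
  return observed;
}
```

Storing the same lambda in a `std::function` would not compile, because `std::function` requires a copyable target.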
*	Add test for const TensorMap underlying data mutation	Eugene Zhulenev	2019-09-03
\|
*	TensorMap constness should not change underlying storage constness	Eugene Zhulenev	2019-09-03
\|
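The constness rule in the TensorMap entries here matches how pointers behave: a const view can still refer to mutable data, while a view of const data cannot be written through. A small sketch with a hypothetical `MapSketch` class (not Eigen's TensorMap):

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical non-owning view over a buffer. Constness of the view object
// is separate from constness of the element type T.
template <typename T>
class MapSketch {
 public:
  MapSketch(T* data, std::size_t size) : data_(data), size_(size) {}

  // A const view still returns T&, so the element type alone decides
  // whether the underlying storage can be written.
  T& operator()(std::size_t i) const {
    assert(i < size_);
    return data_[i];
  }

  std::size_t size() const { return size_; }

 private:
  T* data_;
  std::size_t size_;
};

// const view of mutable data: element access yields int&.
static_assert(
    std::is_same<decltype(std::declval<const MapSketch<int>&>()(0)),
                 int&>::value,
    "const view must not make the storage const");

// view of const data: element access yields const int&.
static_assert(
    std::is_same<decltype(std::declval<const MapSketch<const int>&>()(0)),
                 const int&>::value,
    "view of const data must stay read-only");
```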
*	Fixed Tensor documentation formatting.	Alberto Luaces	2019-07-23
\|
*	Fix shadow warnings in TensorContractionThreadPool	Eugene Zhulenev	2019-08-30
\|
*	Fix block mapper type name in TensorExecutor	Eugene Zhulenev	2019-08-30
\|
*	evalSubExprsIfNeededAsync + async TensorContractionThreadPool	Eugene Zhulenev	2019-08-30
\|
*	Revert accidentally removed <memory> header from ThreadPool	Eugene Zhulenev	2019-08-30
\|
*	Asynchronous expression evaluation with TensorAsyncDevice	Eugene Zhulenev	2019-08-30
\|
*	Const correctness in TensorMap<const Tensor<T, ...>> expressions	Eugene Zhulenev	2019-08-28
\|
*	Remove shadow warnings in TensorDeviceThreadPool	Eugene Zhulenev	2019-08-28
\|
*	Merged in ezhulenev/eigen-01 (pull request PR-683)	Rasmus Larsen	2019-08-26
\|\ \| \| \| \| \| \|	Asynchronous parallelFor in Eigen ThreadPoolDevice
* \|	Fix get_random_seed on Native Client	maratek	2019-08-23
\| \| \| \| \| \| \| \| \| \|	Newlib in Native Client SDK does not provide ::random function. Implement get_random_seed for NaCl using ::rand, similarly to Windows version.
\| *	Asynchronous parallelFor in Eigen ThreadPoolDevice	Eugene Zhulenev	2019-08-22
\|/
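An asynchronous parallel-for returns immediately and signals completion through a callback instead of blocking the caller. A minimal sketch of that shape, assuming a pluggable scheduler so the example does not depend on any particular thread pool (all names here are hypothetical, and the fixed-size block splitting is a simplification rather than what ThreadPoolDevice does):

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>

using TaskSketch = std::function<void()>;
using SchedulerSketch = std::function<void(TaskSketch)>;
using RangeFnSketch = std::function<void(std::size_t, std::size_t)>;

// Splits [0, n) into blocks, hands each block to `schedule`, and invokes
// `done` exactly once, from whichever task finishes last.
inline void parallel_for_async_sketch(std::size_t n, std::size_t block_size,
                                      const SchedulerSketch& schedule,
                                      RangeFnSketch body, TaskSketch done) {
  assert(block_size > 0);
  if (n == 0) {
    done();
    return;
  }
  const std::size_t num_blocks = (n + block_size - 1) / block_size;
  // Shared state outlives this call, since tasks may run after it returns.
  auto remaining = std::make_shared<std::atomic<std::size_t>>(num_blocks);
  auto shared_body = std::make_shared<RangeFnSketch>(std::move(body));
  auto shared_done = std::make_shared<TaskSketch>(std::move(done));
  for (std::size_t b = 0; b < num_blocks; ++b) {
    const std::size_t first = b * block_size;
    const std::size_t last = std::min(n, first + block_size);
    schedule([=]() {
      (*shared_body)(first, last);
      // fetch_sub returns the previous value; 1 means this was the last block.
      if (remaining->fetch_sub(1) == 1) (*shared_done)();
    });
  }
}
```

Passing a scheduler that runs each task inline gives synchronous behavior for testing; passing one that enqueues onto a thread pool gives the asynchronous behavior.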
*	Remove XSMM support from Tensor module	Eugene Zhulenev	2019-08-19
\|
*	Disable tests for contraction with output kernels when using libxsmm, which does not support this.	Rasmus Munk Larsen	2019-08-07
\|
*	[Eigen] Vectorize evaluation of coefficient-wise functions over tensor blocks if the strides are known to be 1.	Rasmus Munk Larsen	2019-08-07
\|	Provides up to 20-25% speedup of the TF cross entropy op with AVX.
\|
\|	A few benchmark numbers:
\|	name                  old time/op  new time/op  delta
\|	BM_Xent_16_10000_cpu  448µs ± 3%   389µs ± 2%   -13.21%  (p=0.008 n=5+5)
\|	BM_Xent_32_10000_cpu  575µs ± 6%   454µs ± 3%   -21.00%  (p=0.008 n=5+5)
\|	BM_Xent_64_10000_cpu  933µs ± 4%   712µs ± 1%   -23.71%  (p=0.008 n=5+5)
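The stride check described in the commit above can be pictured as a dispatch between two loops: when input and output strides are both 1 the data is contiguous and the loop is easy to vectorize, otherwise a general strided loop is used. The function below is a hypothetical scalar sketch of that dispatch (`cwise_unary_sketch` is not an Eigen function, and the fast path here relies on the compiler's auto-vectorizer rather than explicit packet operations):

```cpp
#include <cassert>
#include <cstddef>

// Applies out[i * out_stride] = op(in[i * in_stride]) for i in [0, count).
template <typename T, typename UnaryOp>
void cwise_unary_sketch(const T* in, std::ptrdiff_t in_stride, T* out,
                        std::ptrdiff_t out_stride, std::size_t count,
                        UnaryOp op) {
  if (in_stride == 1 && out_stride == 1) {
    // Contiguous fast path: unit-stride loads and stores.
    for (std::size_t i = 0; i < count; ++i) out[i] = op(in[i]);
  } else {
    // General path: element-by-element with arbitrary strides.
    for (std::size_t i = 0; i < count; ++i) {
      const std::ptrdiff_t k = static_cast<std::ptrdiff_t>(i);
      out[k * out_stride] = op(in[k * in_stride]);
    }
  }
}
```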
*	Clean up unnecessary namespace specifiers in TensorBlock.h.	Rasmus Munk Larsen	2019-08-07
\|
*	Fix performance regressions due to https://bitbucket.org/eigen/eigen/pull-requests/662.	Rasmus Munk Larsen	2019-08-02
\|	The change caused the device struct to be copied for each expression evaluation, and caused, e.g., a 10% regression in the TensorFlow multinomial op on GPU:
\|
\|	Benchmark                       Time(ns)  CPU(ns)  Iterations
\|	----------------------------------------------------------------------
\|	BM_Multinomial_gpu_1_100000_4     128173   231326        2922  1.610G items/s
\|
\|	VS
\|
\|	Benchmark                       Time(ns)  CPU(ns)  Iterations
\|	----------------------------------------------------------------------
\|	BM_Multinomial_gpu_1_100000_4     146683   246914        2719  1.509G items/s
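To make the "device struct copied for each expression evaluation" point concrete, the sketch below counts copies when an evaluator stores the device by value versus by reference. All types are hypothetical, and holding a reference is shown only as one common way to avoid the copy, not as the fix applied in the commit.

```cpp
#include <cassert>

// Hypothetical device object that counts how often it is copied.
struct DeviceSketch {
  explicit DeviceSketch(int* copy_counter) : copy_counter_(copy_counter) {}
  DeviceSketch(const DeviceSketch& other)
      : copy_counter_(other.copy_counter_) {
    ++*copy_counter_;
  }
  int* copy_counter_;
};

// Stores its own copy of the device: one copy per evaluator constructed.
struct EvaluatorByValue {
  explicit EvaluatorByValue(const DeviceSketch& d) : device(d) {}
  DeviceSketch device;
};

// Stores a reference: no copy, but the device must outlive the evaluator.
struct EvaluatorByRef {
  explicit EvaluatorByRef(const DeviceSketch& d) : device(d) {}
  const DeviceSketch& device;
};

// Constructs one evaluator per "evaluation" and reports the copy count.
template <typename Evaluator>
int copies_after_evaluations(int evaluations) {
  int copies = 0;
  DeviceSketch device(&copies);
  for (int i = 0; i < evaluations; ++i) {
    Evaluator eval(device);
    (void)eval;
  }
  return copies;
}
```

If the device struct carries non-trivial state, the per-evaluation copy in the by-value variant adds overhead to every expression evaluated.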
*	Fix expression evaluation heuristic for TensorSliceOp	Eugene Zhulenev	2019-07-09
\|
*	Add outer/inner chipping optimization for chipping dimension specified at runtime	Eugene Zhulenev	2019-07-03
\|
*	adding the EIGEN_DEVICE_FUNC attribute to the constCast routine.	Deven Desai	2019-07-02
\|	Not having this attribute results in the following failures in the `--config=rocm` TF build.
\|
\|	```
\|	In file included from tensorflow/core/kernels/cross_op_gpu.cu.cc:20:
\|	In file included from ./tensorflow/core/framework/register_types.h:20:
\|	In file included from ./tensorflow/core/framework/numeric_types.h:20:
\|	In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1:
\|	In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:140:
\|	external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
\|	    typename Storage::Type result = constCast(m_impl.data());
\|	                                    ^
\|	external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
\|	external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h:148:56: note: in instantiation of member function 'Eigen::TensorEvaluator<const Eigen::TensorChippingOp<1, Eigen::TensorMap<Eigen::Tensor<int, 2, 1, long>, 16, MakePointer> >, Eigen::GpuDevice>::data' requested here
\|	    return m_rightImpl.evalSubExprsIfNeeded(m_leftImpl.data());
\|	```
\|
\|	Adding the EIGEN_DEVICE_FUNC attribute resolves those errors
*	Merged in codeplaysoftware/eigen (pull request PR-667)	Gael Guennebaud	2019-07-02
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	[SYCL] : Approved-by: Gael Guennebaud <g.gael@free.fr> Approved-by: Rasmus Larsen <rmlarsen@google.com>
* \|	Allocate non-const scalar buffer for block evaluation with DefaultDevice	Eugene Zhulenev	2019-07-01
\| \|
\| *	[SYCL] :	Mehdi Goli	2019-07-01
\|/	* Modifying TensorDeviceSYCL to use `EIGEN_THROW_X`.
\|	* Modifying TensorMacro to use `EIGEN_TRY/CATCH(X)` macro.
\|	* Modifying TensorReverse.h to use `EIGEN_DEVICE_REF` instead of `&`.
\|	* Fixing the SYCL device macro in SpecialFunctionsImpl.h.
*	Fix TensorReverse on GPU with m_stride[i]==0	Eugene Zhulenev	2019-06-28
\|
*	Fix preprocessor condition to only generate a warning when calling eigen::GpuDevice::synchronize() from device code, but not when calling from a non-GPU compilation unit.	Rasmus Munk Larsen	2019-06-28
\|
*	Remove comma causing warning in c++03 mode.	Rasmus Munk Larsen	2019-06-28
\|
*	Merge with Eigen head	Eugene Zhulenev	2019-06-28
\|\
* \|	Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred	Eugene Zhulenev	2019-06-28
\| \|
\| *	[SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL.	Mehdi Goli	2019-06-28
\|/	* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
\|	* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
\|	* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
\|	* Adding SYCL macro for controlling loop unrolling.
\|	* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
*	Remove extra comma (causes warnings in C++03)	Christoph Hertzberg	2019-06-26
\|
*	Optimize evaluation strategy for TensorSlicingOp and TensorChippingOp	Eugene Zhulenev	2019-06-25
\|
*	Merged in Artem-B/eigen (pull request PR-654)	Rasmus Larsen	2019-05-31
\|\ \| \| \| \| \| \| \| \| \| \|	Minor build improvements Approved-by: Rasmus Larsen <rmlarsen@google.com>
* \|	Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures.	Rasmus Munk Larsen	2019-05-31
\| \|
\| *	Minor build improvements	tra	2019-05-31
\|/	* Allow specifying multiple GPU architectures. E.g.: cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70"
\|	* Pass CUDA SDK path to clang. Without it, clang will default to /usr/local/cuda, which may not be the right location if cmake was invoked with -DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path
*	Use pade for matrix exponential also for complex values.	Michael Tesch	2019-05-08
\|
*	Merged in rmlarsen/eigen (pull request PR-643)	Rasmus Larsen	2019-05-20
\|\ \| \| \| \| \| \| \| \| \| \|	Make Eigen build with cuda 10 and clang. Approved-by: Justin Lebar <justin.lebar@gmail.com>
* \|	Prevent potential division by zero in TensorExecutor	Eugene Zhulenev	2019-05-17
\| \|
* \|	Always evaluate Tensor expressions with broadcasting via tiled evaluation code path	Eugene Zhulenev	2019-05-16
\| \|
\| *	Make Eigen build with cuda 10 and clang.	Rasmus Munk Larsen	2019-05-15
\|/
*	Merged in rmlarsen/eigen_threadpool (pull request PR-640)	Rasmus Larsen	2019-05-13
\|\ \| \| \| \| \| \| \| \| \| \|	Fix deadlocks in thread pool. Approved-by: Eugene Zhulenev <ezhulenev@google.com>