eigen - C++ library for linear algebra

	Commit message (Collapse)	Author	Age
*	Silenced clang compilation warning.	Benoit Steiner	2017-02-28
\|
*	Made the TensorStorage class compile with clang 3.9	Benoit Steiner	2017-02-28
\|
*	Fix typo.	Gael Guennebaud	2017-02-28
\|
*	bug #1380: for Map<> as input of matrix exponential	Gael Guennebaud	2017-02-20
\|
*	Silent warning.	Gael Guennebaud	2017-02-20
\|
*	Fix compilation.	Gael Guennebaud	2017-02-18
\|
*	Size indices are signed.	Benoit Steiner	2017-02-16
\|
*	Adding TensorChippingOP for sycl backend; fixing the index value in the verification operation for cxx11_tensorChipping.cpp test	Mehdi Goli	2017-02-13
\|
*	Pulled latest updates from upstream	Benoit Steiner	2017-02-10
\|\
* \|	Adding mean to TensorReductionSycl.h	Mehdi Goli	2017-02-07
\| \|
* \|	Fixing TensorReductionSycl for min and max.	Mehdi Goli	2017-02-06
\| \|
* \|	Reducing the warnings in Sycl backend.	Mehdi Goli	2017-02-02
\| \|
\| *	Silenced several compilation warnings	Benoit Steiner	2017-02-01
\| \|
* \|	Converting ptrdiff_t type to int64_t type in cxx11_tensor_contract_sycl.cpp in order to be the same as other tests.	Mehdi Goli	2017-02-01
\| \|
* \|	Reducing warnings in Sycl backend.	Mehdi Goli	2017-02-01
\| \|
* \|	Fixing compiler error on TensorContractionSycl.h; Silencing the compiler unused parameter warning for eval_op_indices in TensorContraction.h	Mehdi Goli	2017-01-31
\| \|
* \|	Merge latest changes from upstream	Benoit Steiner	2017-01-30
\|\\|
\| *	bug #1380: fix matrix exponential with Map<>	Gael Guennebaud	2017-01-30
\| \|
* \|	Fixing the buffer type in memcpy.	Mehdi Goli	2017-01-30
\| \|
\| *	Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc.	Rasmus Munk Larsen	2017-01-26
\| \|	Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy. This PR also removes a couple of unnecessary semicolons in Eigen/src/Core/AssignEvaluator.h that caused compiler warnings everywhere.
\| *	Merged in ggael/eigen-flexidexing (pull request PR-294)	Gael Guennebaud	2017-01-26
\| \|\	generalized operator() for indexed access and slicing
\| \| *	Fix duplicates of array_size between unsupported and Core	Gael Guennebaud	2017-01-25
\| \| \|
\| * \|	Merged in rmlarsen/eigen2 (pull request PR-292)	Benoit Steiner	2017-01-25
\| \|\ \	Adds a fast memcpy function to Eigen.
\| \| * \|	Adds a fast memcpy function to Eigen. This takes advantage of the following:	Rasmus Munk Larsen	2017-01-24
\| \| \| \|	1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
\| \| \| \|	2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml
\| \| \| \|	This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux
\| \| \| \|	Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.
\| \| \| \|	The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.
\| \| \| \|	Measured improvements in wall clock time:
\| \| \| \|	Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
\| \| \| \|	CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
\| \| \| \|	Benchmark                        Base (ns)   New (ns)  Improvement
\| \| \| \|	------------------------------------------------------------------
\| \| \| \|	BM_memcpy_1T/2                        3.48       2.39       +31.3%
\| \| \| \|	BM_memcpy_1T/8                        12.3       6.51       +47.0%
\| \| \| \|	BM_memcpy_1T/64                        371        383        -3.2%
\| \| \| \|	BM_memcpy_1T/512                     66922      66720        +0.3%
\| \| \| \|	BM_memcpy_1T/4k                    9892867    6849682       +30.8%
\| \| \| \|	BM_memcpy_1T/5k                   14951099   10332856       +30.9%
\| \| \| \|	BM_memcpy_2T/2                        3.50       2.46       +29.7%
\| \| \| \|	BM_memcpy_2T/8                        12.3       7.66       +37.7%
\| \| \| \|	BM_memcpy_2T/64                        371        376        -1.3%
\| \| \| \|	BM_memcpy_2T/512                     66652      66788        -0.2%
\| \| \| \|	BM_memcpy_2T/4k                    6145012    6117776        +0.4%
\| \| \| \|	BM_memcpy_2T/5k                    9181478    9010942        +1.9%
\| \| \| \|	BM_memcpy_4T/2                        3.47       2.47       +31.0%
\| \| \| \|	BM_memcpy_4T/8                        12.3       6.67       +45.8%
\| \| \| \|	BM_memcpy_4T/64                        374        376        -0.5%
\| \| \| \|	BM_memcpy_4T/512                     67833      68019        -0.3%
\| \| \| \|	BM_memcpy_4T/4k                    5057425    5188253        -2.6%
\| \| \| \|	BM_memcpy_4T/5k                    7555638    7779468        -3.0%
\| \| \| \|	BM_memcpy_6T/2                        3.51       2.50       +28.8%
\| \| \| \|	BM_memcpy_6T/8                        12.3       7.61       +38.1%
\| \| \| \|	BM_memcpy_6T/64                        373        378        -1.3%
\| \| \| \|	BM_memcpy_6T/512                     66871      66774        +0.1%
\| \| \| \|	BM_memcpy_6T/4k                    5112975    5233502        -2.4%
\| \| \| \|	BM_memcpy_6T/5k                    7614180    7772246        -2.1%
\| \| \| \|	BM_memcpy_8T/2                        3.47       2.41       +30.5%
\| \| \| \|	BM_memcpy_8T/8                        12.4       10.5       +15.3%
\| \| \| \|	BM_memcpy_8T/64                        372        388        -4.3%
\| \| \| \|	BM_memcpy_8T/512                     67373      66588        +1.2%
\| \| \| \|	BM_memcpy_8T/4k                    5148462    5254897        -2.1%
\| \| \| \|	BM_memcpy_8T/5k                    7660989    7799058        -1.8%
\| \| \| \|	BM_memcpy_12T/2                       3.50       2.40       +31.4%
\| \| \| \|	BM_memcpy_12T/8                       12.4       7.55       +39.1%
\| \| \| \|	BM_memcpy_12T/64                       374        378        -1.1%
\| \| \| \|	BM_memcpy_12T/512                    67132      66683        +0.7%
\| \| \| \|	BM_memcpy_12T/4k                   5185125    5292920        -2.1%
\| \| \| \|	BM_memcpy_12T/5k                   7717284    7942684        -2.9%
\| \| \| \|	BM_slicingSmallPieces_1T/2            47.3       47.5        +0.4%
\| \| \| \|	BM_slicingSmallPieces_1T/8            53.6       52.3        +2.4%
\| \| \| \|	BM_slicingSmallPieces_1T/64            491        476        +3.1%
\| \| \| \|	BM_slicingSmallPieces_1T/512         21734      18814       +13.4%
\| \| \| \|	BM_slicingSmallPieces_1T/4k         394660     396760        -0.5%
\| \| \| \|	BM_slicingSmallPieces_1T/5k         218722     209244        +4.3%
\| \| \| \|	BM_slicingSmallPieces_2T/2            80.7       79.9        +1.0%
\| \| \| \|	BM_slicingSmallPieces_2T/8            54.2       53.1        +2.0%
\| \| \| \|	BM_slicingSmallPieces_2T/64            497        477        +4.0%
\| \| \| \|	BM_slicingSmallPieces_2T/512         21732      18822       +13.4%
\| \| \| \|	BM_slicingSmallPieces_2T/4k         392885     390490        +0.6%
\| \| \| \|	BM_slicingSmallPieces_2T/5k         221988     208678        +6.0%
\| \| \| \|	BM_slicingSmallPieces_4T/2            80.8       80.1        +0.9%
\| \| \| \|	BM_slicingSmallPieces_4T/8            54.1       53.2        +1.7%
\| \| \| \|	BM_slicingSmallPieces_4T/64            493        476        +3.4%
\| \| \| \|	BM_slicingSmallPieces_4T/512         21702      18758       +13.6%
\| \| \| \|	BM_slicingSmallPieces_4T/4k         393962     404023        -2.6%
\| \| \| \|	BM_slicingSmallPieces_4T/5k         249667     211732       +15.2%
\| \| \| \|	BM_slicingSmallPieces_6T/2            80.5       80.1        +0.5%
\| \| \| \|	BM_slicingSmallPieces_6T/8            54.4       53.4        +1.8%
\| \| \| \|	BM_slicingSmallPieces_6T/64            488        478        +2.0%
\| \| \| \|	BM_slicingSmallPieces_6T/512         21719      18841       +13.3%
\| \| \| \|	BM_slicingSmallPieces_6T/4k         394950     397583        -0.7%
\| \| \| \|	BM_slicingSmallPieces_6T/5k         223080     210148        +5.8%
\| \| \| \|	BM_slicingSmallPieces_8T/2            81.2       80.4        +1.0%
\| \| \| \|	BM_slicingSmallPieces_8T/8            58.1       53.5        +7.9%
\| \| \| \|	BM_slicingSmallPieces_8T/64            489        480        +1.8%
\| \| \| \|	BM_slicingSmallPieces_8T/512         21586      18798       +12.9%
\| \| \| \|	BM_slicingSmallPieces_8T/4k         394592     400165        -1.4%
\| \| \| \|	BM_slicingSmallPieces_8T/5k         219688     208301        +5.2%
\| \| \| \|	BM_slicingSmallPieces_12T/2           80.2       79.8        +0.7%
\| \| \| \|	BM_slicingSmallPieces_12T/8           54.4       53.4        +1.8%
\| \| \| \|	BM_slicingSmallPieces_12T/64           488        476        +2.5%
\| \| \| \|	BM_slicingSmallPieces_12T/512        21931      18831       +14.1%
\| \| \| \|	BM_slicingSmallPieces_12T/4k        393962     396541        -0.7%
\| \| \| \|	BM_slicingSmallPieces_12T/5k        218803     207965        +5.0%
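The two observations in the commit message above (inline code for small fixed sizes, memmove for large sizes) can be illustrated with a minimal sketch. This is not the code from PR-292: the function name `fast_copy_sketch` and the set of special-cased sizes are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, NOT Eigen's actual function.
// Small sizes are dispatched to memcpy calls whose length is a compile-time
// constant, which compilers typically expand into inline loads and stores.
// Every other size falls through to memmove, which the commit message reports
// as faster than memcpy for large blocks on the benchmarked machine.
inline void fast_copy_sketch(void* dst, const void* src, std::size_t n) {
  assert(n == 0 || (dst != nullptr && src != nullptr));
  switch (n) {
    // The memcpy cases require non-overlapping regions.
    case 1:  std::memcpy(dst, src, 1);  break;
    case 2:  std::memcpy(dst, src, 2);  break;
    case 4:  std::memcpy(dst, src, 4);  break;
    case 8:  std::memcpy(dst, src, 8);  break;
    case 16: std::memcpy(dst, src, 16); break;
    default: std::memmove(dst, src, n); break;
  }
}
```

The switch itself costs a branch on every call, which is the overhead for larger sizes that the later revert of PR-292 cites as the reason for dropping this approach.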
\| * \| \|	Make NaN propagation consistent between the pmax/pmin and std::max/std::min.	Rasmus Munk Larsen	2017-01-24
\| \|/ /	This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op. See #1373 for details.
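For context on the scalar behaviour this commit aligns with: `std::max(a, b)` and `std::min(a, b)` are defined through a single `<` comparison, and every comparison with NaN is false, so both return their first argument whenever either input is NaN. The sketch below only demonstrates that standard-library behaviour; the function name is hypothetical and no Eigen code is shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical demo function (not part of Eigen).
// std::max(a, b) returns (a < b) ? b : a and std::min(a, b) returns
// (b < a) ? b : a, so a NaN is returned only when it is the FIRST argument.
inline void check_std_max_min_nan_behaviour() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  assert(std::isnan(std::max(nan, 1.0)));   // NaN first: NaN comes back
  assert(!std::isnan(std::max(1.0, nan)));  // NaN second: 1.0 comes back
  assert(std::isnan(std::min(nan, 1.0)));
  assert(!std::isnan(std::min(1.0, nan)));
}
```

These checks assume IEEE comparison semantics, so they may not hold when compiling with options such as `-ffast-math`.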
* \| \|	Allows AMD APU	Luke Iwanski	2017-01-23
\| \| \|
* \| \|	Reverting back to the previous TensorDeviceSycl.h as the total number of buffers is not enough for tensorflow.	Mehdi Goli	2017-01-20
\| \| \|
* \| \|	Removing unused variables	Mehdi Goli	2017-01-19
\| \| \|
* \| \|	Merging with Benoit's upstream.	Mehdi Goli	2017-01-19
\|\ \ \
* \| \| \|	Adding non-dereferenceable pointer track for ComputeCpp backend; Adding TensorConvolutionOp for ComputeCpp; fixing typos. Modifying TensorDeviceSycl to use the LegacyPointer class.	Mehdi Goli	2017-01-19
\| \| \| \|
\| * \| \|	Applying Benoit's comment. Embedding synchronisation inside device memcpy so there is no need to externally call synchronise() for device memcpy.	Mehdi Goli	2017-01-18
\|/ / /
* \| \|	Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree.	Mehdi Goli	2017-01-16
\| \| \|
* \| \|	Fixes auto appearance in functor template argument for reduction.	Luke Iwanski	2017-01-04
\| \| \|
\| * \|	add cmake-option to enable/disable creation of tests	NeroBurner	2017-01-02
\| \|/	* * * disable unsupported/test when tests are disabled * * * rename EIGEN_ENABLE_TESTS to BUILD_TESTING * * * consider BUILD_TESTING in blas
* \|	Reverting asynchronous exec to Synchronous exec regarding random race condition.	Mehdi Goli	2016-12-22
\| \|
\| *	Pulled latest update from trunk	Benoit Steiner	2016-12-21
\| \|\
\| * \|	Simplified the contraction code	Benoit Steiner	2016-12-21
\| \| \|
\| \| *	Merged in benoitsteiner/opencl (pull request PR-279)	Benoit Steiner	2016-12-21
\| \|/\| \| \|/ \|/\| \| \|	Fix for auto appearing in functor template argument.
\| *	Added support for libxsmm kernel in multithreaded contractions	Benoit Steiner	2016-12-21
\| \|
\| *	Simplified the way we link libxsmm	Benoit Steiner	2016-12-21
\| \|
\| *	Leverage libxsmm kernels within single threaded contractions	Benoit Steiner	2016-12-21
\| \|
\| *	Added support for libxsmm in the eigen makefiles	Benoit Steiner	2016-12-21
\| \|
* \|	Fix for auto appearing in functor template argument.	Luke Iwanski	2016-12-21
\|/
*	Merged eigen/eigen into default	Benoit Steiner	2016-12-20
\|\
* \|	Fixed order of initialisation in ExecExprFunctorKernel functor.	Luke Iwanski	2016-12-20
\| \|
\| *	Properly adjust precision when saving to Market format.	Gael Guennebaud	2016-12-20
\| \|
\| *	Speed up parsing of sparse Market file.	Gael Guennebaud	2016-12-20
\| \|
* \|	Matching parameters order between lambda and the functor.	Luke Iwanski	2016-12-20
\| \|
* \|	Added an OpenCL regression test	Benoit Steiner	2016-12-19
\| \|
\| *	Fixed race condition in the tensor_shuffling_sycl test	Benoit Steiner	2016-12-19
\|/