eigen - C++ library for linear algebra

	Commit message	Author	Age
*	Fix computeProductBlockingSizes with m==0, and add respective unit test.	Gael Guennebaud	2015-03-31
\|
*	merge	Gael Guennebaud	2015-03-27
\|\
* \|	Fix hypot(0,0).	Gael Guennebaud	2015-03-27
\| \|
\| *	Fixed the CUDA packet primitives	Benoit Steiner	2015-03-24
\| \|
* \|	Make MatrixBase::is* methods aware of nested_eval.	Gael Guennebaud	2015-03-24
\|/
*	Fix MSVC compilation: aligned type must be passed by reference	Gael Guennebaud	2015-03-19
\|
*	Fix comparison warning	Gael Guennebaud	2015-03-19
\|
*	Improve random number generation for integer and add unit test	Gael Guennebaud	2015-03-19
\|
*	use unsigned short instead of uint16_t which doesn't exist in c++98	Benoit Jacob	2015-03-17
\|
*	Similar to cset 3589a9c115a892ea3ca5dac74d71a1526764cb38	Benoit Jacob	2015-03-16
\| \| \| \|	, also in 2px4 kernel: actual_panel_rows computation should always be resilient to parameters not consistent with the known L1 cache size, see comment
*	fix bug in maxsize calculation, which would cause products of size > 2048 to ↵	Benoit Jacob	2015-03-16
\| \| \| \|	address the lookup table out of bounds
*	Update Nexus 5 lookup table from combining now 2 runs of the benchmark, ↵	Benoit Jacob	2015-03-16
\| \| \| \|	using the analyze-blocking-sizes partition tool. Gives better worst-case performance.
*	fix compilation with GCC 4.8	Benoit Jacob	2015-03-16
\|
*	Fix bug in case where EIGEN_TEST_SPECIFIC_BLOCKING_SIZE is defined but false	Benoit Jacob	2015-03-15
\|
*	Provide an empirical lookup table for blocking sizes measured on a Nexus 5. ↵	Benoit Jacob	2015-03-15
\| \| \| \|	Only for float, only for Android on ARM 32bit for now.
*	actual_panel_rows computation should always be resilient to parameters not ↵	Benoit Jacob	2015-03-15
\| \| \| \|	consistent with the known L1 cache size, see comment
*	Fix an unused-var warning	Benoit Jacob	2015-03-15
\|
*	Refactor computeProductBlockingSizes to make room for the possibility of ↵	Benoit Jacob	2015-03-15
\| \| \| \|	using lookup tables
*	organize a little our default cache sizes, and use a saner default L1 ↵	Benoit Jacob	2015-03-13
\| \| \| \|	outside of x86 (10% faster on Nexus 5)
*	bug #973, improve AVX support by enabling vectorization of Vector4i-like ↵	Gael Guennebaud	2015-03-13
\| \| \| \|	types, and enforcing alignment of Vector4f/Vector2d-like types to preserve compatibility with SSE and future Eigen versions that will vectorize them with AVX enabled.
*	Fix internal::random(x,y) for integer types. The previous implementation ↵	Gael Guennebaud	2015-03-13
\| \| \| \|	could return y+1. The new implementation uses rejection sampling to get unbiased behavior.
*	Avoid underflow when blocking sizes are tuned manually.	Gael Guennebaud	2015-03-06
\|
*	bug #969: work around ambiguous calls to Ref using enable_if.	Gael Guennebaud	2015-03-06
\|
*	bug #978: early return for vanishing products	Gael Guennebaud	2015-03-06
\|
*	Improve blocking heuristic: if the lhs fits within L1, then block on the rhs ↵	Gael Guennebaud	2015-03-06
\| \| \| \|	in L1 (allows keeping the packed rhs in L1)
*	Improve product kernel: replace the previous dynamic loop-swapping strategy ↵	Gael Guennebaud	2015-03-06
\| \| \| \| \| \|	by a more general one: it consists of increasing the actual number of rows of the lhs's micro horizontal panel for small depth such that the L1 cache is fully exploited.
*	Product optimization: implement a dynamic loop-swapping strategy to improve ↵	Gael Guennebaud	2015-03-05
\| \| \| \|	memory accesses to the destination matrix in the case of K-rank-update like products, i.e., for products of the kind: "large x small" * "small x large"
*	Fix asm comments in 1px1 kernel	Benoit Jacob	2015-03-03
\|
*	Add a benchmark-default-sizes action to benchmark-blocking-sizes.cpp	Benoit Jacob	2015-03-03
\|
*	New scoring functor to select the pivot.	Marc Glisse	2015-03-03
\| \| \| \|	This can be useful for non-floating point scalars, where choosing the biggest element is generally not the best choice.
*	must also disable complex<double> when disabling double vectorization	Benoit Jacob	2015-03-03
\|
*	Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON ↵	Benoit Jacob	2015-03-03
\| \| \| \|	intrinsics.
*	HalfPacket also needed to be disabled for double, on ARMv8.	Benoit Jacob	2015-03-02
\|
*	Increase unit-test L1 cache size to ensure we are doing at least 2 peeled ↵	Gael Guennebaud	2015-02-27
\| \| \| \|	loops within product kernel.
*	Re-enable detection of min/max parentheses protection, and re-enable ↵	Gael Guennebaud	2015-02-27
\| \| \| \|	mpreal_support unit test.
*	Reimplement the selection between rotating and non-rotating kernels	Benoit Jacob	2015-02-27
\| \| \| \| \| \|	using templates instead of macros and if()'s. That was needed to fix the build of unit tests on ARM, which I had broken. My bad for not testing earlier.
*	remove trailing comma	Benoit Jacob	2015-02-27
\|
*	Disable Packet2f/2i halfpacket support in NEON.	Benoit Jacob	2015-02-27
\| \| \| \| \| \|	I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match.
*	Replace a static assert by a runtime one, fixes the build of unit tests on ARM	Benoit Jacob	2015-02-27
\| \| \| \| \|	Also safely assert in the non-implemented path that should never be taken in practice, and would return wrong results.
*	Avoid packing rhs multiple-times when blocking on the lhs only.	Gael Guennebaud	2015-02-26
\|
*	Make sure that the block size computation is tested by our unit test.	Gael Guennebaud	2015-02-26
\|
*	Implement a more generic blocking-size selection algorithm. See explanations ↵	Gael Guennebaud	2015-02-26
\| \| \| \| \| \| \|	inline. It performs extremely well on Haswell. The main issue is to reliably and quickly find the actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)
*	Fix typos in block-size testing code, and set peeling on k to 8.	Gael Guennebaud	2015-02-26
\|
*	So I extensively measured the impact of the offset in this prefetch. I tried ↵	Benoit Jacob	2015-02-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes). On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400. I could not see any significant impact of this offset. On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0. So let's just go with 0! Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!
*	bug #970: Add EIGEN_DEVICE_FUNC to RValue functions, in case Cuda supports ↵	Christoph Hertzberg	2015-02-24
\| \| \| \|	RValue-references.
*	Fix my recent prefetch changes:	Benoit Jacob	2015-02-23
\| \| \| \| \| \| \| \| \| \| \|	- the first prefetch is actually harmful on Haswell with FMA, but it is the most beneficial on ARM. - the second prefetch... I was very stupid and multiplied by sizeof(scalar) an offset of a scalar* pointer. The old offset was 64; pk = 8, so 64=pk*8. So this effectively restores the older offset. Actually, there were two prefetches here, one with offset 48 and one with offset 64. I could not confirm any benefit from this strange 48 offset on either the Haswell or my ARM device.
*	log1p is defined only for real Scalars in C++11	Christoph Hertzberg	2015-02-21
\|
*	Fix compilation of unit tests disabling assertion checking	Gael Guennebaud	2015-02-21
\|
*	Fix doc of Ref<>	Gael Guennebaud	2015-02-20
\|
*	In C++11 destructors do not throw by default (fix CommaInitializer unit test)	Gael Guennebaud	2015-02-20
\|
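
The 2015-03-13 entry above says the internal::random(x,y) fix for integer types uses rejection sampling so that the result never exceeds y and is unbiased. The sketch below illustrates that general technique only; it is not Eigen's implementation. The function name random_in_range and the choice of std::mt19937 as the bit source are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper, NOT Eigen's internal::random: returns an integer
// uniformly distributed in the closed range [lo, hi] via rejection sampling.
inline int random_in_range(int lo, int hi, std::mt19937& rng) {
  assert(lo <= hi);
  // Number of distinct outputs, computed in 64 bits so hi - lo + 1 cannot overflow.
  const std::uint64_t range =
      static_cast<std::uint64_t>(static_cast<std::int64_t>(hi) -
                                 static_cast<std::int64_t>(lo)) + 1;
  // std::mt19937 produces 32-bit words, i.e. 2^32 equally likely raw values.
  const std::uint64_t span = std::uint64_t(1) << 32;
  // Largest multiple of `range` not exceeding `span`. Raw values at or above
  // it would make some remainders more likely than others, so they are redrawn.
  const std::uint64_t limit = span - span % range;
  std::uint64_t draw;
  do {
    draw = static_cast<std::uint64_t>(rng()) & 0xFFFFFFFFu;
  } while (draw >= limit);
  // draw % range lies in [0, range - 1], so the result lies in [lo, hi].
  return static_cast<int>(static_cast<std::int64_t>(lo) +
                          static_cast<std::int64_t>(draw % range));
}
```

A plain `lo + rng() % range` would skew slightly toward small values whenever 2^32 is not a multiple of range; discarding the tail of the raw range removes that skew at the cost of an occasional redraw.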