path: root/Eigen
Commit message    Author    Age
...
* | Pulled latest updates from trunk    Benoit Steiner    2015-02-27
|\ \
* | | Added support for 32bit index on a per tensor/tensor expression basis.    Benoit Steiner    2015-02-27
| | |   This enables us to use 32bit indices to evaluate expressions on GPU faster while keeping the ability to use 64 bit indices to manipulate large tensors on CPU in the same binary.
* | | Switch to truncated casting when converting floating point types to integer.    Benoit Steiner    2015-02-27
| | |   This ensures that vectorized casts are consistent with scalar casts.
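For context on truncated casting: a scalar static_cast<int>(float) rounds toward zero, while SSE's default packed conversion rounds to nearest, so a vectorized cast must use the truncating variant to match. A minimal sketch with plain SSE2 intrinsics (illustrative; not Eigen's actual implementation):

    #include <emmintrin.h>  // SSE2
    #include <cstdio>

    int main() {
        __m128 v = _mm_set1_ps(1.7f);
        // _mm_cvtps_epi32 uses the current rounding mode (default: to nearest): 1.7f -> 2
        int rounded   = _mm_cvtsi128_si32(_mm_cvtps_epi32(v));
        // _mm_cvttps_epi32 truncates toward zero: 1.7f -> 1, matching static_cast<int>
        int truncated = _mm_cvtsi128_si32(_mm_cvttps_epi32(v));
        std::printf("%d %d %d\n", rounded, truncated, static_cast<int>(1.7f));
    }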
* | | Added support for vectorized type casting of tensors    Benoit Steiner    2015-02-27
| | |
* | | Added support for fast reciprocal square root computation.    Benoit Steiner    2015-02-26
| | |
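Fast reciprocal square root is commonly built from the hardware estimate plus one Newton-Raphson refinement step; a hedged sketch of that general technique in raw SSE (not necessarily the code added by this commit):

    #include <xmmintrin.h>  // SSE

    // Approximate 1/sqrt(x): _mm_rsqrt_ps gives roughly 12 bits of precision;
    // one Newton-Raphson step (y' = y*(1.5 - 0.5*x*y*y)) roughly doubles that.
    static inline __m128 fast_rsqrt(__m128 x) {
        __m128 y      = _mm_rsqrt_ps(x);
        __m128 half_x = _mm_mul_ps(_mm_set1_ps(0.5f), x);
        __m128 y2     = _mm_mul_ps(y, y);
        return _mm_mul_ps(y, _mm_sub_ps(_mm_set1_ps(1.5f), _mm_mul_ps(half_x, y2)));
    }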
| | * Really use zero guess in ConjugateGradients::solve as documented and expected for consistency with other methods.    Jan Blechta    2015-02-18
| | |
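For reference, Eigen's iterative solvers document solve() as starting from a zero guess, while solveWithGuess() takes an explicit starting point; a small usage sketch (matrix and vector types chosen for illustration):

    #include <Eigen/Sparse>
    #include <Eigen/IterativeLinearSolvers>

    Eigen::VectorXd solve_spd(const Eigen::SparseMatrix<double>& A,
                              const Eigen::VectorXd& b,
                              const Eigen::VectorXd& x0) {
        Eigen::ConjugateGradient<Eigen::SparseMatrix<double> > cg;
        cg.compute(A);
        Eigen::VectorXd x1 = cg.solve(b);               // starts from x = 0, as documented
        Eigen::VectorXd x2 = cg.solveWithGuess(b, x0);  // starts from the user-provided x0
        return x2;
    }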
| | * merge    Gael Guennebaud    2015-03-04
| | |\
| | * | Check for no-reallocation in SparseMatrix::insert (bug #974)    Gael Guennebaud    2015-03-04
| | | |
| | * | Improve efficiency of SparseMatrix::insert/coeffRef for sequential outer-index insertion strategies (bug #974)    Gael Guennebaud    2015-03-04
| | | |
| | * | Update manual wrt new LSCG solver.    Gael Guennebaud    2015-03-04
| | | |
| | * | Add a CG-based solver for rectangular least-square problems (bug #975).    Gael Guennebaud    2015-03-04
| | | |
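The solver added here is exposed as LeastSquaresConjugateGradient; it applies CG to the normal equations implicitly, so A may be rectangular. A minimal usage sketch:

    #include <Eigen/Sparse>
    #include <Eigen/IterativeLinearSolvers>

    // Solve min_x ||Ax - b||^2 for a rectangular (m x n) sparse A.
    Eigen::VectorXd least_squares(const Eigen::SparseMatrix<double>& A,
                                  const Eigen::VectorXd& b) {
        Eigen::LeastSquaresConjugateGradient<Eigen::SparseMatrix<double> > lscg;
        lscg.compute(A);
        return lscg.solve(b);
    }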
| | | * Fix asm comments in 1px1 kernel    Benoit Jacob    2015-03-03
| | | |
| | | * Add a benchmark-default-sizes action to benchmark-blocking-sizes.cpp    Benoit Jacob    2015-03-03
| | | |
| | | * New scoring functor to select the pivot.    Marc Glisse    2015-03-03
| | | |   This can be useful for non-floating-point scalars, where choosing the biggest element is generally not the best choice.
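The point of a scoring functor is to separate "how good is this element as a pivot" from raw magnitude: abs() is the right score for floating point, while for exact scalar types any nonzero pivot may do. A hypothetical sketch of the concept (names invented for illustration; not Eigen's internal interface):

    #include <cmath>

    // Hypothetical scoring functors: the pivot search picks the element
    // with the highest score rather than the largest absolute value.
    struct AbsScore {                 // sensible default for floating point
        double operator()(double x) const { return std::abs(x); }
    };
    struct NonZeroScore {             // for exact types: any nonzero pivot is fine,
        template <typename T>         // so no need to hunt for the biggest element
        double operator()(const T& x) const { return x != T(0) ? 1.0 : 0.0; }
    };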
| | | * must also disable complex<double> when disabling double vectorization    Benoit Jacob    2015-03-03
| | |/
| | * Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics.    Benoit Jacob    2015-03-03
| | |
| | * HalfPacket also needed to be disabled for double, on ARMv8.    Benoit Jacob    2015-03-02
| | |
| | * Add SSE vectorization of Quaternion::conjugate. Significant speed-up when combined with products like q1*q2.conjugate().    Gael Guennebaud    2015-03-02
| | |
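Quaternion conjugation negates the vector part (x, y, z) and keeps w; with Eigen's (x, y, z, w) storage order that is a single XOR against a sign mask, which is presumably the trick behind the speed-up. A hedged sketch with raw SSE intrinsics (illustrative, not Eigen's exact code):

    #include <xmmintrin.h>  // SSE

    // q stored as (x, y, z, w): conjugate = (-x, -y, -z, w).
    // XOR with -0.0f flips the sign bit; XOR with +0.0f is a no-op.
    static inline __m128 quat_conjugate(__m128 q) {
        const __m128 mask = _mm_setr_ps(-0.0f, -0.0f, -0.0f, 0.0f);
        return _mm_xor_ps(q, mask);
    }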
| | * Increase unit-test L1 cache size to ensure we are doing at least 2 peeled loops within the product kernel.    Gael Guennebaud    2015-02-27
| | |
| | * Re-enable detection of min/max parentheses protection, and re-enable the mpreal_support unit test.    Gael Guennebaud    2015-02-27
| |/
| * Reimplement the selection between rotating and non-rotating kernels using templates instead of macros and if()'s.    Benoit Jacob    2015-02-27
| |   That was needed to fix the build of unit tests on ARM, which I had broken. My bad for not testing earlier.
| * remove trailing comma    Benoit Jacob    2015-02-27
| |
| * Disable Packet2f/2i halfpacket support in NEON.    Benoit Jacob    2015-02-27
| |   I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match.
| * Replace a static assert by a runtime one; fixes the build of unit tests on ARM.    Benoit Jacob    2015-02-27
| |   Also safely assert in the non-implemented path that should never be taken in practice and would return wrong results.
| * Avoid packing rhs multiple times when blocking on the lhs only.    Gael Guennebaud    2015-02-26
| |
| * Make sure that the block size computation is tested by our unit test.    Gael Guennebaud    2015-02-26
| |
| * Implement a more generic blocking-size selection algorithm. See explanations inline.    Gael Guennebaud    2015-02-26
| |   It performs extremely well on Haswell. The main issue is to reliably and quickly find the actual cache size to be used for our 2nd level of blocking, that is: max(l2, l3/nb_core_sharing_l3).
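In other words, the cache budget for the second level of blocking is the private L2 size or each core's share of the L3, whichever is larger; a toy sketch of that formula (the cache sizes in the comment are assumptions for illustration):

    #include <algorithm>
    #include <cstddef>

    // Effective per-core cache for the 2nd level of blocking:
    // max(l2, l3 / nb_core_sharing_l3), per the commit message.
    std::size_t effective_l2_budget(std::size_t l2, std::size_t l3,
                                    int nb_core_sharing_l3) {
        return std::max(l2, l3 / static_cast<std::size_t>(nb_core_sharing_l3));
    }

    // Example: a Haswell-like 256KB L2 with a 6MB L3 shared by 4 cores
    // gives max(256KB, 1.5MB) = 1.5MB available for blocking.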
| * Fix typos in block-size testing code, and set peeling on k to 8.    Gael Guennebaud    2015-02-26
|/
* So I extensively measured the impact of the offset in this prefetch.    Benoit Jacob    2015-02-25
|   I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes). On x86, I tested a Sandy Bridge with AVX and 12M cache and a Haswell with AVX+FMA and 6M cache, on MatrixXf sizes up to 2400. I could not see any significant impact of this offset. On Nexus 5, the offset has a slight effect: values around 32 (times sizeof(float)) are worst; anything else is the same, whether the current 64 (8*pk) or... 0. So let's just go with 0! Note that we needed a fix anyway for not accounting for the value of RhsProgress; 0 nicely avoids the issue altogether.
* bug #970: Add EIGEN_DEVICE_FUNC to RValue functions, in case Cuda supports RValue-references.    Christoph Hertzberg    2015-02-24
|
* Fix my recent prefetch changes:    Benoit Jacob    2015-02-23
|   - the first prefetch is actually harmful on Haswell with FMA, but it is the most beneficial on ARM.
|   - the second prefetch... I was very stupid and multiplied an offset on a scalar* pointer by sizeof(scalar). The old offset was 64; pk = 8, so 64 = pk*8, and this effectively restores the older offset. Actually, there were two prefetches here, one with offset 48 and one with offset 64. I could not confirm any benefit from this strange 48 offset on either the Haswell or my ARM device.
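The second bug above is classic pointer arithmetic: offsets on a scalar* are already scaled by sizeof(scalar), so multiplying by sizeof(scalar) again prefetches far past the intended address. A minimal illustration using GCC/Clang's __builtin_prefetch (names are illustrative):

    void prefetch_example(const float* p) {
        const int pk = 8;  // peeling factor, per the commit message
        // Wrong: p is a float*, so "+ 64*sizeof(float)" advances 64*4 = 256 floats
        // (1KB), not 64 floats -- the double-scaling bug described above.
        __builtin_prefetch(p + 64 * sizeof(float));
        // Right: offsets on a typed pointer are already counted in elements.
        __builtin_prefetch(p + 8 * pk);  // 64 floats ahead
    }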
* Fix two trivial warnings    Christoph Hertzberg    2015-02-22
|
* log1p is defined only for real Scalars in C++11    Christoph Hertzberg    2015-02-21
|
* Fix compilation of unit tests disabling assertion checking    Gael Guennebaud    2015-02-21
|
* Fix doc of Ref<>    Gael Guennebaud    2015-02-20
|
* In C++11 destructors do not throw by default (fix CommaInitializer unit test)    Gael Guennebaud    2015-02-20
|
* Pulled latest changes from trunk    Benoit Steiner    2015-02-19
|\
* | Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up being executed on the GPU device.    Benoit Steiner    2015-02-19
| |
| * Fix regression with C++11 support of lambdas: now internal::result_of falls back to std::result_of in C++11.    Gael Guennebaud    2015-02-19
| |
| * Fix some calls to result_of on binary functors as unary ones.    Gael Guennebaud    2015-02-19
| |
| * Declare const some variables    Gael Guennebaud    2015-02-19
|/
* Add support for C++11 result_of/lambdas    Gael Guennebaud    2015-02-19
|
* rotating kernel: avoid compiling anything outside of ARM    Benoit Jacob    2015-02-18
|
* remove a newly introduced redundant typedef - sorry.    Benoit Jacob    2015-02-18
|
* bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path    Benoit Jacob    2015-02-18
|   This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (even about 1% slower).
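On ARM, the rotating kernel's core trick is to load one rhs packet and rotate its lanes between multiply-accumulates, instead of issuing a fresh duplicating load per lane. A hedged sketch of the rotation itself using NEON's vext (illustrative; not the actual gebp code):

    #include <arm_neon.h>

    // Rotate a packet of 4 floats left by one lane: (a0,a1,a2,a3) -> (a1,a2,a3,a0).
    // Cycling through 4 rotations lets one loaded packet meet every lhs lane,
    // replacing four per-lane duplicating loads with one load plus three vexts.
    static inline float32x4_t rotate_left1(float32x4_t a) {
        return vextq_f32(a, a, 1);
    }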
* Fixed template parameter.    Hauke Heibel    2015-02-18
|
* merge    Gael Guennebaud    2015-02-18
|\
* | Clean a bit computeProductBlockingSizes (use Index type, remove CEIL macro)    Gael Guennebaud    2015-02-18
| |
| * bug #958 - Allow testing specific blocking sizes    Benoit Jacob    2015-02-18
| |   This is only a debugging/testing patch. It allows testing specific product blocking sizes, typically to study the impact on performance. Example usage:
| |       int testk, testm, testn;
| |       #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZES
| |       #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K testk
| |       #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M testm
| |       #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N testn
| |       #include <Eigen/Core>
|/
* Fix a regression when using OpenMP, and fix bug #714: the number of threads might be lower than the number of requested ones.    Gael Guennebaud    2015-02-18