path: root/Eigen/src/Core/arch/AltiVec
Commit message (author, date)
...
* fix compilation (Gael Guennebaud, 2011-02-21)
|
* Remove all references to EIGEN_TUNE_CPU_CACHE_SIZE. (Jitse Niesen, 2011-02-04)
|     This macro is no longer used as of revision 0212eec23f4cb64e8426bf32568156df302f8fcf.
* bug #86: use internal:: namespace instead of ei_ prefix (Benoit Jacob, 2010-10-25)
|
* mixing types in product step 2: (Gael Guennebaud, 2010-07-11)
|     * pload* and pset1 are now templated on the packet type
|     * gemv routines are now embedded into a structure with a consistent API with respect to gemm
|     * some configurations of vector * matrix and matrix * matrix work fine, some need more work...
* sync (Gael Guennebaud, 2010-07-10)
|\
| * Added NEON/Complex.h, ~3.5x faster than scalar std::complex<float> (Konstantinos Margaritis, 2010-07-10)
| |     minor fix in AltiVec Complex.h
| * forgot to commit ei_p4f_FORWARD; (Konstantinos Margaritis, 2010-07-09)
| |
| * Altivec port of Complex.h. (Konstantinos Margaritis, 2010-07-09)
| |     Note: For some reason g++ 4.4 is >200% slower than g++ 4.3 on altivec code. The same benchmark (bench_gemm) was tested, on the same hardware/OS (G4/Debian testing), with the same CFLAGS. With some code reorganizing I managed to get some minor gain on 4.4, but I just could not reach 4.3 speed. This is most likely a bug, but I'm waiting to see if it's fixed on 4.5. I'll look into this a bit more.
* | scalars fitting in a single packet require more work, step 1 (Gael Guennebaud, 2010-07-08)
|/    * add an Alignable trait
|     * update LinearVectorization assignment
* s/IsVectorized/Vectorizable (Gael Guennebaud, 2010-07-07)
|
* * add an IsVectorized mechanism (instead of packet-size>1...) (Gael Guennebaud, 2010-07-06)
|     * vectorize complex<double>
* AltiVec signed integer pmadd removed, proved to be 2x slower than the scalar trait(!). (Konstantinos Margaritis, 2010-06-28)
|
* Add a proof-of-concept API to configure the blocking parameters at runtime. (Gael Guennebaud, 2010-06-07)
|     After validation of the final API I'll update the other products to use it.
* (proper commit this time) (Konstantinos Margaritis, 2010-04-24)
|     replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function. Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h. Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch(). NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
* Backed out changeset 6972c140f737874d88da0e225c7c27b4563a4518 (Konstantinos Margaritis, 2010-04-24)
|
* replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function. (oem, 2010-04-24)
|     Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h. Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch(). NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
* fix copy pasted comment (Gael Guennebaud, 2010-03-05)
|
* Altivec brought up to date. Most tests pass and performance is better than before too! (Konstantinos Margaritis, 2010-03-05)
|
* Added initial NEON support, most tests pass however we had to use some hackish workarounds as gcc on ARM (both CodeSourcery 4.4.1 used and experimental 4.5) fails to ensure proper alignment with __attribute__((aligned(16))). (Konstantinos Margaritis, 2010-03-03)
|     This has to be fixed upstream to remove the workarounds.
* we were already aligning to 16 byte boundary fixed-size objects that are a multiple of 16 bytes; now we also align to 8 byte boundary fixed-size objects that are a multiple of 8 bytes. (Benoit Jacob, 2009-10-05)
|     That's only useful for now for double, not e.g. for Vector2f, but that didn't seem to hurt. Am I missing something? Do you prefer that we don't align Vector2f at all? Also, improvements in test_unalignedassert.
* remove sentence "Eigen itself is part of the KDE project." (Benoit Jacob, 2009-05-22)
|     it never made very precise sense. but now does it still make any?
* add SSE2 versions of sin, cos, log, exp using code from Julien Pommier. (Gael Guennebaud, 2009-03-25)
|     They are for float only, and they return exactly the same result as the standard versions in about 90% of the cases. Otherwise the max error is below 1e-7. However, for very large values (>1e3) the accuracy of sin and cos slightly decreases. They are about 3 or 4 times faster than 4 calls to their respective standard versions. So, is it ok to enable them by default in their respective functors?
* ei_pnegate implemented for AltiVec (Konstantinos A. Margaritis, 2009-03-20)
|
* add vectorization of unary operator-() (the AltiVec version is probably broken) (Gael Guennebaud, 2009-03-20)
|
* add the vectorization of abs (Gael Guennebaud, 2009-03-09)
|
* Add COMPONENT Devel (Laurent Montel, 2009-02-23)
|
* no reason for 3 vec_mins, 2 are enough apparently in ei_predux_min (Konstantinos A. Margaritis, 2009-02-12)
|
* modified ei_predux_min/max to actually use altivec instructions (Konstantinos A. Margaritis, 2009-02-12)
|
* * exit Sum.h, exit Prod.h, welcome vectorization of redux()! (Gael Guennebaud, 2009-02-12)
|     * add vectorization for minCoeff and maxCoeff
* add ei_predux_mul for AltiVec (Gael Guennebaud, 2009-02-10)
|
* fixed preserve_mask definition for AltiVec (needed __vector keyword) (Konstantinos A. Margaritis, 2009-02-08)
|
* add bench_reverse, draft of a reverse vectorization for AltiVec, make global Scaling function static (Gael Guennebaud, 2009-02-06)
|
* Missing inline keywords in AltiVec/PacketMath were making Avogadro fail to compile (duplicate symbols). (Benoit Jacob, 2008-08-27)
|
* remove double ; (Benoit Jacob, 2008-08-27)
|
* replace vector by __vector to prevent conflict with std::vector (Benoit Jacob, 2008-08-26)
|
* * patch from Konstantinos Margaritis: bugfix in Altivec version of ei_pdiv and various cleaning in Altivec code. (Gael Guennebaud, 2008-08-25)
|       Altivec vectorization has been re-enabled in CoreDeclaration
|     * added copy constructors in non empty functors because I observed weird behavior with std::complex<>
* patch from Konstantinos Margaritis: Altivec vectorization is resurrected! (Gael Guennebaud, 2008-08-22)
|
* * fix bug found by Boudewijn Rempt: no CMakeLists in arch/ subdir (Benoit Jacob, 2008-08-19)
|     * fix warning in SolveTriangular
* * rework Map, allow vectorization (Benoit Jacob, 2008-06-27)
|     * rework PacketMath and DummyPacketMath, make these actual template specializations instead of just overriding by non-template inline functions
|     * introduce ei_ploadt and ei_pstoret, make use of them in Map and Matrix
|     * remove Matrix::map() methods, use Map constructors instead.
* * add ei_pdiv intrinsic, make quotient functor vectorizable (Benoit Jacob, 2008-06-23)
|     * add vdw benchmark from Tim's real-world use case
* put inline keywords everywhere appropriate. (Benoit Jacob, 2008-05-12)
|     So we no longer need to pass -finline-limit=1000 to gcc to get good performance. By the way, some cleanup.
* move arch-specific code to arch/SSE and arch/AltiVec subdirs. (Benoit Jacob, 2008-05-12)
      rename the noarch PacketMath.h to DummyPacketMath.h