This macro is no longer used as of revision 0212eec23f4cb64e8426bf32568156df302f8fcf.
|
* pload* and pset1 are now templated on the packet type
* gemv routines are now embedded into a structure with
a consistent API with respect to gemm
* some configurations of vector * matrix and matrix * matrix work fine,
some need more work...
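The first bullet means the packet type becomes an explicit template argument at the call site. Below is a minimal sketch. It assumes a hypothetical portable `Packet4f` struct and a hypothetical `unpacket_traits` mapping from packet type to scalar type. Neither comes from this commit.

```cpp
#include <cassert>

// Hypothetical portable 4-float "packet"; a SIMD backend would use a register type.
struct Packet4f { float v[4]; };

// Hypothetical trait mapping a packet type to its scalar type.
template<typename Packet> struct unpacket_traits;
template<> struct unpacket_traits<Packet4f> { typedef float type; };

// Primary templates: the packet type cannot be deduced from a scalar pointer,
// so callers must name it explicitly.
template<typename Packet>
Packet pload(const typename unpacket_traits<Packet>::type* from);

template<typename Packet>
Packet pset1(const typename unpacket_traits<Packet>::type& value);

// Loads 4 consecutive floats into a packet.
template<> inline Packet4f pload<Packet4f>(const float* from) {
  assert(from != nullptr);
  Packet4f p;
  for (int i = 0; i < 4; ++i) p.v[i] = from[i];
  return p;
}

// Broadcasts one float to all 4 lanes.
template<> inline Packet4f pset1<Packet4f>(const float& value) {
  Packet4f p;
  for (int i = 0; i < 4; ++i) p.v[i] = value;
  return p;
}
```

`Packet` appears only in a non-deduced context, so a call written as `pload(ptr)` does not compile. The caller writes `pload<Packet4f>(ptr)` instead.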
|
minor fix in AltiVec Complex.h
|
Note: For some reason g++ 4.4 is >200% slower than g++ 4.3 on AltiVec code.
The same benchmark (bench_gemm) was tested on the same hardware/OS (G4/Debian testing)
with the same CFLAGS. With some code reorganization I managed to get a minor gain
on 4.4, but I just could not reach the 4.3 speed. This is most likely a bug, but I'm waiting
to see if it's fixed in 4.5. I'll look into this a bit more.
|
* add an Alignable trait
* update LinearVectorization assignment
|
| |
|
|
|
|
| |
* vectorize complex<double>
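A `std::complex<double>` is 16 bytes, exactly one SSE2 register. The sketch below shows one common SSE2 way to multiply two such values using lane shuffles and a sign flip. It is a generic technique, not necessarily what this commit implements. `cmul` and `SKETCH_HAVE_SSE2` are hypothetical names, and a scalar fallback keeps the code compilable off x86.

```cpp
#include <cassert>
#include <complex>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Hypothetical helper: x * y for complex<double>, one value per 128-bit register.
inline std::complex<double> cmul(const std::complex<double>& x,
                                 const std::complex<double>& y) {
#if defined(SKETCH_HAVE_SSE2)
  // std::complex<double> is laid out as {real, imag}, so it loads as 2 doubles.
  const __m128d a = _mm_loadu_pd(reinterpret_cast<const double*>(&x));  // [xr, xi]
  const __m128d b = _mm_loadu_pd(reinterpret_cast<const double*>(&y));  // [yr, yi]
  const __m128d a_re = _mm_unpacklo_pd(a, a);         // [xr, xr]
  const __m128d a_im = _mm_unpackhi_pd(a, a);         // [xi, xi]
  const __m128d b_swapped = _mm_shuffle_pd(b, b, 1);  // [yi, yr]
  const __m128d t1 = _mm_mul_pd(a_re, b);             // [xr*yr, xr*yi]
  __m128d t2 = _mm_mul_pd(a_im, b_swapped);           // [xi*yi, xi*yr]
  // Flip the sign of the low lane only (_mm_set_pd takes high, then low).
  t2 = _mm_xor_pd(t2, _mm_set_pd(0.0, -0.0));         // [-xi*yi, xi*yr]
  std::complex<double> result;
  _mm_storeu_pd(reinterpret_cast<double*>(&result), _mm_add_pd(t1, t2));
  return result;
#else
  // Scalar fallback computing the same formula.
  return std::complex<double>(x.real() * y.real() - x.imag() * y.imag(),
                              x.real() * y.imag() + x.imag() * y.real());
#endif
}
```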
|
trait(!).
|
After validation of the final API I'll update the other products to use it.
|
replaced _mm_prefetch in GeneralBlockPanelKernel.h with the ei_prefetch() inline function.
Implemented NEON and AltiVec versions, and copied the SSE version over from GeneralBlockPanelKernel.h.
Also, in the GCC case (or rather, when !_MSC_VER), it is implemented using __builtin_prefetch().
NEON gave a small but welcome boost: 0.88 GFLOPS -> 0.91 GFLOPS.
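Below is a minimal sketch of the compiler dispatch described above. It uses a hypothetical free function `prefetch()` in place of `ei_prefetch()` and does not reproduce the commit's NEON and AltiVec variants. `__builtin_prefetch` is the GCC/Clang builtin, and `_mm_prefetch` is the x86 intrinsic available under MSVC.

```cpp
#include <cassert>
#include <cstddef>
#if defined(_MSC_VER)
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0
#endif

// Hypothetical stand-in for ei_prefetch(): a cache hint that never changes results.
template<typename Scalar>
inline void prefetch(const Scalar* addr) {
#if defined(_MSC_VER)
  _mm_prefetch(reinterpret_cast<const char*>(addr), _MM_HINT_T0);
#else
  __builtin_prefetch(addr);  // GCC/Clang: defaults to a read with high temporal locality
#endif
}

// Hypothetical use: sum an array while prefetching a fixed distance ahead.
inline float sum_with_prefetch(const float* data, std::size_t n) {
  assert(n == 0 || data != nullptr);
  const std::size_t distance = 16;  // arbitrary lookahead, in elements
  float acc = 0.f;
  for (std::size_t i = 0; i < n; ++i) {
    if (i + distance < n) prefetch(data + i + distance);  // stay inside the array
    acc += data[i];
  }
  return acc;
}
```

A prefetch is only a hint, so the result of `sum_with_prefetch` does not depend on it. The 16-element lookahead is an arbitrary choice for illustration.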
|
inline function.
Implemented NEON and AltiVec versions, and copied the SSE version over from GeneralBlockPanelKernel.h.
Also, in the GCC case (or rather, when !_MSC_VER), it is implemented using __builtin_prefetch().
NEON gave a small but welcome boost: 0.88 GFLOPS -> 0.91 GFLOPS.
|
before too!
|
hackish workarounds,
as gcc on ARM (both the CodeSourcery 4.4.1 in use and the experimental 4.5) fails to
ensure proper alignment with __attribute__((aligned(16))). This has to be
fixed upstream so that the workarounds can be removed.
|
multiple of 16 bytes;
now we also align fixed-size objects whose size is a multiple of 8 bytes to an 8-byte boundary.
For now that's only useful for double, not e.g. for Vector2f, but it didn't seem to hurt. Am I missing something? Do you prefer that we don't align Vector2f at all?
Also, improvements in test_unalignedassert.
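The alignment rule described above can be sketched as a compile-time choice. This is a minimal illustration assuming C++11 `alignas`. `fixed_size_alignment` and `FixedStorage` are hypothetical names, not the code of this commit.

```cpp
#include <cstddef>

// Hypothetical rule from the message: 16-byte alignment when the object size is a
// multiple of 16 bytes, 8-byte alignment when it is a multiple of 8 bytes,
// otherwise the element type's natural alignment.
template<std::size_t Bytes, std::size_t NaturalAlign>
struct fixed_size_alignment {
  static constexpr std::size_t rule =
      (Bytes % 16 == 0) ? 16 : (Bytes % 8 == 0) ? 8 : NaturalAlign;
  // Never go below the element's own alignment (e.g. for over-aligned types).
  static constexpr std::size_t value = rule < NaturalAlign ? NaturalAlign : rule;
};

// Hypothetical fixed-size storage (Size > 0) aligned by that rule.
template<typename T, int Size>
struct alignas(fixed_size_alignment<sizeof(T) * Size, alignof(T)>::value) FixedStorage {
  T data[Size];
};
```

Taking the maximum with `alignof(T)` keeps the `alignas` valid even for element types that are already over-aligned. Under this rule a Vector2f-sized object (two floats, 8 bytes) lands on an 8-byte boundary, which is the case the message asks about.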
|
it never made very precise sense, but does it still make any sense now?
|
Pommier. They are for float only, and they return exactly the same
result as the standard versions in about 90% of cases. Otherwise the max error
is below 1e-7. However, for very large values (>1e3) the accuracy of sin and cos
slightly decreases. They are about 3 or 4 times faster than 4 calls to their respective
standard versions. So, is it OK to enable them by default in their respective functors?
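For reference, below is a scalar transcription of the Cephes-style `sinf` algorithm. As we understand it, Pommier's SSE routines are based on this algorithm. It reduces the argument by octant, with pi/4 split into three constants, and then evaluates either a sine or a cosine polynomial. The SSE versions perform these steps on four floats at once, to our understanding using bit masks instead of the branches below. `approx_sinf` and `max_abs_error` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Scalar sketch of a Cephes-style sinf (not the SSE code discussed above).
inline float approx_sinf(float x) {
  // pi/4 split into three parts so that j * pi/4 is subtracted with extra precision.
  const float DP1 = 0.78515625f;
  const float DP2 = 2.4187564849853515625e-4f;
  const float DP3 = 3.77489497744594108e-8f;
  const float FOPI = 1.27323954473516f;  // 4 / pi

  float sign = 1.0f;
  if (x < 0.0f) { x = -x; sign = -1.0f; }  // sin is odd

  // Octant index rounded up to an even number, so the reduced argument is in [-pi/4, pi/4].
  int j = static_cast<int>(x * FOPI);  // assumes x * 4/pi fits in an int
  j = (j + 1) & ~1;
  const float y = static_cast<float>(j);

  if (j & 4) sign = -sign;                 // j = 4 or 6 (mod 8): result changes sign
  const bool use_cos_poly = (j & 2) != 0;  // j = 2 or 6 (mod 8): sin(x) = +-cos(reduced)

  x = ((x - y * DP1) - y * DP2) - y * DP3;  // x - j * pi/4
  const float z = x * x;

  float r;
  if (use_cos_poly) {
    r = ((2.443315711809948e-5f * z - 1.388731625493765e-3f) * z
         + 4.166664568298827e-2f) * z * z - 0.5f * z + 1.0f;
  } else {
    r = ((-1.9515295891e-4f * z + 8.3321608736e-3f) * z
         - 1.6666654611e-1f) * z * x + x;
  }
  return sign * r;
}

// Largest |approx_sinf(x) - sin(x)| over [lo, hi], sampled every `step`.
inline double max_abs_error(float lo, float hi, float step) {
  assert(step > 0.0f);
  double worst = 0.0;
  for (float x = lo; x <= hi; x += step) {
    const double err = std::fabs(approx_sinf(x) - std::sin(static_cast<double>(x)));
    if (err > worst) worst = err;
  }
  return worst;
}
```

The check below covers only |x| <= 10. The float-to-int conversion of `x * 4/pi` also bounds the usable input range of this sketch.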
|
broken)
|
* add vectorization for minCoeff and maxCoeff
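One common shape for a vectorized minCoeff() is to keep a packet of running minima, reduce it horizontally at the end, and finish the leftover elements with scalar code. The portable sketch below assumes a hypothetical 4-lane `Packet4` built on `std::array` in place of a SIMD register. `pmin`, `predux_min` and `min_coeff` are illustrative names, and maxCoeff() is the same with max in place of min.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane packet; a SIMD backend would use one vector register.
typedef std::array<float, 4> Packet4;

// Lane-wise minimum (what a single SIMD min instruction would do).
inline Packet4 pmin(const Packet4& a, const Packet4& b) {
  Packet4 r;
  for (int i = 0; i < 4; ++i) r[i] = std::min(a[i], b[i]);
  return r;
}

// Horizontal reduction of one packet to a scalar.
inline float predux_min(const Packet4& p) {
  return std::min(std::min(p[0], p[1]), std::min(p[2], p[3]));
}

inline float min_coeff(const float* data, std::size_t n) {
  assert(n > 0 && data != nullptr);
  std::size_t i = 0;
  float result;
  if (n >= 4) {
    Packet4 acc = {{data[0], data[1], data[2], data[3]}};
    for (i = 4; i + 4 <= n; i += 4) {  // whole packets
      Packet4 p = {{data[i], data[i + 1], data[i + 2], data[i + 3]}};
      acc = pmin(acc, p);
    }
    result = predux_min(acc);
  } else {
    result = data[0];
    i = 1;
  }
  for (; i < n; ++i) result = std::min(result, data[i]);  // scalar tail
  return result;
}
```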
|
global Scaling function static
|
to compile (duplicate symbols).
|
and various cleanups in the AltiVec code. AltiVec vectorization has been re-enabled
in CoreDeclaration
* added copy constructors in non-empty functors because I observed weird behavior with
std::complex<>
|
* fix warning in SolveTriangular
|
* rework PacketMath and DummyPacketMath, make these actual template
specializations instead of just overriding by non-template inline
functions
* introduce ei_ploadt and ei_pstoret, make use of them in Map and Matrix
* remove Matrix::map() methods, use Map constructors instead.
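The sketch below illustrates two ideas from this commit, using hypothetical names throughout. The first is a generic `padd` template whose packet version is an explicit specialization rather than a separate non-template overload. The second is a `ploadt` that takes the alignment mode as a template parameter. Reading the `t` in `ei_ploadt` as "templated on the load mode" is our assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical load-mode tags.
enum { Unaligned = 0, Aligned = 1 };

// Hypothetical 2-double packet (the size of one SSE2 register).
struct Packet2d { double v[2]; };

// Generic fallback, used for plain scalars: a + b.
template<typename Packet>
inline Packet padd(const Packet& a, const Packet& b) { return a + b; }

// Packet version as an explicit specialization of the same template,
// rather than a separate non-template overload.
template<>
inline Packet2d padd<Packet2d>(const Packet2d& a, const Packet2d& b) {
  Packet2d r = {{a.v[0] + b.v[0], a.v[1] + b.v[1]}};
  return r;
}

// Aligned load: the caller guarantees a 16-byte aligned pointer.
inline Packet2d pload(const double* from) {
  assert(reinterpret_cast<std::uintptr_t>(from) % 16 == 0);
  Packet2d p = {{from[0], from[1]}};
  return p;
}

// Unaligned load: any address is accepted.
inline Packet2d ploadu(const double* from) {
  Packet2d p = {{from[0], from[1]}};
  return p;
}

// Load mode fixed at compile time; the untaken branch is dead code.
template<int LoadMode>
inline Packet2d ploadt(const double* from) {
  return LoadMode == Aligned ? pload(from) : ploadu(from);
}
```

Explicit specialization and overloading differ in one practical way. A call spelled `padd<Packet2d>(a, b)` still reaches the packet code through the specialization, whereas a non-template overload would be skipped by that spelling and the generic `a + b` body would fail to compile.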
|
* add vdw benchmark from Tim's real-world use case
|
-finline-limit=1000 to gcc to get good performance. By the way, some cleanup.
|
rename the noarch PacketMath.h to DummyPacketMath.h