/*
* $Id: notes.dox,v 1.20 2004/04/11 01:29:41 opetzold Exp $
*/
/**
\page notes Some Notes ...
<p>Contents:</p>
-# \ref optimizing
-# \ref temporaries
-# \ref operators
-# \ref threads
-# \ref expressions
-# \ref adl
-# \ref alias
-# \ref spec_meta_func
-# \ref mmv
\section optimizing ... on optimizing
This depends heavily on the compiler and the flags used. The code produced
with -O can be better than with -O2, even with the gcc-2.9x suite. To get the
best results, you should examine the assembler code generated by your compiler.
Maybe I will write a benchmark suite for different compiler options one day.
(Maybe you could contribute?)
\section temporaries ... on temporaries
The use of expression templates (ET) and meta templates (MT) allows the
generated code to avoid creating many temporaries, especially with
standard mathematical and assignment operations. There are still cases where
you have to use an explicit temporary, e.g. when swapping variables of type
double (with integer values you can use the XOR trick instead).
Some implementations use a loop with temporaries even where an ET
solution exists; in those cases the loops are faster than MT.
\sa \ref mmv
\section operators ... on operators and namespace element_wise
Some operations on matrices and vectors are not available at first glance.
These are defined in the namespace <code>element_wise</code>, because they
are element wise (and not strictly mathematical) operations.
But there is more: some functions are element wise operations per se (e.g.
vector addition) and are NOT inside namespace element_wise. Furthermore,
all comparison operators work element wise.
\sa \ref compare
\section threads ... about Threads
This library is not thread safe. It is designed for small math operations,
where the overhead of locking policies would be too great. If you require
locking in a multi-threaded application, you will need to write a wrapper.
\section expressions ... on expressions
The first versions of %tvmet had only one expression type (Xpr), which was
shared by both vectors and matrices. This worked fine, but limited %tvmet's
use for arithmetic expressions on complex values. For this reason, I had to
separate the expression types for vectors and matrices. The same problem
appeared when using the eval() function for evaluating these expressions
(which index operator should handle it?). Unfortunately, the use of separate
expression types vastly increases the number of operators and functions
needed to make the library viable. Fortunately, most boundary checks are not
necessary, since they are done at compile time (such as those needed by the
assignment operator, etc.).
\section adl ... on namespaces and Koenig Lookup
IMO, the cleanest way would be to put all functions into their own
namespace <code>Functional</code> instead of naming them with the
<code>fncl_</code> prefix they currently have. (I did so at first, and
have thought better of it since.) Unfortunately, this technique doesn't
work well. I got compiler errors like:
\code
template <class T> Functional::xyt<T>' is not a function
conflict with `template <class E> xyz(Xpr<E>)' in call to `xyz'
\endcode
when trying:
\code
typedef Vector<double, 3> vector3d;
vector3d t1(1,2,3);
vector3d t2(t1);
vector3d r;
r = sqrt( t1 * t2 );
\endcode
ADL (argument dependent lookup), aka Koenig lookup, causes the
compiler to check for a match in namespace Functional; since the
template instantiation is part of Functional (the Xpr stuff), it matches
before the global namespace (the Vector stuff) is checked. Writing:
\code
r = ::sqrt( t1 * t2 );
\endcode
seems to solve the problem at first glance. However, forcing the user of
the library into this syntax is painful and could probably cause other
problems with other namespaces (I haven't checked this). Therefore, all
"Functionals" have the prefix fncl_.
\section alias ... about aliasing
tvmet assumes that all matrices and vectors are alias free. This means that
the source and destination memory layouts of matrices and vectors never
overlap (during an operation).
This is easy to understand if you consider a matrix-vector product: both
operands contain different data in different (unique, non-overlapping) memory
regions, hence they are alias free. Contrast this with a matrix-matrix
multiply, which can have aliasing, e.g. \f$A = A * B\f$.
When source and destination memory regions are the same, the computed results
may be wrong. (Probably they will be.) But \f$C = A * B\f$ is alias free.
Let's see an example in detail:
\par Example:
\code
Matrix<double,3,3> M1;
M1 = 1,2,3,4,5,6,7,8,9;
cout << "M1 = " << M1 << endl;
M1 = trans(M1);
cout << "M1 = " << M1 << endl;
\endcode
\par Output:
\code
M1 = Matrix<d, 3, 3> = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
M1 = Matrix<d, 3, 3> = [
[1, 4, 7],
[4, 5, 8],
[7, 8, 9]
]
\endcode
As you can see, due to the aliasing the lower triangular part isn't what you
expected. These results depend on the compiler optimizations, too.
Unfortunately, to avoid the aliasing problem you must use temporaries,
as shown here:
\par Example:
\code
matrix_type temp_A(A);
A = temp_A * B;
cout << "matrix_type temp_A(A);\n"
<< "A = temp_A * B = " << A << endl;
\endcode
Anyway, there seems to be a small exception (no guarantee, since I assume
it's compiler dependent) for element wise operations with a matrix
or vector on the right-hand side.
Starting with %tvmet release 1.4.1 there is a new function alias(). This
function uses a proxy to call special member functions of the %Matrix/%Vector
class; these member functions introduce the temporary for you.
\par Example:
\code
typedef tvmet::Matrix<double, 3, 3> matrix_type;
matrix_type M;
std::generate(M.begin(), M.end(),
tvmet::util::Incrementor<matrix_type::value_type>());
std::cout << "M = " << M << std::endl;
alias(M) = trans(M);
std::cout << "M = " << M << std::endl;
\endcode
with the expected
\par Output:
\code
M = [
[1, 4, 7],
[2, 5, 8],
[3, 6, 9]
]
\endcode
This function/proxy works for the element wise operators +=, -=, *= and /=
with expressions, e.g. such as trans() returns.
\sa \ref assign_op
\section spec_meta_func ... special Meta-Template Functions
From a principled point of view, there is no need for special functions
for %Matrix and %Vector operations, namely \f$M^T\,x\f$, \f$M^T\,M\f$,
\f$M\,M^T\f$, and \f$(M\,M)^T\f$.
Unfortunately, the g++ compiler sometimes throws in the towel, even on
transposing matrices. Because of this, %tvmet offers specialized functions
which speed things up at run time (by about a factor of 2 to 3) using
meta templates.
\par Example:
\code
using namespace tvmet;
Matrix<double, 6, 3> M1(0); // will be transposed to conform to the vector sizes
Vector<double, 6> v1(0);
Vector<double, 3> v2(0);
M1 = ...
v1 = ...
v2 = Mtx_prod(M1, v1); // equal to: v2 = trans(M1)*v1;
\endcode
BTW, the %Matrix-%Matrix \f$M\,M\f$ and %Matrix-%Vector \f$M\,x\f$
products use Meta-Templates, too.
\sa \ref Mtx_prod
\sa \ref MMt_prod
\sa \ref MtM_prod
\sa \ref trans_prod
\section mmv ... about Matrix-Matrix-Vector and Matrix-Matrix-Matrix-operations
The problem here is related to the optimizer, due to the expression and meta
templates used.
Internally, an expression template may contain other expression templates
(with meta templates inside as well); the compiler has to unroll all of
these into a single resulting expression (which is a hard job).
Sometimes the code generated from this is worse (from a performance point
of view) than simply using temporaries.
Still, you can chain matrix-matrix and matrix-vector operations without
writing temporaries yourself (if that is what you want).
\par from examples/hspiess.cc:
\code
tvmet::Matrix<double,3,2> B;
tvmet::Matrix<double,3,3> D;
tvmet::Matrix<double,2,2> K;
B =
-0.05, 0,
0, 0.05,
0.05, -0.05;
D =
2000, 1000, 0,
1000, 2000, 0,
0, 0, 500;
K = trans(B) * D * B;
\endcode
The performance can be sub-optimal due to the increasing complexity
of the operations. This can be reduced by a user-specified temporary:
\par from examples/hspiess.cc:
\code
// as before
K = tvmet::Matrix<double,2,3>(trans(B) * D) * B;
\endcode
or
\code
K = prod(tvmet::Matrix<double,2,3>(prod(trans(B), D)), B);
\endcode
At the moment, an intelligent caching and pre-evaluating strategy is
missing from %tvmet.
\sa \ref spec_meta_func
\sa some notes \ref temporaries
*/
// Local Variables:
// mode:c++
// End: