<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
            "http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<HEAD>



<META http-equiv="Content-Type" content="text/html; charset=ANSI_X3.4-1968">
<META name="GENERATOR" content="hevea 1.08">

<base target="main">
<script language="JavaScript">
<!-- Begin
function loadTop(url) {
  parent.location.href= url;
}
// -->
</script>
<LINK rel="stylesheet" type="text/css" href="cil.css">
<TITLE>
Who Says C is Simple?
</TITLE>
</HEAD>
<BODY >
<A HREF="cil015.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A>
<A HREF="ciltoc.html"><IMG SRC ="contents_motif.gif" ALT="Up"></A>
<A HREF="cil017.html"><IMG SRC ="next_motif.gif" ALT="Next"></A>
<HR>

<H2 CLASS="section"><A NAME="htoc42">16</A>&nbsp;&nbsp;Who Says C is Simple?</H2><A NAME="sec-simplec"></A>
When I (George) started to write CIL I thought it was going to take two weeks.
Exactly a year has passed since then and I am still fixing bugs in it. I
grossly underestimated the job because I thought that parsing and making
sense of C was simple. You probably think the same. What I did not expect was
how many dark corners this language has, especially if you want to parse
real-world programs such as those written for GCC or if you are more ambitious
and you want to parse the Linux or Windows NT sources (both of these were
written without any respect for the standard and with the expectation that
compilers will be changed to accommodate the program). <BR>
<BR>
The following examples were actually encountered in real programs, or
are taken from the ISO C99 standard or from GCC's test cases. My first
reaction when I saw these was: <EM>Is this C?</EM> The second one was: <EM>What the hell does it mean?</EM> <BR>
<BR>
If you are contemplating doing program analysis for C on abstract-syntax
trees then your analysis ought to be able to handle these things. Or, you can
use CIL and let CIL translate them into clean C code. <BR>
<BR>
<A NAME="toc24"></A>
<H3 CLASS="subsection"><A NAME="htoc43">16.1</A>&nbsp;&nbsp;Standard C</H3>
<OL CLASS="enumerate" type=1><LI CLASS="li-enumerate">Why does the following code return 0 for most values of <TT>x</TT>? (This
should be easy.)
<PRE CLASS="verbatim"><FONT COLOR=blue>
  int x;
  return x == (1 &amp;&amp; x);
</FONT></PRE>
See the <A HREF="examples/ex30.txt">CIL output</A> for this
code fragment<BR>
<BR>
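<LI CLASS="li-enumerate">Why does the following code return 0 and not -1? (Answer: because the
result of <TT>sizeof</TT> is unsigned, so the subtraction is carried out in an unsigned type and
the shift is logical. The shift count of 32 assumes 32-bit operands; note that ISO C leaves a
shift by the full width of the operand undefined.)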
<PRE CLASS="verbatim"><FONT COLOR=blue>
 return ((1 - sizeof(int)) &gt;&gt; 32);
</FONT></PRE>
See the <A HREF="examples/ex31.txt">CIL output</A> for this
code fragment<BR>
<BR>
<LI CLASS="li-enumerate">Scoping rules can be tricky. This function returns 5.
<PRE CLASS="verbatim"><FONT COLOR=blue>
int x = 5;
int f() {
  int x = 3;
  {
    extern int x;
    return x;
  }
}
</FONT></PRE>
See the <A HREF="examples/ex32.txt">CIL output</A> for this
code fragment<BR>
<BR>
<LI CLASS="li-enumerate">Functions and function pointers are implicitly converted to each other. 
<PRE CLASS="verbatim"><FONT COLOR=blue>
int (*pf)(void);
int f(void) {

   pf = &amp;f; // This looks ok
   pf = ***f; // Dereference a function?
   pf(); // Invoke a function pointer?     
   (****pf)();  // Looks strange but Ok
   (***************f)(); // Also Ok             
}
</FONT></PRE>
See the <A HREF="examples/ex33.txt">CIL output</A> for this
code fragment<BR>
<BR>
<LI CLASS="li-enumerate">Initializers with designators are one of the hardest parts of ISO C.
Neither MSVC nor GCC implements them fully. GCC comes close though. What are the
final values of <TT>i.nested.y</TT> and <TT>i.nested.z</TT>? (Answer: 2 and 6,
respectively.)
<PRE CLASS="verbatim"><FONT COLOR=blue>
struct { 
   int x; 
   struct { 
       int y, z; 
   } nested;
} i = { .nested.y = 5, 6, .x = 1, 2 };               
</FONT></PRE>
See the <A HREF="examples/ex34.txt">CIL output</A> for this
code fragment<BR>
<BR>
<LI CLASS="li-enumerate">This is from c-torture. This function returns 1.
<PRE CLASS="verbatim"><FONT COLOR=blue>
typedef struct
{
  char *key;
  char *value;
} T1;

typedef struct
{
  long type;
  char *value;
} T3;

T1 a[] =
{
  {
    "",
    ((char *)&amp;((T3) {1, (char *) 1}))
  }
};
int main() {
   T3 *pt3 = (T3*)a[0].value;
   return pt3-&gt;value;
}
</FONT></PRE>
See the <A HREF="examples/ex35.txt">CIL output</A> for this
code fragment<BR>
<BR>
<LI CLASS="li-enumerate">Another one with constructed literals. This one is legal according to
the GCC documentation but somehow GCC chokes on it (it works in CIL though). This
code returns 2.
<PRE CLASS="verbatim"><FONT COLOR=blue>
 return ((int []){1,2,3,4})[1];
</FONT></PRE>
See the <A HREF="examples/ex36.txt">CIL output</A> for this
code fragment<BR>
<BR>
<LI CLASS="li-enumerate">In the example below there is one copy of &#8220;bar&#8221; and two copies of
 &#8220;pbar&#8221; (a static function prototype at block scope refers to the file-scope
 function, while a static declaration of any other type at block scope is local to that block). 
<PRE CLASS="verbatim"><FONT COLOR=blue>
  int foo() {
     static bar();
     static (*pbar)() = bar;

  }

  static bar() { 
    return 1;
  }

  static (*pbar)() = 0;
</FONT></PRE>
See the <A HREF="examples/ex37.txt">CIL output</A> for this
code fragment<BR>
<BR>
<LI CLASS="li-enumerate">After two years of heavy use of CIL, by us and others, I discovered a bug
 in the parser. The return value of the following function depends on what
 precedence you give to casts and unary minus:
<PRE CLASS="verbatim"><FONT COLOR=blue>
  unsigned long foo() {
    return (unsigned long) - 1 / 8;
  }
</FONT></PRE>
See the <A HREF="examples/ex38.txt">CIL output</A> for this
code fragment<BR>
<BR>
The correct interpretation is <TT>((unsigned long) - 1) / 8</TT>, which is a
 relatively large number, as opposed to <TT>(unsigned long) (- 1 / 8)</TT>, which
 is 0. </OL>
<A NAME="toc25"></A>
<H3 CLASS="subsection"><A NAME="htoc44">16.2</A>&nbsp;&nbsp;GCC ugliness</H3><A NAME="sec-ugly-gcc"></A>
<OL CLASS="enumerate" type=1><LI CLASS="li-enumerate">GCC has generalized lvalues. You can take the address of a lot of
strange things:
<PRE CLASS="verbatim"><FONT COLOR=blue>
  int x, y, z;
  return &amp;(x ? y : z) - &amp; (x++, x);
</FONT></PRE>
See the <A HREF="examples/ex39.txt">CIL output</A> for this
code fragment<BR>
<BR>
<LI CLASS="li-enumerate">GCC lets you omit the second operand of a conditional expression.
<PRE CLASS="verbatim"><FONT COLOR=blue>
  extern int f();
  return f() ? : -1; // Returns the result of f unless it is 0
</FONT></PRE>
See the <A HREF="examples/ex40.txt">CIL output</A> for this
code fragment<BR>
<BR>
<LI CLASS="li-enumerate">Computed jumps can be tricky. CIL compiles them away in a fairly clean
way but you are on your own if you try to jump into another function this way.
<PRE CLASS="verbatim"><FONT COLOR=blue>
#include &lt;stdlib.h&gt; // For exit
static void *jtab[2]; // A jump table
static int doit(int x){
 
  static int jtab_init = 0;
  if(!jtab_init) { // Initialize the jump table
    jtab[0] = &amp;&amp;lbl1;
    jtab[1] = &amp;&amp;lbl2;
    jtab_init = 1;
  }
  goto *jtab[x]; // Jump through the table
lbl1:
  return 0;
lbl2:
  return 1;
}
 
int main(void){
  if (doit(0) != 0) exit(1);
  if (doit(1) != 1) exit(1);
  exit(0);
}
</FONT></PRE>
See the <A HREF="examples/ex41.txt">CIL output</A> for this
code fragment<BR>
<BR>
<LI CLASS="li-enumerate">A cute little example that we made up. What is the returned value?
(Answer: 1.)
<PRE CLASS="verbatim"><FONT COLOR=blue>
 return ({goto L; 0;}) &amp;&amp; ({L: 5;});
</FONT></PRE>
See the <A HREF="examples/ex42.txt">CIL output</A> for this
code fragment<BR>
<BR>
<LI CLASS="li-enumerate"><TT>extern inline</TT> is a strange feature of GNU C. Can you guess what the
following code computes?
<PRE CLASS="verbatim"><FONT COLOR=blue>
extern inline foo(void) { return 1; }
int firstuse(void) { return foo(); }

// A second, incompatible definition of foo
int foo(void) { return 2; }

int main() {
    return foo() + firstuse();
}
</FONT></PRE>
See the <A HREF="examples/ex43.txt">CIL output</A> for this
code fragment<BR>
<BR>
The answer depends on whether optimizations are turned on. If they are,
then the answer is 3 (the first definition is inlined at all occurrences until
the second definition). If optimizations are off, then the first
definition is ignored (treated like a prototype) and the answer is 4. <BR>
<BR>
CIL will misbehave on this example if optimizations are turned off (it
 always returns 3).<BR>
<BR>
<LI CLASS="li-enumerate">GCC allows you to cast an object of a type T into a union as long as the
union has a field of that type:
<PRE CLASS="verbatim"><FONT COLOR=blue>
union u { 
   int i; 
   struct s { 
      int i1, i2;
   } s;
};

union u x = (union u)6;

int main() {
  struct s y = {1, 2};
  union u  z = (union u)y;
}
</FONT></PRE>
See the <A HREF="examples/ex44.txt">CIL output</A> for this
code fragment<BR>
<BR>
<LI CLASS="li-enumerate">GCC allows you to use the <TT>__mode__</TT> attribute to specify the size
of an integer type instead of using the standard <TT>char</TT>, <TT>short</TT> and so on:
<PRE CLASS="verbatim"><FONT COLOR=blue>
int __attribute__ ((__mode__ (  __QI__ ))) i8;
int __attribute__ ((__mode__ (  __HI__ ))) i16;
int __attribute__ ((__mode__ (  __SI__ ))) i32;
int __attribute__ ((__mode__ (  __DI__ ))) i64;
</FONT></PRE>
See the <A HREF="examples/ex45.txt">CIL output</A> for this
code fragment<BR>
<BR>
<LI CLASS="li-enumerate">The &#8220;alias&#8221; attribute on a function declaration tells the
 linker to treat this declaration as another name for the specified
 function. CIL will replace the declaration with a trampoline
 function pointing to the specified target.
<PRE CLASS="verbatim"><FONT COLOR=blue>
    static int bar(int x, char y) {
      return x + y;
    }

    //foo is considered another name for bar.
    int foo(int x, char y) __attribute__((alias("bar")));
</FONT></PRE>
See the <A HREF="examples/ex46.txt">CIL output</A> for this
code fragment</OL>
<A NAME="toc26"></A>
<H3 CLASS="subsection"><A NAME="htoc45">16.3</A>&nbsp;&nbsp;Microsoft VC ugliness</H3>
This compiler has few extensions, so there is not much to say here.
<OL CLASS="enumerate" type=1><LI CLASS="li-enumerate">
Why does the following code return 0 and not -1? (Answer: because of a
bug in Microsoft Visual C. It thinks that the shift is unsigned just because
the second operand is unsigned. CIL reproduces this bug when in MSVC mode.)
<PRE CLASS="verbatim"><FONT COLOR=blue>
 return -3 &gt;&gt; (8 * sizeof(int));
</FONT></PRE><BR>
<BR>
<LI CLASS="li-enumerate">Unnamed fields in a structure seem really strange at first. It seems
that Microsoft Visual C introduced this extension, then GCC picked it up (but
in the process implemented it wrongly: in GCC the field <TT>y</TT> overlaps with
<TT>x</TT>!).
<PRE CLASS="verbatim"><FONT COLOR=blue>
struct {
  int x;
  struct {
     int y, z;
     struct {
       int u, v;
     };
 };
} a;
return a.x + a.y + a.z + a.u + a.v;
</FONT></PRE>
See the <A HREF="examples/ex47.txt">CIL output</A> for this
code fragment</OL>
<HR>
<A HREF="cil015.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A>
<A HREF="ciltoc.html"><IMG SRC ="contents_motif.gif" ALT="Up"></A>
<A HREF="cil017.html"><IMG SRC ="next_motif.gif" ALT="Next"></A>
</BODY>
</HTML>