1 Introduction
New: CIL now has a SourceForge page:
http://sourceforge.net/projects/cil.
CIL (C Intermediate Language) is a high-level representation
along with a set of tools that permit easy analysis and source-to-source
transformation of C programs.
CIL is both lower-level than abstract-syntax trees, by clarifying ambiguous
constructs and removing redundant ones, and higher-level than typical
intermediate languages designed for compilation, by maintaining types and a
close relationship with the source program. The main advantage of CIL is that
it compiles all valid C programs into a few core constructs with very clean
semantics. CIL also has a syntax-directed type system that makes it easy to
analyze and manipulate C programs. Furthermore, the CIL front-end is able to
process not only ANSI-C programs but also those using Microsoft C or GNU C
extensions. If you use only a C parser instead of CIL and analyze programs
expressed as abstract-syntax trees, your analysis will have to handle many
ugly corners of the language (not to mention that parsing C is itself not a
trivial task). See Section 16 for some examples of such extreme programs that
CIL simplifies for you.
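As one small illustration of such a corner (our own example, not one of the programs from Section 16), consider C's implicit arithmetic conversions. An analysis that works directly on an abstract-syntax tree has to re-derive these conversions itself in order to see that the comparison below is performed on unsigned values. The function names are hypothetical and chosen only for this sketch.

```c
#include <assert.h>

/* The comparison as a programmer might write it. The usual arithmetic
   conversions bring both operands to unsigned int, so -1 becomes UINT_MAX. */
int compare_as_written(void) {
    int i = -1;
    unsigned int u = 1u;
    return i < u;
}

/* The same comparison with the conversion spelled out as a cast. */
int compare_with_explicit_cast(void) {
    int i = -1;
    unsigned int u = 1u;
    return (unsigned int)i < u;
}

/* Both forms agree: neither reports that -1 is less than 1. */
void check_comparisons(void) {
    assert(compare_as_written() == 0);
    assert(compare_with_explicit_cast() == 0);
}
```

With the cast visible in the program text, as in the second function, the unsigned comparison no longer depends on the analysis knowing the conversion rules.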
In essence, CIL is a highly-structured, “clean” subset of C. CIL features a
reduced number of syntactic and conceptual forms. For example, all looping
constructs are reduced to a single form, all function bodies are given
explicit return statements, syntactic sugar like "->" is
eliminated and function arguments with array types become pointers. (For an
extensive list of how CIL simplifies C programs, see Section 4.)
This reduces the number of cases that must be considered when manipulating a C
program. CIL also separates type declarations from code and flattens scopes
within function bodies. This structures the program in a manner more amenable
to rapid analysis and transformation. CIL computes the types of all program
expressions, and makes all type promotions and casts explicit. CIL supports
all GCC and MSVC extensions except for nested functions and complex numbers.
Finally, CIL organizes C's imperative features into expressions, instructions
and statements based on the presence and absence of side-effects and
control-flow. Every statement can be annotated with successor and predecessor
information. Thus CIL provides an integrated program representation that can
be used with routines that require an AST (e.g., type-based analyses and
pretty-printers), as well as with routines that require a CFG (e.g., dataflow
analyses). CIL also supports even lower-level representations (e.g.,
three-address code); see Section 8.
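The simplifications described above can be sketched by hand in plain C. The following is a hand-written approximation, not actual CIL output: the function names are hypothetical, and the real tool chooses its own temporary names and layout. It shows a loop reduced to a single form, an array-typed parameter turned into a pointer, "->" rewritten, locals moved to function scope, a side effect pulled out of an expression, and an explicit return added.

```c
#include <assert.h>

struct point { int x; int y; };

/* Source style: for loop, array parameter, "->", block-scoped local. */
int sum_x_original(struct point pts[], int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        struct point *p = &pts[i];
        total += p->x;
    }
    return total;
}

/* Source style: side effect buried in a subscript, no return statement. */
void bump_original(int a[], int *idx) {
    a[(*idx)++] += 1;
}

/* Simplified style (hand-written): one loop form with an explicit break,
   pointer parameter, (*p).x instead of p->x, all locals at function scope. */
int sum_x_simplified(struct point *pts, int n) {
    int total;
    int i;
    struct point *p;

    total = 0;
    i = 0;
    while (1) {
        if (!(i < n)) {
            break;
        }
        p = pts + i;
        total = total + (*p).x;
        i = i + 1;
    }
    return total;
}

/* Simplified style (hand-written): the increment becomes its own
   instruction, and the function ends with an explicit return. */
void bump_simplified(int *a, int *idx) {
    int tmp;

    tmp = *idx;
    *idx = tmp + 1;
    *(a + tmp) = *(a + tmp) + 1;
    return;
}

/* Checks that both styles compute the same results on a small input. */
void check_forms_agree(void) {
    struct point pts[3] = { {1, 10}, {2, 20}, {3, 30} };
    int a[2] = {5, 7};
    int b[2] = {5, 7};
    int ia = 0;
    int ib = 0;

    assert(sum_x_original(pts, 3) == sum_x_simplified(pts, 3));
    bump_original(a, &ia);
    bump_simplified(b, &ib);
    assert(a[0] == b[0] && a[1] == b[1] && ia == ib);
}
```

An analysis written against the simplified style only has to handle one loop form and one way of accessing a field, which is the reduction in cases described above.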
CIL comes accompanied by a number of Perl scripts that perform generally
useful operations on code:
- A driver which behaves as either the gcc or
Microsoft VC compiler and can invoke the preprocessor followed by the CIL
application. The advantage of this script is that you can easily use CIL and
the analyses written for CIL with existing make files.
- A whole-program merger that you can use as a
replacement for your compiler. It learns all the files you compile when you
make a project and merges all of the preprocessed source files into a single
one. This makes it easy to do whole-program analysis.
- A patcher that makes it easy to create modified
copies of the system include files. The CIL driver can then be told to use
these patched copies instead of the standard ones.
CIL has been tested very extensively. It is able to process the SPECINT95
benchmarks, the Linux kernel, GIMP and other open-source projects. All of
these programs are compiled to the simple CIL and then passed to gcc and
they still run! We consider the compilation of Linux a major feat, especially
since Linux contains many of the ugly GCC extensions (see Section 16.2).
This adds up to about 1,000,000 lines of code that we tested it on. It is also
able to process the few Microsoft NT device drivers that we have had access
to. CIL was tested against GCC's c-torture test suite and (except for the tests
involving complex numbers and inner functions, which CIL does not currently
implement) CIL passes most of the tests. Specifically, CIL fails 23 tests out
of the 904 c-torture tests that it should pass. GCC itself fails 19 tests. A
total of 1400 regression test cases are run automatically on each change to
the CIL sources.
CIL is relatively independent of the underlying machine and compiler. When
you build it, CIL will configure itself according to the underlying compiler.
However, CIL has only been tested on Intel x86 using the gcc compiler on Linux
and cygwin and using the MS Visual C compiler. (See below for the specific
versions of these compilers with which we have used CIL.)
The largest application we have used CIL for is
CCured, a compiler that compiles C code into
type-safe code by analyzing your pointer usage and inserting runtime checks in
the places that cannot be guaranteed statically to be type safe.
You can also use CIL to “compile” code that uses GCC extensions (e.g. the
Linux kernel) into standard C code.
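As a rough sketch of what such a translation involves, the example below pairs a GNU C statement expression with a hand-written standard C equivalent. This is our own illustration under stated assumptions: the function names are hypothetical, and the standard C version is not the literal output of CIL, which picks its own temporaries and structure.

```c
#include <assert.h>

/* Helper with a side effect, so evaluation order and count matter. */
int next_value(int *counter) {
    *counter += 1;
    return *counter;
}

#if defined(__GNUC__)
/* GNU C only: a statement expression, which is not part of ISO C. */
int twice_next_gnu(int *counter) {
    return __extension__ ({ int t = next_value(counter); t * 2; });
}
#endif

/* Standard C rendering (hand-written): the statements inside the
   expression are hoisted out and the value flows through temporaries. */
int twice_next_std(int *counter) {
    int t;
    int result;

    t = next_value(counter);
    result = t * 2;
    return result;
}

/* Checks the standard version, and the GNU version where available. */
void check_twice_next(void) {
    int c = 0;
#if defined(__GNUC__)
    int g = 0;
#endif

    assert(twice_next_std(&c) == 2);
    assert(c == 1);
#if defined(__GNUC__)
    assert(twice_next_gnu(&g) == 2);
    assert(g == 1);
#endif
}
```

The standard version can be compiled by any ISO C compiler, while the GNU version is guarded so that the file still builds where the extension is unavailable.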
CIL also comes accompanied by a growing library of extensions (see
Section 8). You can use these for your projects or as examples of
using CIL.
PDF versions of this manual and the
CIL API are available. However, we recommend the
HTML versions because the postprocessed code examples are easier to
view.
If you use CIL in your project, we would appreciate it if you let us know. If
you want to cite CIL in your research writings, please refer to the paper “CIL:
Intermediate Language and Tools for Analysis and Transformation of C
Programs” by George C. Necula, Scott McPeak, S.P. Rahul and Westley Weimer,
in “Proceedings of the Conference on Compiler Construction”, 2002.