diff options
Diffstat (limited to 'cil/doc/ext.html')
-rw-r--r-- | cil/doc/ext.html | 506 |
1 files changed, 0 insertions, 506 deletions
diff --git a/cil/doc/ext.html b/cil/doc/ext.html deleted file mode 100644 index 532e225..0000000 --- a/cil/doc/ext.html +++ /dev/null @@ -1,506 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" - "http://www.w3.org/TR/REC-html40/loose.dtd"> -<HTML> -<HEAD> - - - -<META http-equiv="Content-Type" content="text/html; charset=ANSI_X3.4-1968"> -<META name="GENERATOR" content="hevea 1.08"> - -<base target="main"> -<script language="JavaScript"> -<!-- Begin -function loadTop(url) { - parent.location.href= url; -} -// --> -</script> -<LINK rel="stylesheet" type="text/css" href="cil.css"> -<TITLE> -Library of CIL Modules -</TITLE> -</HEAD> -<BODY > -<A HREF="cil007.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A> -<A HREF="ciltoc.html"><IMG SRC ="contents_motif.gif" ALT="Up"></A> -<A HREF="cil009.html"><IMG SRC ="next_motif.gif" ALT="Next"></A> -<HR> - -<H2 CLASS="section"><A NAME="htoc17">8</A> Library of CIL Modules</H2> <A NAME="sec-Extension"></A><BR> -<BR> -We are developing a suite of modules that use CIL for program analyses and -transformations that we have found useful. You can use these modules directly -on your code, or generally as inspiration for writing similar modules. A -particularly big and complex application written on top of CIL is CCured -(<A HREF="../ccured/index.html"><TT>../ccured/index.html</TT></A>).<BR> -<BR> -<A NAME="toc9"></A> -<H3 CLASS="subsection"><A NAME="htoc18">8.1</A> Control-Flow Graphs</H3> <A NAME="sec-cfg"></A> -The <A HREF="api/Cil.html#TYPEstmt">Cil.stmt</A> datatype includes fields for intraprocedural -control-flow information: the predecessor and successor statements of -the current statement. This information is not computed by default. -If you want to use the control-flow graph, or any of the extensions in -this section that require it, you have to explicitly ask CIL to -compute the CFG.<BR> -<BR> - -<H4 CLASS="subsubsection"><A NAME="htoc19">8.1.1</A> The CFG module (new in CIL 1.3.5)</H4> -The best way to compute the CFG is with the CFG module. Just invoke -<A HREF="api/Cfg.html#VALcomputeFileCFG">Cfg.computeFileCFG</A> on your file. The <A HREF="api/Cfg.html">Cfg</A> API -describes the rest of actions you can take with this module, including -computing the CFG for one function at a time, or printing the CFG in -<TT>dot</TT> form.<BR> -<BR> - -<H4 CLASS="subsubsection"><A NAME="htoc20">8.1.2</A> Simplified control flow</H4> -CIL can reduce high-level C control-flow constructs like <TT>switch</TT> and -<TT>continue</TT> to lower-level <TT>goto</TT>s. This completely eliminates some -possible classes of statements from the program and may make the result -easier to analyze (e.g., it simplifies data-flow analysis).<BR> -<BR> -You can invoke this transformation on the command line with -<TT>--domakeCFG</TT> or programatically with <A HREF="api/Cil.html#VALprepareCFG">Cil.prepareCFG</A>. -After calling Cil.prepareCFG, you can use <A HREF="api/Cil.html#VALcomputeCFGInfo">Cil.computeCFGInfo</A> -to compute the CFG information and find the successor and predecessor -of each statement.<BR> -<BR> -For a concrete example, you can see how <TT>cilly --domakeCFG</TT> -transforms the following code (note the fall-through in case 1): -<PRE CLASS="verbatim"><FONT COLOR=blue> - int foo (int predicate) { - int x = 0; - switch (predicate) { - case 0: return 111; - case 1: x = x + 1; - case 2: return (x+3); - case 3: break; - default: return 222; - } - return 333; - } -</FONT></PRE> -See the <A HREF="examples/ex23.txt">CIL output</A> for this -code fragment<BR> -<BR> -<A NAME="toc10"></A> -<H3 CLASS="subsection"><A NAME="htoc21">8.2</A> Data flow analysis framework</H3> -The <A HREF="api/Dataflow.html">Dataflow</A> module (click for the ocamldoc) contains a -parameterized framework for forward and backward data flow -analyses. You provide the transfer functions and this module does the -analysis. You must compute control-flow information (Section <A HREF="#sec-cfg">8.1</A>) -before invoking the Dataflow module.<BR> -<BR> -<A NAME="toc11"></A> -<H3 CLASS="subsection"><A NAME="htoc22">8.3</A> Dominators</H3> -The module <A HREF="api/Dominators.html">Dominators</A> contains the computation of immediate - dominators. It uses the <A HREF="api/Dataflow.html">Dataflow</A> module. <BR> -<BR> -<A NAME="toc12"></A> -<H3 CLASS="subsection"><A NAME="htoc23">8.4</A> Points-to Analysis</H3> -The module <TT>ptranal.ml</TT> contains two interprocedural points-to -analyses for CIL: <TT>Olf</TT> and <TT>Golf</TT>. <TT>Olf</TT> is the default. -(Switching from <TT>olf.ml</TT> to <TT>golf.ml</TT> requires a change in -<TT>Ptranal</TT> and a recompiling <TT>cilly</TT>.)<BR> -<BR> -The analyses have the following characteristics: -<UL CLASS="itemize"><LI CLASS="li-itemize"> -Not based on C types (inferred pointer relationships are sound - despite most kinds of C casts) -<LI CLASS="li-itemize">One level of subtyping -<LI CLASS="li-itemize">One level of context sensitivity (Golf only) -<LI CLASS="li-itemize">Monomorphic type structures -<LI CLASS="li-itemize">Field insensitive (fields of structs are conflated) -<LI CLASS="li-itemize">Demand-driven (points-to queries are solved on demand) -<LI CLASS="li-itemize">Handle function pointers -</UL> -The analysis itself is factored into two components: <TT>Ptranal</TT>, -which walks over the CIL file and generates constraints, and <TT>Olf</TT> -or <TT>Golf</TT>, which solve the constraints. The analysis is invoked -with the function <TT>Ptranal.analyze_file: Cil.file -> - unit</TT>. This function builds the points-to graph for the CIL file -and stores it internally. There is currently no facility for clearing -internal state, so <TT>Ptranal.analyze_file</TT> should only be called -once.<BR> -<BR> -The constructed points-to graph supports several kinds of queries, -including alias queries (may two expressions be aliased?) and -points-to queries (to what set of locations may an expression point?).<BR> -<BR> -The main interface with the alias analysis is as follows: -<UL CLASS="itemize"><LI CLASS="li-itemize"> -<TT>Ptranal.may_alias: Cil.exp -> Cil.exp -> bool</TT>. If - <TT>true</TT>, the two expressions may have the same value. -<LI CLASS="li-itemize"><TT>Ptranal.resolve_lval: Cil.lval -> (Cil.varinfo - list)</TT>. Returns the list of variables to which the given - left-hand value may point. -<LI CLASS="li-itemize"><TT>Ptranal.resolve_exp: Cil.exp -> (Cil.varinfo list)</TT>. - Returns the list of variables to which the given expression may - point. -<LI CLASS="li-itemize"><TT>Ptranal.resolve_funptr: Cil.exp -> (Cil.fundec - list)</TT>. Returns the list of functions to which the given - expression may point. -</UL> -The precision of the analysis can be customized by changing the values -of several flags: -<UL CLASS="itemize"><LI CLASS="li-itemize"> -<TT>Ptranal.no_sub: bool ref</TT>. - If <TT>true</TT>, subtyping is disabled. Associated commandline option: - <B>--ptr_unify</B>. -<LI CLASS="li-itemize"><TT>Ptranal.analyze_mono: bool ref</TT>. - (Golf only) If <TT>true</TT>, context sensitivity is disabled and the - analysis is effectively monomorphic. Commandline option: - <B>--ptr_mono</B>. -<LI CLASS="li-itemize"><TT>Ptranal.smart_aliases: bool ref</TT>. - (Golf only) If <TT>true</TT>, “smart” disambiguation of aliases is - enabled. Otherwise, aliases are computed by intersecting points-to - sets. This is an experimental feature. -<LI CLASS="li-itemize"><TT>Ptranal.model_strings: bool ref</TT>. - Make the alias analysis model string constants by treating them as - pointers to chars. Commandline option: <B>--ptr_model_strings</B> -<LI CLASS="li-itemize"><TT>Ptranal.conservative_undefineds: bool ref</TT>. - Make the most pessimistic assumptions about globals if an undefined - function is present. Such a function can write to every global - variable. Commandline option: <B>--ptr_conservative</B> -</UL> -In practice, the best precision/efficiency tradeoff is achieved by -setting <TT>Ptranal.no_sub</TT> to <TT>false</TT>, <TT>Ptranal.analyze_mono</TT> to -<TT>true</TT>, and <TT>Ptranal.smart_aliases</TT> to <TT>false</TT>. These are the -default values of the flags.<BR> -<BR> -There are also a few flags that can be used to inspect or serialize -the results of the analysis. -<UL CLASS="itemize"><LI CLASS="li-itemize"> -<TT>Ptranal.debug_may_aliases</TT>. - Print the may-alias relationship of each pair of expressions in the - program. Commandline option: <B>--ptr_may_aliases</B>. -<LI CLASS="li-itemize"><TT>Ptranal.print_constraints: bool ref</TT>. - If <TT>true</TT>, the analysis will print each constraint as it is - generated. -<LI CLASS="li-itemize"><TT>Ptranal.print_types: bool ref</TT>. - If <TT>true</TT>, the analysis will print the inferred type of each - variable in the program.<BR> -<BR> -If <TT>Ptranal.analyze_mono</TT> and <TT>Ptranal.no_sub</TT> are both - <TT>true</TT>, this output is sufficient to reconstruct the points-to - graph. One nice feature is that there is a pretty printer for - recursive types, so the print routine does not loop. -<LI CLASS="li-itemize"><TT>Ptranal.compute_results: bool ref</TT>. - If <TT>true</TT>, the analysis will print out the points-to set of each - variable in the program. This will essentially serialize the - points-to graph. -</UL> -<A NAME="toc13"></A> -<H3 CLASS="subsection"><A NAME="htoc24">8.5</A> StackGuard</H3> -The module <TT>heapify.ml</TT> contains a transformation similar to the one -described in “StackGuard: Automatic Adaptive Detection and Prevention of -Buffer-Overflow Attacks”, <EM>Proceedings of the 7th USENIX Security -Conference</EM>. In essence it modifies the program to maintain a separate -stack for return addresses. Even if a buffer overrun attack occurs the -actual correct return address will be taken from the special stack. <BR> -<BR> -Although it does work, this CIL module is provided mainly as an example of -how to perform a simple source-to-source program analysis and -transformation. As an optimization only functions that contain a dangerous -local array make use of the special return address stack. <BR> -<BR> -For a concrete example, you can see how <TT>cilly --dostackGuard</TT> -transforms the following dangerous code: -<PRE CLASS="verbatim"><FONT COLOR=blue> - int dangerous() { - char array[10]; - scanf("%s",array); // possible buffer overrun! - } - - int main () { - return dangerous(); - } -</FONT></PRE> -See the <A HREF="examples/ex24.txt">CIL output</A> for this -code fragment<BR> -<BR> -<A NAME="toc14"></A> -<H3 CLASS="subsection"><A NAME="htoc25">8.6</A> Heapify</H3> -The module <TT>heapify.ml</TT> also contains a transformation that moves all -dangerous local arrays to the heap. This also prevents a number of buffer -overruns. <BR> -<BR> -For a concrete example, you can see how <TT>cilly --doheapify</TT> -transforms the following dangerous code: -<PRE CLASS="verbatim"><FONT COLOR=blue> - int dangerous() { - char array[10]; - scanf("%s",array); // possible buffer overrun! - } - - int main () { - return dangerous(); - } -</FONT></PRE> -See the <A HREF="examples/ex25.txt">CIL output</A> for this -code fragment<BR> -<BR> -<A NAME="toc15"></A> -<H3 CLASS="subsection"><A NAME="htoc26">8.7</A> One Return</H3> -The module <TT>oneret.ml</TT> contains a transformation the ensures that all -function bodies have at most one return statement. This simplifies a number -of analyses by providing a canonical exit-point. <BR> -<BR> -For a concrete example, you can see how <TT>cilly --dooneRet</TT> -transforms the following code: -<PRE CLASS="verbatim"><FONT COLOR=blue> - int foo (int predicate) { - if (predicate <= 0) { - return 1; - } else { - if (predicate > 5) - return 2; - return 3; - } - } -</FONT></PRE> -See the <A HREF="examples/ex26.txt">CIL output</A> for this -code fragment<BR> -<BR> -<A NAME="toc16"></A> -<H3 CLASS="subsection"><A NAME="htoc27">8.8</A> Partial Evaluation and Constant Folding</H3> -The <TT>partial.ml</TT> module provides a simple interprocedural partial -evaluation and constant folding data-flow analysis and transformation. This -transformation requires the <TT>--domakeCFG</TT> option. <BR> -<BR> -For a concrete example, you can see how <TT>cilly --domakeCFG --dopartial</TT> -transforms the following code (note the eliminated <TT>if</TT> branch and the -partial optimization of <TT>foo</TT>): -<PRE CLASS="verbatim"><FONT COLOR=blue> - int foo(int x, int y) { - int unknown; - if (unknown) - return y+2; - return x+3; - } - - int main () { - int a,b,c; - a = foo(5,7) + foo(6,7); - b = 4; - c = b * b; - if (b > c) - return b-c; - else - return b+c; - } -</FONT></PRE> -See the <A HREF="examples/ex27.txt">CIL output</A> for this -code fragment<BR> -<BR> -<A NAME="toc17"></A> -<H3 CLASS="subsection"><A NAME="htoc28">8.9</A> Reaching Definitions</H3> -The <TT>reachingdefs.ml</TT> module uses the dataflow framework and CFG -information to calculate the definitions that reach each -statement. After computing the CFG (Section <A HREF="#sec-cfg">8.1</A>) and calling -<TT>computeRDs</TT> on a -function declaration, <TT>ReachingDef.stmtStartData</TT> will contain a -mapping from statement IDs to data about which definitions reach each -statement. In particular, it is a mapping from statement IDs to a -triple the first two members of which are used internally. The third -member is a mapping from variable IDs to Sets of integer options. If -the set contains <TT>Some(i)</TT>, then the definition of that variable -with ID <TT>i</TT> reaches that statement. If the set contains <TT>None</TT>, -then there is a path to that statement on which there is no definition -of that variable. Also, if the variable ID is unmapped at a -statement, then no definition of that variable reaches that statement.<BR> -<BR> -To summarize, reachingdefs.ml has the following interface: -<UL CLASS="itemize"><LI CLASS="li-itemize"> -<TT>computeRDs</TT> – Computes reaching definitions. Requires that -CFG information has already been computed for each statement. -<LI CLASS="li-itemize"><TT>ReachingDef.stmtStartData</TT> – contains reaching -definition data after <TT>computeRDs</TT> is called. -<LI CLASS="li-itemize"><TT>ReachingDef.defIdStmtHash</TT> – Contains a mapping -from definition IDs to the ID of the statement in which -the definition occurs. -<LI CLASS="li-itemize"><TT>getRDs</TT> – Takes a statement ID and returns -reaching definition data for that statement. -<LI CLASS="li-itemize"><TT>instrRDs</TT> – Takes a list of instructions and the -definitions that reach the first instruction, and for -each instruction calculates the definitions that reach -either into or out of that instruction. -<LI CLASS="li-itemize"><TT>rdVisitorClass</TT> – A subclass of nopCilVisitor that -can be extended such that the current reaching definition -data is available when expressions are visited through -the <TT>get_cur_iosh</TT> method of the class. -</UL> -<A NAME="toc18"></A> -<H3 CLASS="subsection"><A NAME="htoc29">8.10</A> Available Expressions</H3> -The <TT>availexps.ml</TT> module uses the dataflow framework and CFG -information to calculate something similar to a traditional available -expressions analysis. After <TT>computeAEs</TT> is called following a CFG -calculation (Section <A HREF="#sec-cfg">8.1</A>), <TT>AvailableExps.stmtStartData</TT> will -contain a mapping -from statement IDs to data about what expressions are available at -that statement. The data for each statement is a mapping for each -variable ID to the whole expression available at that point(in the -traditional sense) which the variable was last defined to be. So, -this differs from a traditional available expressions analysis in that -only whole expressions from a variable definition are considered rather -than all expressions.<BR> -<BR> -The interface is as follows: -<UL CLASS="itemize"><LI CLASS="li-itemize"> -<TT>computeAEs</TT> – Computes available expressions. Requires -that CFG information has already been comptued for each statement. -<LI CLASS="li-itemize"><TT>AvailableExps.stmtStartData</TT> – Contains available -expressions data for each statement after <TT>computeAEs</TT> has been -called. -<LI CLASS="li-itemize"><TT>getAEs</TT> – Takes a statement ID and returns -available expression data for that statement. -<LI CLASS="li-itemize"><TT>instrAEs</TT> – Takes a list of instructions and -the availalbe expressions at the first instruction, and -for each instruction calculates the expressions available -on entering or exiting each instruction. -<LI CLASS="li-itemize"><TT>aeVisitorClass</TT> – A subclass of nopCilVisitor that -can be extended such that the current available expressions -data is available when expressions are visited through the -<TT>get_cur_eh</TT> method of the class. -</UL> -<A NAME="toc19"></A> -<H3 CLASS="subsection"><A NAME="htoc30">8.11</A> Liveness Analysis</H3> -The <TT>liveness.ml</TT> module uses the dataflow framework and -CFG information to calculate which variables are live at -each program point. After <TT>computeLiveness</TT> is called -following a CFG calculation (Section <A HREF="#sec-cfg">8.1</A>), <TT>LiveFlow.stmtStartData</TT> will -contain a mapping for each statement ID to a set of <TT>varinfo</TT>s -for varialbes live at that program point.<BR> -<BR> -The interface is as follows: -<UL CLASS="itemize"><LI CLASS="li-itemize"> -<TT>computeLiveness</TT> – Computes live variables. Requires -that CFG information has already been computed for each statement. -<LI CLASS="li-itemize"><TT>LiveFlow.stmtStartData</TT> – Contains live variable data -for each statement after <TT>computeLiveness</TT> has been called. -</UL> -Also included in this module is a command line interface that -will cause liveness data to be printed to standard out for -a particular function or label. -<UL CLASS="itemize"><LI CLASS="li-itemize"> -<TT>–doliveness</TT> – Instructs cilly to comptue liveness -information and to print on standard out the variables live -at the points specified by <TT>–live_func</TT> and <TT>live_label</TT>. -If both are ommitted, then nothing is printed. -<LI CLASS="li-itemize"><TT>–live_func</TT> – The name of the function whose -liveness data is of interest. If <TT>–live_label</TT> is ommitted, -then data for each statement is printed. -<LI CLASS="li-itemize"><TT>–live_label</TT> – The name of the label at which -the liveness data will be printed. -</UL> -<A NAME="toc20"></A> -<H3 CLASS="subsection"><A NAME="htoc31">8.12</A> Dead Code Elimination</H3> -The module <TT>deadcodeelim.ml</TT> uses the reaching definitions -analysis to eliminate assignment instructions whose results -are not used. The interface is as follows: -<UL CLASS="itemize"><LI CLASS="li-itemize"> -<TT>elim_dead_code</TT> – Performs dead code elimination -on a function. Requires that CFG information has already -been computed (Section <A HREF="#sec-cfg">8.1</A>). -<LI CLASS="li-itemize"><TT>dce</TT> – Performs dead code elimination on an -entire file. Requires that CFG information has already -been computed. -</UL> -<A NAME="toc21"></A> -<H3 CLASS="subsection"><A NAME="htoc32">8.13</A> Simple Memory Operations</H3> -The <TT>simplemem.ml</TT> module allows CIL lvalues that contain memory -accesses to be even futher simplified via the introduction of -well-typed temporaries. After this transformation all lvalues involve -at most one memory reference.<BR> -<BR> -For a concrete example, you can see how <TT>cilly --dosimpleMem</TT> -transforms the following code: -<PRE CLASS="verbatim"><FONT COLOR=blue> - int main () { - int ***three; - int **two; - ***three = **two; - } -</FONT></PRE> -See the <A HREF="examples/ex28.txt">CIL output</A> for this -code fragment<BR> -<BR> -<A NAME="toc22"></A> -<H3 CLASS="subsection"><A NAME="htoc33">8.14</A> Simple Three-Address Code</H3> -The <TT>simplify.ml</TT> module further reduces the complexity of program -expressions and gives you a form of three-address code. After this -transformation all expressions will adhere to the following grammar: -<PRE CLASS="verbatim"> - basic::= - Const _ - Addrof(Var v, NoOffset) - StartOf(Var v, NoOffset) - Lval(Var v, off), where v is a variable whose address is not taken - and off contains only "basic" - - exp::= - basic - Lval(Mem basic, NoOffset) - BinOp(bop, basic, basic) - UnOp(uop, basic) - CastE(t, basic) - - lval ::= - Mem basic, NoOffset - Var v, off, where v is a variable whose address is not taken and off - contains only "basic" -</PRE>In addition, all <TT>sizeof</TT> and <TT>alignof</TT> forms are turned into -constants. Accesses to arrays and variables whose address is taken are -turned into "Mem" accesses. All field and index computations are turned -into address arithmetic.<BR> -<BR> -For a concrete example, you can see how <TT>cilly --dosimplify</TT> -transforms the following code: -<PRE CLASS="verbatim"><FONT COLOR=blue> - int main() { - struct mystruct { - int a; - int b; - } m; - int local; - int arr[3]; - int *ptr; - - ptr = &local; - m.a = local + sizeof(m) + arr[2]; - return m.a; - } -</FONT></PRE> -See the <A HREF="examples/ex29.txt">CIL output</A> for this -code fragment<BR> -<BR> -<A NAME="toc23"></A> -<H3 CLASS="subsection"><A NAME="htoc34">8.15</A> Converting C to C++</H3> -The module canonicalize.ml performs several transformations to correct -differences between C and C++, so that the output is (hopefully) valid -C++ code. This may be incomplete — certain fixes which are necessary -for some programs are not yet implemented.<BR> -<BR> -Using the <TT>--doCanonicalize</TT> option with CIL will perform the -following changes to your program: -<OL CLASS="enumerate" type=1><LI CLASS="li-enumerate"> -Any variables that use C++ keywords as identifiers are renamed. -<LI CLASS="li-enumerate">C allows global variables to have multiple declarations and - multiple (equivalent) definitions. This transformation removes - all but one declaration and all but one definition. -<LI CLASS="li-enumerate"><TT>__inline</TT> is #defined to <TT>inline</TT>, and <TT>__restrict</TT> - is #defined to nothing. -<LI CLASS="li-enumerate">C allows function pointers with no specified arguments to be used on - any argument list. To make C++ accept this code, we insert a cast - from the function pointer to a type that matches the arguments. Of - course, this does nothing to guarantee that the pointer actually has - that type. -<LI CLASS="li-enumerate">Makes casts from int to enum types explicit. (CIL changes enum - constants to int constants, but doesn't use a cast.) -</OL> -<HR> -<A HREF="cil007.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A> -<A HREF="ciltoc.html"><IMG SRC ="contents_motif.gif" ALT="Up"></A> -<A HREF="cil009.html"><IMG SRC ="next_motif.gif" ALT="Next"></A> -</BODY> -</HTML> |