diff options
Diffstat (limited to 'doc/Extraction.tex')
-rwxr-xr-x | doc/Extraction.tex | 557 |
1 files changed, 557 insertions, 0 deletions
diff --git a/doc/Extraction.tex b/doc/Extraction.tex new file mode 100755 index 000000000..cb9949d97 --- /dev/null +++ b/doc/Extraction.tex @@ -0,0 +1,557 @@ +\achapter{Execution of extracted programs in Caml and Haskell} +\label{CamlHaskellExtraction} +\aauthor{Benjamin Werner and Jean-Christophe Filliātre} +\index{Extraction} + +It is possible to use \Coq\ to build certified and relatively +efficient programs, extracting them from the proofs of their +specifications. The extracted objects are terms of \FW, and can be +obtained at the \Coq\ toplevel with the command {\tt Extraction} +(see \ref{Extraction}). + +We present here a \Coq\ module, {\tt Extraction}, which translates the +extracted terms to ML dialects, namely Caml Light, Objective Caml and +Haskell. In the following, we will not refer to a particular dialect +when possible and ``ML'' will be used to refer to any of the target +dialects. + +One builds effective programs in an \FW\ toplevel (actually the \Coq\ +toplevel) which contains the extracted objects and in which one can +import ML objects. Indeed, in order to instantiate and realize \Coq\ +type and term variables, it is possible to import ML +objects in the \FW\ toplevel, as inductive types or axioms. + +\Rem +The current mechanism of extraction of effective programs +from \Coq\ proofs slightly differs from the one in the versions of +\Coq\ anterior to the version V5.8. In these versions, there were an +explicit toplevel for the language {\sf Fml}. Moreover, it was not +possible to import ML objects in this {\sf Fml} toplevel. + + +%\section{Extraction facilities} +% +%(* TO DO *) +% + +\medskip +In the first part of this document we describe the commands of the +{\tt Extraction} module, and we give some examples in the second part. + + +\asection{The {\tt Extraction} module} +\label{Extraction} + +This section explains how to import ML objects, to realize axioms and +finally to generate ML code from the extracted programs of \FW. + +These features do not belong to the core system, and appear as an +independent module called {\tt Extraction.v} (which is compiled during the +installation of the system). So the first thing to do is to load this +module: + +\begin{coq_example*} +Require Extraction. +\end{coq_example*} + +\asubsection{Generating executable ML code} +\comindex{Write Caml File} +\comindex{Write CamlLight File} +\comindex{Write Haskell File} +The \Coq\ commands to generate ML code are: +\begin{center}\begin{tabular}{ll} + {\tt Write Caml File "\str" [ \ident$_1$ \dots\ \ident$_n$ ] + {\it options}.} + & ({\em for Objective Caml\/}) \\ + {\tt Write CamlLight File "\str" [ \ident$_1$ \dots\ \ident$_n$ ] + {\it options}.} & \\ + {\tt Write Haskell File "\str" [ \ident$_1$ \dots\ \ident$_n$ ] + {\it options}.} & \\ +\end{tabular}\end{center} +where \str\ is the name given to the file to be produced (the suffix +{\tt .ml} is added if necessary), and \ident$_1$ \dots\ \ident$_n$ the +names of the constants to be extracted. This list does not need to be +exhaustive: it is automatically completed into a complete and minimal +environment. Remaining axioms are translated into exceptions, and a +warning is printed in that case. In particular, this will be the case +for {\tt False\_rec}. (We will see below how to realize axioms). + +\paragraph{Optimizations.} +Since Caml Light and Objective Caml are strict languages, the extracted +code has to be optimized in order to be efficient (for instance, when +using induction principles we do not want to compute all the recursive +calls but only the needed ones). So an optimization routine will be +called each time the user want to generate Caml programs. Essentially, +it performs constants expansions and reductions. Therefore some +constants will not appear in the resulting Caml program (A warning is +printed for each such constant). To avoid this, just put the constant +name in the previous list \ident$_1$ \dots\ \ident$_n$ and it will not +be expanded. Moreover, three options allow the user to control the +expansion strategy : +\begin{description} + \item[\texttt{noopt}] : specifies not to do any optimization. + \item[\texttt{exact}] : specifies to extract exactly the given + objects (no recursivity). + \item[\texttt{expand [ \ident$_1$ \dots\ \ident$_n$ ]}] : + forces the expansion of the constants \ident$_1$ \dots\ \ident$_n$ + (when it is possible). +\end{description} + + +\asubsection{Realizing axioms} +\comindex{Link} + +It is possible to assume some axioms while developing a proof. Since +these axioms can be any kind of proposition or object type, they may +perfectly well have some computational content. But a program must be +a closed term, and of course the system cannot guess the program which +realizes an axiom. Therefore, it is possible to tell the system +what program (an \FW\ term actually) corresponds to a given \Coq\ +axiom. The command is {\tt Link} and the syntax: +$$\mbox{\tt Link \ident\ := Fwterm.}$$ +where \ident\ is the name of the axiom to realize and Fwterm\ the +term which realizes it. The system checks that this term has the same +type as the axiom \ident, and returns an error if not. This command +attaches a body to an axiom, and can be seen as a transformation of an +axiom into a constant. + +These semantical attachments have to be done {\em before} generating +the ML code. All type variables must be realized, and term variables +which are not realized will be translated into exceptions. + +\Example Let us illustrate this feature on a small +example. Assume that you have a type variable {\tt A} of type \Set: + +\begin{coq_example*} +Parameter A : Set. +\end{coq_example*} + +and that your specification proof assumes that there is an order +relation {\em inf} over that type (which has no computational +content), and that this relation is total and decidable: + +\begin{coq_example*} +Parameter inf : A -> A -> Prop. +Axiom inf_total : (x,y:A) {(inf x y)}+{(inf y x)}. +\end{coq_example*} + +Now suppose that we want to use this specification proof on natural +numbers; this means {\tt A} has to be instantiated by {\tt nat} +and the axiom {\tt inf\_total} will be realized, for instance, using the +order relation {\tt le} on that type and the decidability lemma +{\tt le\_lt\_dec}. Here is how to proceed: + +\begin{coq_example*} +Require Compare_dec. +\end{coq_example*} +\begin{coq_example} +Link A := nat. +Link inf_total := le_lt_dec. +\end{coq_example} + +\Warning There is no rollback on the command {\tt Link}, that +is the semantical attachments are not forgotten when doing a {\tt Reset}, +or a {\tt Restore State} command. This will be corrected in a later +version. + +\asubsection{Importing ML objects} +In order to realize axioms and to instantiate programs on real data +types, like {\tt int}, {\tt string}, \dots\ or more complicated data +structures, one want to import existing ML objects in the \FW\ +environment. The system provides such features, through the commands +{\tt ML Import Constant} and {\tt ML Import Inductive}. +The first one imports an ML +object as a new axiom and the second one adds a new inductive +definition corresponding to an ML inductive type. + +\paragraph{Warning.} +In the case of Caml dialects, the system would be able to check the +correctness of the imported objects by looking into the interfaces +files of Caml modules ({\tt .mli} files), but this feature is not yet +implemented. So one must be careful when declaring the types of the +imported objects. + +\paragraph{Caml names.} +When referencing a Caml object, you can use strings instead of +identifiers. Therefore you can use the double +underscore notation \verb!module__name! (Caml Light objects) +or the dot notation \verb!module.name! (Objective Caml objects) +to precise the module in which lies the object. + + +\asubsection{Importing inductive types} +\comindex{ML Import Inductive} + +The \Coq\ command to import an ML inductive type is: +$$\mbox{\tt ML Import Inductive \ident\ [\ident$_1$ \dots\ \ident$_n$] == % +{\em <Inductive Definition>}.}$$ +where \ident\ is the name of the ML type, \ident$_1$ \dots\ +\ident$_n$ the name of its constructors, and {\tt\em<Inductive + Definition>} the corresponding \Coq\ inductive definition +(see \ref{Inductive} in the Reference Manual for the syntax of +inductive definitions). + +This command inserts the {\tt\em<Inductive Definition>} in the \FW\ +environment, without elimination principles. From that moment, it is +possible to use that type like any other \FW\ object, and in +particular to use it to realize axioms. The names \ident\ \ident$_1$ +\dots\ \ident$_n$ may be different from the names given in the +inductive definition, in order to avoid clash with previous +constants, and are restored when generating the ML code. + +\noindent One can also import mutual inductive types with the command: +$$\begin{array}{rl} + \mbox{\tt ML Import Inductive} & + \mbox{\tt\ident$_1$ [\ident$^1_1$ \dots\ \ident$^1_{n_1}$]} \\ + & \dots \\ + & \mbox{\tt\ident$_k$ [\ident$^k_1$ \dots\ \ident$^k_{n_k}$]} \\ + & \qquad \mbox{\tt== {\em<Mutual Inductive Definition>}.} + \end{array}$$ %$$ + +\begin{Examples} +\item Let us show for instance how to import the + type {\tt bool} of Caml Light booleans: + +\begin{coq_example} +ML Import Inductive bool [ true false ] == + Inductive BOOL : Set := TRUE : BOOL + | FALSE : BOOL. +\end{coq_example} + +Here we changed the names because the type {\tt bool} is already +defined in the initial state of \Coq. + + \item Assuming that one defined the mutual inductive types {\tt +tree} and {\tt forest} in a Caml Light module, one can import them +with the command: + +\begin{coq_example} +ML Import Inductive tree [node] forest [empty cons] == + Mutual [A:Set] Inductive + tree : Set := node : A -> (forest A) -> (tree A) + with + forest : Set := empty : (forest A) + | cons : (tree A) -> (forest A) -> (forest A). +\end{coq_example} + + \item One can import the polymorphic type of Caml Light lists with +the command: +\begin{coq_example} +ML Import Inductive list [nil cons] == + Inductive list [A:Set] : Set := nil : (list A) + | cons : A->(list A)->(list A). +\end{coq_example} + +\Rem One would have to re-define {\tt nil} and {\tt cons} at +the top of its program because these constructors have no name in Caml Light. +\end{Examples} + +\asubsection{Importing terms and abstract types} +\comindex{ML Import Constant} + +The other command to import an ML object is: +$$\mbox{\tt ML Import Constant \ident$_{ML}$\ == \ident\ : Fwterm.}$$ +where \ident$_{ML}$\ is the name of the ML object and Fwterm\ its type in +\FW. This command defines an axiom in \FW\ of name \ident\ and type +Fwterm. + +\Example To import the type {\tt int} of Caml Light +integers, and the $<$ binary relation on this type, just do +\begin{coq_example} +ML Import Constant int == int : Set. +ML Import Constant lt_int == lt_int : int -> int -> BOOL. +\end{coq_example} +assuming that the Caml Light type {\tt bool} is already imported (with the +name {\tt BOOL}, as above). + + +\asubsection{Direct use of ML\ objects} +\comindex{Extract Constant} +\comindex{Extract Inductive} + +Sometimes the user do not want to extract \Coq\ objects to new ML code +but wants to use already existing ML objects. For instance, it is the +case for the booleans, which already exist in ML: the user do not want +to extract the \Coq\ inductive type \texttt{bool} to a new type for +booleans, but wants to use the primitive boolean of ML. + +The command \texttt{Extract} fulfills this requirement. +It allows the user to declare constant and inductive types which will not be +extracted but replaced by ML objects. The syntax is the following +$$ +\begin{tabular}{l} + \mbox{\tt Extract Constant \ident\ => \ident'.} \\ + \mbox{\tt Extract Inductive \ident\ + => \ident' [ \ident'$_1$ \dots \ident'$_n$ ].} +\end{tabular} +$$ %$$ +where \ident\/ is the name of the \Coq\ object and the prime identifiers +the name of the corresponding ML objects (the names between brackets +are the names of the constructors). +Mutually recursive types are declared one by one, in any order. + +\Example +Typical examples are the following: +\begin{coq_example} +Extract Inductive unit => unit [ "()" ]. +Extract Inductive bool => bool [ true false ]. +Extract Inductive sumbool => bool [ true false ]. +\end{coq_example} + + +\asubsection{Differences between \Coq\ and ML type systems} + +\subsubsection{ML types that are not \FW\ types} + +Some ML recursive types have no counterpart in the type system of +\Coq, like types using the record construction, or non positive types +like +\begin{verbatim} +# type T = C of T->T;; +\end{verbatim} +In that case, you cannot import those types as inductive types, and +the only way to do is to import them as abstract types (with {\tt ML +Import}) together with the corresponding building and de-structuring +functions (still with {\tt ML Import Constant}). + + +\subsubsection{Programs that are not ML-typable} + +On the contrary, some extracted programs in \FW\ are not typable in +ML. There are in fact two cases which can be problematic: +\begin{itemize} + \item If some part of the program is {\em very} polymorphic, there + may be no ML type for it. In that case the extraction to ML works + all right but the generated code may be refused by the ML + type-checker. A very well known example is the {\em distr-pair} + function: +$$\mbox{\tt +Definition dp := [A,B:Set][x:A][y:B][f:(C:Set)C->C](f A x,f B y). +}$$ +In Caml Light, for instance, the extracted term is +\verb!let dp x y f = pair((f x),(f y))! and has type +$$\mbox{\tt +dp : 'a -> 'a -> ('a -> 'b) -> ('b,'b) prod +}$$ +which is not its original type, but a restriction. + + \item Some definitions of \FW\ may have no counterpart in ML. This + happens when there is a quantification over types inside the type + of a constructor; for example: +$$\mbox{\tt +Inductive anything : Set := dummy : (A:Set)A->anything. +}$$ +which corresponds to the definition of ML dynamics. +\end{itemize} + +The first case is not too problematic: it is still possible to run the +programs by switching off the type-checker during compilation. Unless +you misused the semantical attachment facilities you should never get +any message like ``segmentation fault'' for which the extracted code +would be to blame. To switch off the Caml type-checker, use the +function {\tt obj\_\_magic} which gives the type {\tt 'a} to any +object; but this implies changing a little the extracted code by hand. + +The second case is fatal. If some inductive type cannot be translated +to ML, one has to change the proof (or possibly to ``cheat'' by +some low-level manipulations we would not describe here). + +We have to say, though, that in most ``realistic'' programs, these +problems do not occur. For example all the programs of the library are +accepted by Caml type-checker except {\tt Higman.v}\footnote{Should + you obtain a not ML-typable program out of a self developed example, + we would be interested in seeing it; so please mail us the example at + {\em coq@pauillac.inria.fr}}. + + +\asection{Some examples} + +We present here few examples of extractions, taken from the {\tt +theories} library of \Coq\ (into the {\tt PROGRAMS} directory). We +choose Caml Light as target language, but all can be done in the other +dialects with slight modifications. + + +\asubsection{Euclidean division} + +The file {\tt Euclid\_prog} contains the proof of Euclidean division +(theorem {\tt eucl\_dev}). The natural numbers defined in the example +files are unary integers defined by two constructors $O$ and $S$: +\begin{coq_example*} +Inductive nat : Set := O : nat | S : nat -> nat. +\end{coq_example*} + +To use the proof, we begin by loading the module {\tt Extraction} and the +file into the \Coq\ environment: + +\begin{coq_eval} +Reset Initial. +AddPath "../theories/DEMOS/PROGRAMS". +\end{coq_eval} +\begin{coq_example*} +Require Extraction. +Require Euclid_prog. +\end{coq_example*} + +This module contains a theorem {\tt eucl\_dev}, and its extracted term +is of type {\tt (b:nat)(a:nat) (di\-veucl a b)}, where {\tt diveucl} is a +type for the pair of the quotient and the modulo. +We can now extract this program to Caml Light: + +\begin{coq_example} +Write CamlLight File "euclid" [ eucl_dev ]. +\end{coq_example} + +This produces a file {\tt euclid.ml} containing all the necessary +definitions until {\tt let eucl\_dev = ..}. Let us play the resulting program: + +\begin{verbatim} +# include "euclid";; +# eucl_dev (S (S O)) (S (S (S (S (S O)))));; +- : diveucl = divex (S (S O), S O) +\end{verbatim} +It is easier to test on Caml Light integers: +\begin{verbatim} +# let rec nat_of = function 0 -> O + | n -> S (nat_of (pred n));; +# let rec int_of = function O -> 0 + | S p -> succ (int_of p);; +# let div a b = match eucl_dev (nat_of b) (nat_of a) with + divex(q,r) -> (int_of q, int_of r);; +div : int -> int -> int * int = <fun> +# div 173 15;; +- : int * int = 11, 8 +\end{verbatim} + +\asubsection{Heapsort} + +Let us see a more complicated example. The file {\tt Heap\_prog.v} +contains the proof of an efficient list sorting algorithm described by +Bjerner. Is is an adaptation of the well-known {\em heapsort} +algorithm to functional languages. We first load the files: + +\begin{coq_eval} +Reset Initial. +\end{coq_eval} +\begin{coq_example*} +Require Extraction. +Require Heap_prog. +\end{coq_example*} + +As we saw it above we have to instantiate or realize by hand some of +the \Coq\ variables, which are in this case the type of the elements +to sort ({\tt List\_Dom}, defined in {\tt List.v}) and the +decidability of the order relation ({\tt inf\_total}). We proceed as +in section \ref{Extraction}: + +\begin{coq_example} +ML Import Constant int == int : Set. +Link List_Dom := int. +ML Import Inductive bool [ true false ] == + Inductive BOOL : Set := TRUE : BOOL + | FALSE : BOOL. +ML Import Constant lt_int == lt_int : int->int->BOOL. +Link inf_total := + [x,y:int]Cases (lt_int x y) of + TRUE => left + | FALSE => right + end. +\end{coq_example} + +Then we extract the Caml Light program + +\begin{coq_example} +Write CamlLight File "heapsort" [ heapsort ]. +\end{coq_example} +and test it +\begin{verbatim} + +# include "heapsort";; +# let rec listn = function 0 -> nil + | n -> cons(random__int 10000,listn (pred n));; +# heapsort (listn 10);; +- : list = cons (136, cons (760, cons (1512, cons (2776, cons (3064, +cons (4536, cons (5768, cons (7560, cons (8856, cons (8952, nil)))))))))) +\end{verbatim} + +Some tests on longer lists (100000 elements) show that the program is +quite efficient for Caml code. + +\asubsection{Balanced trees} + +The file {\tt Avl\_prog.v} contains the proof of insertion in binary +balanced trees (AVL). Here we choose to instantiate such trees on the +type {\tt string} of Caml Light (for instance to get efficient +dictionary); as above we must realize the decidability of the order +relation. It gives the following commands: + +\begin{coq_eval} +Reset Initial. +\end{coq_eval} +\begin{coq_example*} +Require Extraction. +Require Avl_prog. +\end{coq_example*} +\begin{coq_eval} +Pwd. +\end{coq_eval} +\begin{coq_example} +ML Import Constant string == string : Set. +ML Import Inductive bool [ true false ] == + Inductive BOOL : Set := TRUE : BOOL + | FALSE : BOOL. +ML Import Constant lt_string == lt_string : string->string->BOOL. +Link a := string. +Link inf_dec := + [x,y:string]Cases (lt_string x y) of + TRUE => left + | FALSE => right + end. +Write CamlLight File "avl" [rot_d rot_g rot_gd insert]. +\end{coq_example} + +Notice that we do not want the constants {\tt rot\_d}, {\tt rot\_g} +and {\tt rot\_gd} to be expanded in the function {\tt insert}, and +that is why we added them in the list of required functions. It makes +the resulting program clearer, even if it becomes less efficient. + +Let us insert random words in an initially empty tree to check that it +remains balanced: +\begin{verbatim} +% camllight +# include "avl";; +# let add a t = match insert a t with + h_eq x -> x + | h_plus x -> x ;; +# let rdmw () = let s = create_string 5 in + for i = 0 to 4 do + set_nth_char s i (char_of_int (97+random__int 26)) + done ; s ;; +# let rec built = function 0 -> nil + | n -> add (rdmw()) (built (pred n));; +# built 10;; +- : abe = node ("ogccy", node ("gmygy", node ("cwqug", node ("cjyrc", nil, ... + + +# let rec size = function + nil -> 0 + | node(_,t1,t2,_) -> 1+(max (size t1) (size t2)) ;; +# let rec check = function + nil -> true + | node(_,a1,a2,_) -> + let t1 = size a1 and t2 =size a2 in + if abs(t1-t2)>1 then false else (check a1) & (check a2) ;; + +# check (built 100);; +- : bool = true +\end{verbatim} + + +\section{Bugs} + +Surely there are still bugs in the {\tt Extraction} module. +You can send your bug reports directly to the author +(at \textsf{Jean-Christophe.Filliatre$@$lri.fr}) or to the \Coq\ +mailing list (at \textsf{coq$@$pauillac.inria.fr}). + +% $Id$ |