Initial revision

git-svn-id: svn+ssh://scm.gforge.inria.fr/svn/coq/trunk@8143 85f007b7-540e-0410-9357-904b9bb8a0f7
author: filliatr <filliatr@85f007b7-540e-0410-9357-904b9bb8a0f7> 2000-12-12 22:36:15 +0000
committer: filliatr <filliatr@85f007b7-540e-0410-9357-904b9bb8a0f7> 2000-12-12 22:36:15 +0000
commit: b6018c78b25da14d4f44cf10de692f968cba1e98 (patch)
tree: c152b9761b70cbc554efdc2f05f3a995444ed259 /doc/Extraction.tex
parent: 88e2679b89a32189673b808acfe3d6181a38c000 (diff)
1 files changed, 557 insertions, 0 deletions
diff --git a/doc/Extraction.tex b/doc/Extraction.tex
new file mode 100755
index 000000000..cb9949d97
--- /dev/null
+++ b/doc/Extraction.tex
@@ -0,0 +1,557 @@
+\achapter{Execution of extracted programs in Caml and Haskell}
+\label{CamlHaskellExtraction}
+\aauthor{Benjamin Werner and Jean-Christophe Filliātre}
+\index{Extraction}
+
+It is possible to use \Coq\ to build certified and relatively
+efficient programs, extracting them from the proofs of their
+specifications. The extracted objects are terms of \FW, and can be 
+obtained at the \Coq\ toplevel with the command {\tt Extraction}
+(see \ref{Extraction}).
+
+We present here a \Coq\ module, {\tt Extraction}, which translates the
+extracted terms to ML dialects, namely Caml Light, Objective Caml and
+Haskell. In the following, we will not refer to a particular dialect
+when possible and ``ML'' will be used to refer to any of the target
+dialects.
+
+One builds effective programs in an \FW\ toplevel (actually the \Coq\
+toplevel) which contains the extracted objects and in which one can
+import ML objects. Indeed, in order to instantiate and realize \Coq\
+type and term variables, it is possible to import ML
+objects in the \FW\ toplevel, as inductive types or axioms.
+
+\Rem
+The current mechanism of extraction of effective programs
+from \Coq\ proofs slightly differs from the one in the versions of
+\Coq\ anterior to the version V5.8. In these versions, there were an
+explicit toplevel for the language {\sf Fml}. Moreover, it was not
+possible to import ML objects in this {\sf Fml} toplevel.
+
+
+%\section{Extraction facilities}
+%
+%(* TO DO *)
+%
+
+\medskip
+In the first part of this document we describe the commands of the
+{\tt Extraction} module, and we give some examples in the second part.
+
+
+\asection{The {\tt Extraction} module}
+\label{Extraction}
+
+This section explains how to import ML objects, to realize axioms and
+finally to generate ML code from the extracted programs of \FW.
+
+These features do not belong to the core system, and appear as an
+independent module called {\tt Extraction.v} (which is compiled during the
+installation of the system). So the first thing to do is to load this
+module:
+
+\begin{coq_example*}
+Require Extraction.
+\end{coq_example*}
+
+\asubsection{Generating executable ML code}
+\comindex{Write Caml File}
+\comindex{Write CamlLight File}
+\comindex{Write Haskell File}
+The \Coq\ commands to generate ML code are:
+\begin{center}\begin{tabular}{ll}
+  {\tt Write Caml File "\str" [ \ident$_1$ \dots\ \ident$_n$ ] 
+       {\it options}.}
+    & ({\em for Objective Caml\/}) \\
+  {\tt Write CamlLight File "\str" [ \ident$_1$ \dots\ \ident$_n$ ]
+       {\it options}.} & \\
+  {\tt Write Haskell File "\str" [ \ident$_1$ \dots\ \ident$_n$ ]
+       {\it options}.} & \\
+\end{tabular}\end{center}
+where \str\ is the name given to the file to be produced (the suffix
+{\tt .ml} is added if necessary), and \ident$_1$ \dots\ \ident$_n$ the
+names of the constants to be extracted. This list does not need to be
+exhaustive: it is automatically completed into a complete and minimal
+environment.  Remaining axioms are translated into exceptions, and a
+warning is printed in that case. In particular, this will be the case
+for {\tt False\_rec}. (We will see below how to realize axioms).
+
+\paragraph{Optimizations.} 
+Since Caml Light and Objective Caml are strict languages, the extracted
+code has to be optimized in order to be efficient (for instance, when
+using induction principles we do not want to compute all the recursive
+calls but only the needed ones). So an optimization routine will be
+called each time the user want to generate Caml programs. Essentially,
+it performs constants expansions and reductions.  Therefore some
+constants will not appear in the resulting Caml program (A warning is
+printed for each such constant). To avoid this, just put the constant
+name in the previous list \ident$_1$ \dots\ \ident$_n$ and it will not
+be expanded. Moreover, three options allow the user to control the
+expansion strategy :
+\begin{description}
+  \item[\texttt{noopt}] : specifies not to do any optimization.
+  \item[\texttt{exact}] : specifies to extract exactly the given
+    objects (no recursivity).
+  \item[\texttt{expand [ \ident$_1$ \dots\ \ident$_n$ ]}] : 
+    forces the expansion of the constants \ident$_1$ \dots\ \ident$_n$
+    (when it is possible).
+\end{description}
+
+
+\asubsection{Realizing axioms}
+\comindex{Link}
+
+It is possible to assume some axioms while developing a proof. Since
+these axioms can be any kind of proposition or object type, they may
+perfectly well have some computational content. But a program must be
+a closed term, and of course the system cannot guess the program which
+realizes an axiom.  Therefore, it is possible to tell the system
+what program (an \FW\ term actually) corresponds to a given \Coq\
+axiom. The command is {\tt Link} and the syntax:
+$$\mbox{\tt Link \ident\ := Fwterm.}$$
+where \ident\ is the name of the axiom to realize and Fwterm\ the
+term which realizes it. The system checks that this term has the same
+type as the axiom \ident, and returns an error if not.  This command
+attaches a body to an axiom, and can be seen as a transformation of an
+axiom into a constant.
+
+These semantical attachments have to be done {\em before} generating
+the ML code. All type variables must be realized, and term variables
+which are not realized will be translated into exceptions. 
+
+\Example Let us illustrate this feature on a small
+example. Assume that you have a type variable {\tt A} of type \Set:
+
+\begin{coq_example*}
+Parameter A : Set.
+\end{coq_example*}
+
+and that your specification proof assumes that there is an order
+relation {\em inf} over that type (which has no computational
+content), and that this relation is total and decidable:
+
+\begin{coq_example*}
+Parameter inf : A -> A -> Prop.
+Axiom inf_total : (x,y:A) {(inf x y)}+{(inf y x)}.
+\end{coq_example*}
+
+Now suppose that we want to use this specification proof on natural
+numbers; this means {\tt A} has to be instantiated by {\tt nat}
+and the axiom {\tt inf\_total} will be realized, for instance, using the
+order relation {\tt le} on that type and the decidability lemma 
+{\tt le\_lt\_dec}. Here is how to proceed:
+
+\begin{coq_example*}
+Require Compare_dec.
+\end{coq_example*}
+\begin{coq_example}
+Link A := nat.
+Link inf_total := le_lt_dec.
+\end{coq_example}
+
+\Warning There is no rollback on the command {\tt Link}, that
+is the semantical attachments are not forgotten when doing a {\tt Reset},
+or a {\tt Restore State} command. This will be corrected in a later
+version.
+
+\asubsection{Importing ML objects}
+In order to realize axioms and to instantiate programs on real data
+types, like {\tt int}, {\tt string}, \dots\ or more complicated data
+structures, one want to import existing ML objects in the \FW\
+environment.  The system provides such features, through the commands
+{\tt ML Import Constant} and {\tt ML Import Inductive}.
+The first one imports an ML
+object as a new axiom and the second one adds a new inductive
+definition corresponding to an ML inductive type.  
+
+\paragraph{Warning.}
+In the case of Caml dialects, the system would be able to check the
+correctness of the imported objects by looking into the interfaces
+files of Caml modules ({\tt .mli} files), but this feature is not yet
+implemented. So one must be careful when declaring the types of the
+imported objects.
+
+\paragraph{Caml names.}
+When referencing a Caml object, you can use strings instead of
+identifiers. Therefore you can use the double
+underscore notation \verb!module__name! (Caml Light objects)
+or the dot notation \verb!module.name! (Objective Caml objects)
+to precise the module in which lies the object.
+
+
+\asubsection{Importing inductive types}
+\comindex{ML Import Inductive}
+
+The \Coq\ command to import an ML inductive type is:
+$$\mbox{\tt ML Import Inductive \ident\ [\ident$_1$ \dots\ \ident$_n$] == %
+{\em <Inductive Definition>}.}$$
+where \ident\ is the name of the ML type, \ident$_1$ \dots\
+\ident$_n$ the name of its constructors, and {\tt\em<Inductive
+  Definition>} the corresponding \Coq\ inductive definition
+(see \ref{Inductive} in the Reference Manual for the syntax of
+inductive definitions). 
+
+This command inserts the {\tt\em<Inductive Definition>} in the \FW\
+environment, without elimination principles. From that moment, it is
+possible to use that type like any other \FW\ object, and in
+particular to use it to realize axioms. The names \ident\ \ident$_1$
+\dots\ \ident$_n$ may be different from the names given in the
+inductive definition, in order to avoid clash with previous
+constants, and are restored when generating the ML code. 
+
+\noindent One can also import mutual inductive types with the command:
+$$\begin{array}{rl}
+  \mbox{\tt ML Import Inductive} &
+              \mbox{\tt\ident$_1$ [\ident$^1_1$ \dots\ \ident$^1_{n_1}$]} \\
+    & \dots \\
+    & \mbox{\tt\ident$_k$ [\ident$^k_1$ \dots\ \ident$^k_{n_k}$]} \\
+    & \qquad \mbox{\tt== {\em<Mutual Inductive Definition>}.}
+  \end{array}$$ %$$
+
+\begin{Examples}
+\item Let us show for instance how to import the
+  type {\tt bool} of Caml Light booleans:
+
+\begin{coq_example}
+ML Import Inductive bool [ true false ] ==
+         Inductive BOOL : Set := TRUE  : BOOL
+                               | FALSE : BOOL.
+\end{coq_example}
+
+Here we changed the names because the type {\tt bool} is already
+defined in the initial state of \Coq.
+
+  \item Assuming that one defined the mutual inductive types {\tt
+tree} and {\tt forest} in a Caml Light module, one can import them
+with the command:
+
+\begin{coq_example}
+ML Import Inductive tree [node] forest [empty cons] ==
+    Mutual [A:Set] Inductive 
+       tree : Set := node : A -> (forest A) -> (tree A)
+    with
+       forest : Set := empty : (forest A) 
+                    | cons : (tree A) -> (forest A) -> (forest A).
+\end{coq_example}
+
+  \item One can import the polymorphic type of Caml Light lists with
+the command:
+\begin{coq_example}
+ML Import Inductive list [nil cons] == 
+    Inductive list [A:Set] : Set := nil : (list A)
+                                 | cons : A->(list A)->(list A).
+\end{coq_example}
+
+\Rem One would have to re-define {\tt nil} and {\tt cons} at
+the top of its program because these constructors have no name in Caml Light.
+\end{Examples}
+
+\asubsection{Importing terms and abstract types}
+\comindex{ML Import Constant}
+
+The other command to import an ML object is:
+$$\mbox{\tt ML Import Constant \ident$_{ML}$\ == \ident\ : Fwterm.}$$
+where \ident$_{ML}$\ is the name of the ML object and Fwterm\ its type in
+\FW. This command defines an axiom in \FW\ of name \ident\ and type
+Fwterm.
+
+\Example To import the type {\tt int} of Caml Light
+integers, and the $<$ binary relation on this type, just do
+\begin{coq_example}
+ML Import Constant int == int : Set.
+ML Import Constant lt_int == lt_int : int -> int -> BOOL.
+\end{coq_example}
+assuming that the Caml Light type {\tt bool} is already imported (with the
+name {\tt BOOL}, as above).
+
+
+\asubsection{Direct use of ML\ objects}
+\comindex{Extract Constant}
+\comindex{Extract Inductive}
+
+Sometimes the user do not want to extract \Coq\ objects to new ML code
+but wants to use already existing ML objects.  For instance, it is the
+case for the booleans, which already exist in ML: the user do not want
+to extract the \Coq\ inductive type \texttt{bool} to a new type for
+booleans, but wants to use the primitive boolean of ML.
+
+The command \texttt{Extract} fulfills this requirement.
+It allows the user to declare constant and inductive types which will not be
+extracted but replaced by ML objects. The syntax is the following
+$$
+\begin{tabular}{l}
+  \mbox{\tt Extract Constant \ident\ => \ident'.} \\
+  \mbox{\tt Extract Inductive \ident\ 
+            => \ident' [ \ident'$_1$ \dots \ident'$_n$ ].} 
+\end{tabular}
+$$ %$$
+where \ident\/ is the name of the \Coq\ object and the prime identifiers 
+the name of the corresponding ML objects (the names between brackets
+are the names of the constructors).
+Mutually recursive types are declared one by one, in any order.
+
+\Example
+Typical examples are the following:
+\begin{coq_example}
+Extract Inductive unit => unit [ "()" ].
+Extract Inductive bool => bool [ true false ].
+Extract Inductive sumbool => bool [ true false ].
+\end{coq_example}
+
+
+\asubsection{Differences between \Coq\ and ML type systems}
+
+\subsubsection{ML types that are not \FW\ types}
+
+Some ML recursive types have no counterpart in the type system of
+\Coq, like types using the record construction, or non positive types
+like
+\begin{verbatim}
+# type T = C of T->T;;
+\end{verbatim}
+In that case, you cannot import those types as inductive types, and
+the only way to do is to import them as abstract types (with {\tt ML
+Import}) together with the corresponding building and de-structuring
+functions (still with {\tt ML Import Constant}).
+
+
+\subsubsection{Programs that are not ML-typable}
+
+On the contrary, some extracted programs in \FW\ are not typable in
+ML. There are in fact two cases which can be problematic:
+\begin{itemize}
+  \item If some part of the program is {\em very} polymorphic, there
+    may be no ML type for it. In that case the extraction to ML works
+    all right but the generated code may be refused by the ML
+    type-checker. A very well known example is the {\em distr-pair}
+    function:
+$$\mbox{\tt
+Definition dp := [A,B:Set][x:A][y:B][f:(C:Set)C->C](f A x,f B y).
+}$$
+In Caml Light, for instance, the extracted term is 
+\verb!let dp x y f = pair((f x),(f y))!  and has type
+$$\mbox{\tt
+dp : 'a -> 'a -> ('a -> 'b) -> ('b,'b) prod
+}$$
+which is not its original type, but a restriction.
+
+  \item Some definitions of \FW\ may have no counterpart in ML. This
+    happens when there is a quantification over types inside the type
+    of a constructor; for example:
+$$\mbox{\tt
+Inductive anything : Set := dummy : (A:Set)A->anything.
+}$$
+which corresponds to the definition of ML dynamics.
+\end{itemize}
+
+The first case is not too problematic: it is still possible to run the
+programs by switching off the type-checker during compilation. Unless
+you misused the semantical attachment facilities you should never get
+any message like ``segmentation fault'' for which the extracted code
+would be to blame. To switch off the Caml type-checker, use the
+function {\tt obj\_\_magic} which gives the type {\tt 'a} to any
+object; but this implies changing a little the extracted code by hand.
+
+The second case is fatal. If some inductive type cannot be translated
+to ML, one has to change the proof (or possibly to ``cheat'' by
+some low-level manipulations we would not describe here). 
+
+We have to say, though, that in most ``realistic'' programs, these
+problems do not occur. For example all the programs of the library are
+accepted by Caml type-checker except {\tt Higman.v}\footnote{Should
+  you obtain a not ML-typable program out of a self developed example,
+  we would be interested in seeing it; so please mail us the example at
+  {\em coq@pauillac.inria.fr}}.
+
+
+\asection{Some examples}
+
+We present here few examples of extractions, taken from the {\tt
+theories} library of \Coq\ (into the {\tt PROGRAMS} directory). We
+choose Caml Light as target language, but all can be done in the other
+dialects with slight modifications.
+
+
+\asubsection{Euclidean division}
+
+The file {\tt Euclid\_prog} contains the proof of Euclidean division
+(theorem {\tt eucl\_dev}). The natural numbers defined in the example
+files are unary integers defined by two constructors $O$ and $S$:
+\begin{coq_example*}
+Inductive nat : Set := O : nat | S : nat -> nat.
+\end{coq_example*}
+
+To use the proof, we begin by loading the module {\tt Extraction} and the
+file into the \Coq\ environment:
+
+\begin{coq_eval}
+Reset Initial.
+AddPath "../theories/DEMOS/PROGRAMS".
+\end{coq_eval}
+\begin{coq_example*}
+Require Extraction.
+Require Euclid_prog.
+\end{coq_example*}
+
+This module contains a theorem {\tt eucl\_dev}, and its extracted term
+is of type {\tt (b:nat)(a:nat) (di\-veucl a b)}, where {\tt diveucl} is a
+type for the pair of the quotient and the modulo.
+We can now extract this program to Caml Light:
+
+\begin{coq_example}
+Write CamlLight File "euclid" [ eucl_dev ].
+\end{coq_example}
+
+This produces a file {\tt euclid.ml} containing all the necessary
+definitions until {\tt let eucl\_dev = ..}. Let us play the resulting program:
+
+\begin{verbatim}
+# include "euclid";;
+# eucl_dev (S (S O)) (S (S (S (S (S O)))));;
+- : diveucl = divex (S (S O), S O)
+\end{verbatim}
+It is easier to test on Caml Light integers:
+\begin{verbatim}
+# let rec nat_of = function 0 -> O
+                          | n -> S (nat_of (pred n));;
+# let rec int_of = function O   -> 0
+                          | S p -> succ (int_of p);;
+# let div a b = match eucl_dev (nat_of b) (nat_of a) with
+                  divex(q,r) -> (int_of q, int_of r);;
+div : int -> int -> int * int = <fun>
+# div 173 15;;
+- : int * int = 11, 8
+\end{verbatim}
+
+\asubsection{Heapsort}
+
+Let us see a more complicated example. The file {\tt Heap\_prog.v}
+contains the proof of an efficient list sorting algorithm described by
+Bjerner. Is is an adaptation of the well-known {\em heapsort}
+algorithm to functional languages. We first load the files:
+
+\begin{coq_eval}
+Reset Initial.
+\end{coq_eval}
+\begin{coq_example*}
+Require Extraction.
+Require Heap_prog.
+\end{coq_example*}
+
+As we saw it above we have to instantiate or realize by hand some of
+the \Coq\ variables, which are in this case the type of the elements
+to sort ({\tt List\_Dom}, defined in {\tt List.v}) and the
+decidability of the order relation ({\tt inf\_total}). We proceed as
+in section \ref{Extraction}:
+
+\begin{coq_example}
+ML Import Constant int == int : Set.
+Link List_Dom := int.
+ML Import Inductive bool [ true false ] ==
+         Inductive BOOL : Set := TRUE  : BOOL
+                               | FALSE : BOOL.
+ML Import Constant lt_int == lt_int : int->int->BOOL.
+Link inf_total :=
+         [x,y:int]Cases (lt_int x y) of
+                            TRUE => left
+                          | FALSE => right
+                          end.
+\end{coq_example}
+
+Then we extract the Caml Light program
+
+\begin{coq_example}
+Write CamlLight File "heapsort" [ heapsort ].
+\end{coq_example}
+and test it 
+\begin{verbatim}
+
+# include "heapsort";;
+# let rec listn = function 0 -> nil
+                         | n -> cons(random__int 10000,listn (pred n));;
+# heapsort (listn 10);;
+- : list = cons (136, cons (760, cons (1512, cons (2776, cons (3064, 
+cons (4536, cons (5768, cons (7560, cons (8856, cons (8952, nil))))))))))
+\end{verbatim}
+
+Some tests on longer lists (100000 elements) show that the program is
+quite efficient for Caml code.
+
+\asubsection{Balanced trees}
+
+The file {\tt Avl\_prog.v} contains the proof of insertion in binary
+balanced trees (AVL). Here we choose to instantiate such trees on the
+type {\tt string} of Caml Light (for instance to get efficient
+dictionary); as above we must realize the decidability of the order
+relation. It gives the following commands:
+
+\begin{coq_eval}
+Reset Initial.
+\end{coq_eval}
+\begin{coq_example*}
+Require Extraction.
+Require Avl_prog.
+\end{coq_example*}
+\begin{coq_eval}
+Pwd.
+\end{coq_eval}
+\begin{coq_example}
+ML Import Constant string == string : Set.
+ML Import Inductive bool [ true false ] ==
+  Inductive BOOL : Set := TRUE  : BOOL
+                        | FALSE : BOOL.
+ML Import Constant lt_string == lt_string : string->string->BOOL.
+Link a := string.
+Link inf_dec :=
+  [x,y:string]Cases (lt_string x y) of
+                TRUE => left
+              | FALSE => right
+              end.
+Write CamlLight File "avl" [rot_d rot_g rot_gd insert].
+\end{coq_example}
+
+Notice that we do not want the constants {\tt rot\_d}, {\tt rot\_g}
+and {\tt rot\_gd} to be expanded in the function {\tt insert}, and
+that is why we added them in the list of required functions. It makes
+the resulting program clearer, even if it becomes less efficient.
+
+Let us insert random words in an initially empty tree to check that it
+remains balanced:
+\begin{verbatim}
+% camllight
+# include "avl";;
+# let add a t = match insert a t with
+                 h_eq x -> x
+               | h_plus x -> x ;;
+# let rdmw () = let s = create_string 5 in
+                for i = 0 to 4 do
+                  set_nth_char s i (char_of_int (97+random__int 26))
+                done ; s ;;
+# let rec built = function 0 -> nil
+                         | n -> add (rdmw()) (built (pred n));;
+# built 10;;
+- : abe = node ("ogccy", node ("gmygy", node ("cwqug", node ("cjyrc", nil, ...
+
+
+# let rec size = function
+    nil -> 0
+  | node(_,t1,t2,_) -> 1+(max (size t1) (size t2)) ;;
+# let rec check = function
+    nil -> true
+  | node(_,a1,a2,_) -> 
+         let t1 = size a1 and t2 =size a2 in
+         if abs(t1-t2)>1 then false else (check a1) & (check a2) ;;
+
+# check (built 100);;
+- : bool = true
+\end{verbatim}
+
+
+\section{Bugs}
+
+Surely there are still bugs in the {\tt Extraction} module.
+You can send your bug reports directly to the author
+(at \textsf{Jean-Christophe.Filliatre$@$lri.fr}) or to the \Coq\
+mailing list (at \textsf{coq$@$pauillac.inria.fr}).
+
+% $Id$
author	filliatr <filliatr@85f007b7-540e-0410-9357-904b9bb8a0f7>	2000-12-12 22:36:15 +0000
committer	filliatr <filliatr@85f007b7-540e-0410-9357-904b9bb8a0f7>	2000-12-12 22:36:15 +0000
commit	b6018c78b25da14d4f44cf10de692f968cba1e98 (patch)
tree	c152b9761b70cbc554efdc2f05f3a995444ed259 /doc/Extraction.tex
parent	88e2679b89a32189673b808acfe3d6181a38c000 (diff)