diff options
Diffstat (limited to 'doc/refman/Extraction.tex')
-rw-r--r-- | doc/refman/Extraction.tex | 551 |
1 files changed, 551 insertions, 0 deletions
diff --git a/doc/refman/Extraction.tex b/doc/refman/Extraction.tex new file mode 100644 index 00000000..af5d4049 --- /dev/null +++ b/doc/refman/Extraction.tex @@ -0,0 +1,551 @@ +\achapter{Extraction of programs in Objective Caml and Haskell} +\label{Extraction} +\aauthor{Jean-Christophe Filliātre and Pierre Letouzey} +\index{Extraction} + +We present here the \Coq\ extraction commands, used to build certified +and relatively efficient functional programs, extracting them from +either \Coq\ functions or \Coq\ proofs of specifications. The +functional languages available as output are currently \ocaml{}, +\textsc{Haskell} and \textsc{Scheme}. In the following, ``ML'' will +be used (abusively) to refer to any of the three. + +\paragraph{Differences with old versions.} +The current extraction mechanism is new for version 7.0 of {\Coq}. +In particular, the \FW\ toplevel used as an intermediate step between +\Coq\ and ML has been withdrawn. It is also not possible +any more to import ML objects in this \FW\ toplevel. +The current mechanism also differs from +the one in previous versions of \Coq: there is no more +an explicit toplevel for the language (formerly called \textsc{Fml}). + +\asection{Generating ML code} +\comindex{Extraction} +\comindex{Recursive Extraction} +\comindex{Extraction Module} +\comindex{Recursive Extraction Module} + +The next two commands are meant to be used for rapid preview of +extraction. They both display extracted term(s) inside \Coq. + +\begin{description} +\item {\tt Extraction \qualid.} ~\par + Extracts one constant or module in the \Coq\ toplevel. + +\item {\tt Recursive Extraction \qualid$_1$ \dots\ \qualid$_n$.} ~\par + Recursive extraction of all the globals (or modules) \qualid$_1$ \dots\ + \qualid$_n$ and all their dependencies in the \Coq\ toplevel. +\end{description} + +%% TODO error messages + +All the following commands produce real ML files. User can choose to produce +one monolithic file or one file per \Coq\ library. + +\begin{description} +\item {\tt Extraction "{\em file}"} + \qualid$_1$ \dots\ \qualid$_n$. ~\par + Recursive extraction of all the globals (or modules) \qualid$_1$ \dots\ + \qualid$_n$ and all their dependencies in one monolithic file {\em file}. + Global and local identifiers are renamed according to the chosen ML + language to fulfill its syntactic conventions, keeping original + names as much as possible. + +\item {\tt Extraction Library} \ident. ~\par + Extraction of the whole \Coq\ library {\tt\ident.v} to an ML module + {\tt\ident.ml}. In case of name clash, identifiers are here renamed + using prefixes \verb!coq_! or \verb!Coq_! to ensure a + session-independent renaming. + +\item {\tt Recursive Extraction Library} \ident. ~\par + Extraction of the \Coq\ library {\tt\ident.v} and all other modules + {\tt\ident.v} depends on. +\end{description} + +The list of globals \qualid$_i$ does not need to be +exhaustive: it is automatically completed into a complete and minimal +environment. + +\asection{Extraction options} + +\asubsection{Setting the target language} +\comindex{Extraction Language} + +The ability to fix target language is the first and more important +of the extraction options. Default is Ocaml. +\begin{description} +\item {\tt Extraction Language Ocaml}. +\item {\tt Extraction Language Haskell}. +\item {\tt Extraction Language Scheme}. +\end{description} + +\asubsection{Inlining and optimizations} + +Since Objective Caml is a strict language, the extracted +code has to be optimized in order to be efficient (for instance, when +using induction principles we do not want to compute all the recursive +calls but only the needed ones). So the extraction mechanism provides +an automatic optimization routine that will be +called each time the user want to generate Ocaml programs. Essentially, +it performs constants inlining and reductions. Therefore some +constants may not appear in resulting monolithic Ocaml program. +In the case of modular extraction, even if some inlining is done, the +inlined constant are nevertheless printed, to ensure +session-independent programs. + +Concerning Haskell, such optimizations are less useful because of +lazyness. We still make some optimizations, for example in order to +produce more readable code. + +All these optimizations are controled by the following \Coq\ options: + +\begin{description} + +\item \comindex{Set Extraction Optimize} +{\tt Set Extraction Optimize.} + +\item \comindex{Unset Extraction Optimize} +{\tt Unset Extraction Optimize.} + +Default is Set. This control all optimizations made on the ML terms +(mostly reduction of dummy beta/iota redexes, but also simplifications on +Cases, etc). Put this option to Unset if you want a ML term as close as +possible to the Coq term. + +\item \comindex{Set Extraction AutoInline} +{\tt Set Extraction AutoInline.} + +\item \comindex{Unset Extraction AutoInline} +{\tt Unset Extraction AutoInline.} + +Default is Set, so by default, the extraction mechanism feels free to +inline the bodies of some defined constants, according to some heuristics +like size of bodies, useness of some arguments, etc. Those heuristics are +not always perfect, you may want to disable this feature, do it by Unset. + +\item \comindex{Extraction Inline} +{\tt Extraction Inline} \qualid$_1$ \dots\ \qualid$_n$. + +\item \comindex{Extraction NoInline} +{\tt Extraction NoInline} \qualid$_1$ \dots\ \qualid$_n$. + +In addition to the automatic inline feature, you can now tell precisely to +inline some more constants by the {\tt Extraction Inline} command. Conversely, +you can forbid the automatic inlining of some specific constants by +the {\tt Extraction NoInline} command. +Those two commands enable a precise control of what is inlined and what is not. + +\item \comindex{Print Extraction Inline} +{\tt Print Extraction Inline}. + +Prints the current state of the table recording the custom inlinings +declared by the two previous commands. + +\item \comindex{Reset Extraction Inline} +{\tt Reset Extraction Inline}. + +Puts the table recording the custom inlinings back to empty. + +\end{description} + + +\paragraph{Inlining and printing of a constant declaration.} + +A user can explicitly ask for a constant to be extracted by two means: +\begin{itemize} +\item by mentioning it on the extraction command line +\item by extracting the whole \Coq\ module of this constant. +\end{itemize} +In both cases, the declaration of this constant will be present in the +produced file. +But this same constant may or may not be inlined in the following +terms, depending on the automatic/custom inlining mechanism. + + +For the constants non-explicitly required but needed for dependency +reasons, there are two cases: +\begin{itemize} +\item If an inlining decision is taken, whether automatically or not, +all occurrences of this constant are replaced by its extracted body, and +this constant is not declared in the generated file. +\item If no inlining decision is taken, the constant is normally + declared in the produced file. +\end{itemize} + +\asubsection{Extra elimination of useless arguments} + +\begin{description} +\item \comindex{Extraction Implicit} + {\tt Extraction Implicit} \qualid\ [ \ident$_1$ \dots\ \ident$_n$ ]. + +This experimental command allows to declare some arguments of +\qualid\ as implicit, i.e. useless in extracted code and hence to +be removed by extraction. Here \qualid\ can be any function or +inductive constructor, and \ident$_i$ are the names of the concerned +arguments. In fact, an argument can also be referred by a number +indicating its position, starting from 1. When an actual extraction +takes place, an error is raised if the {\tt Extraction Implicit} +declarations cannot be honored, that is if any of the implicited +variables still occurs in the final code. This declaration of useless +arguments is independent but complementary to the main elimination +principles of extraction (logical parts and types). +\end{description} + +\asubsection{Realizing axioms}\label{extraction:axioms} + +Extraction will fail if it encounters an informative +axiom not realized (see Section~\ref{extraction:axioms}). +A warning will be issued if it encounters an logical axiom, to remind +user that inconsistent logical axioms may lead to incorrect or +non-terminating extracted terms. + +It is possible to assume some axioms while developing a proof. Since +these axioms can be any kind of proposition or object or type, they may +perfectly well have some computational content. But a program must be +a closed term, and of course the system cannot guess the program which +realizes an axiom. Therefore, it is possible to tell the system +what ML term corresponds to a given axiom. + +\comindex{Extract Constant} +\begin{description} +\item{\tt Extract Constant \qualid\ => \str.} ~\par + Give an ML extraction for the given constant. + The \str\ may be an identifier or a quoted string. +\item{\tt Extract Inlined Constant \qualid\ => \str.} ~\par + Same as the previous one, except that the given ML terms will + be inlined everywhere instead of being declared via a let. +\end{description} + +Note that the {\tt Extract Inlined Constant} command is sugar +for an {\tt Extract Constant} followed by a {\tt Extraction Inline}. +Hence a {\tt Reset Extraction Inline} will have an effect on the +realized and inlined axiom. + +Of course, it is the responsibility of the user to ensure that the ML +terms given to realize the axioms do have the expected types. In +fact, the strings containing realizing code are just copied in the +extracted files. The extraction recognizes whether the realized axiom +should become a ML type constant or a ML object declaration. + +\Example +\begin{coq_example} +Axiom X:Set. +Axiom x:X. +Extract Constant X => "int". +Extract Constant x => "0". +\end{coq_example} + +Notice that in the case of type scheme axiom (i.e. whose type is an +arity, that is a sequence of product finished by a sort), then some type +variables has to be given. The syntax is then: + +\begin{description} +\item{\tt Extract Constant \qualid\ \str$_1$ \ldots \str$_n$ => \str.} ~\par +\end{description} + +The number of type variables is checked by the system. + +\Example +\begin{coq_example} +Axiom Y : Set -> Set -> Set. +Extract Constant Y "'a" "'b" => " 'a*'b ". +\end{coq_example} + +Realizing an axiom via {\tt Extract Constant} is only useful in the +case of an informative axiom (of sort Type or Set). A logical axiom +have no computational content and hence will not appears in extracted +terms. But a warning is nonetheless issued if extraction encounters a +logical axiom. This warning reminds user that inconsistent logical +axioms may lead to incorrect or non-terminating extracted terms. + +If an informative axiom has not been realized before an extraction, a +warning is also issued and the definition of the axiom is filled with +an exception labeled {\tt AXIOM TO BE REALIZED}. The user must then +search these exceptions inside the extracted file and replace them by +real code. + +\comindex{Extract Inductive} + +The system also provides a mechanism to specify ML terms for inductive +types and constructors. For instance, the user may want to use the ML +native boolean type instead of \Coq\ one. The syntax is the following: + +\begin{description} +\item{\tt Extract Inductive \qualid\ => \str\ [ \str\ \dots \str\ ]\ +{\it optstring}.} ~\par + Give an ML extraction for the given inductive type. You must specify + extractions for the type itself (first \str) and all its + constructors (between square brackets). If given, the final optional + string should contain a function emulating pattern-matching over this + inductive type. If this optional string is not given, the ML + extraction must be an ML inductive datatype, and the native + pattern-matching of the language will be used. +\end{description} + +For an inductive type with $k$ constructor, the function used to +emulate the match should expect $(k+1)$ arguments, first the $k$ +branches in functional form, and then the inductive element to +destruct. For instance, the match branch \verb$| S n => foo$ gives the +functional form \verb$(fun n -> foo)$. Note that a constructor with no +argument is considered to have one unit argument, in order to block +early evaluation of the branch: \verb$| O => bar$ leads to the functional +form \verb$(fun () -> bar)$. For instance, when extracting {\tt nat} +into {\tt int}, the code to provide has type: +{\tt (unit->'a)->(int->'a)->int->'a}. + +As for {\tt Extract Inductive}, this command should be used with care: +\begin{itemize} +\item The ML code provided by the user is currently \emph{not} checked at all by + extraction, even for syntax errors. + +\item Extracting an inductive type to a pre-existing ML inductive type +is quite sound. But extracting to a general type (by providing an +ad-hoc pattern-matching) will often \emph{not} be fully rigorously +correct. For instance, when extracting {\tt nat} to Ocaml's {\tt +int}, it is theoretically possible to build {\tt nat} values that are +larger than Ocaml's {\tt max\_int}. It is the user's responsability to +be sure that no overflow or other bad events occur in practice. + +\item Translating an inductive type to an ML type does \emph{not} +magically improve the asymptotic complexity of functions, even if the +ML type is an efficient representation. For instance, when extracting +{\tt nat} to Ocaml's {\tt int}, the function {\tt mult} stays +quadratic. It might be interesting to associate this translation with +some specific {\tt Extract Constant} when primitive counterparts exist. +\end{itemize} + +\Example +Typical examples are the following: +\begin{coq_example} +Extract Inductive unit => "unit" [ "()" ]. +Extract Inductive bool => "bool" [ "true" "false" ]. +Extract Inductive sumbool => "bool" [ "true" "false" ]. +\end{coq_example} + +If an inductive constructor or type has arity 2 and the corresponding +string is enclosed by parenthesis, then the rest of the string is used +as infix constructor or type. +\begin{coq_example} +Extract Inductive list => "list" [ "[]" "(::)" ]. +Extract Inductive prod => "(*)" [ "(,)" ]. +\end{coq_example} + +As an example of translation to a non-inductive datatype, let's turn +{\tt nat} into Ocaml's {\tt int} (see caveat above): +\begin{coq_example} +Extract Inductive nat => int [ "0" "succ" ] + "(fun fO fS n => if n=0 then fO () else fS (n-1))". +\end{coq_example} + +\asubsection{Avoiding conflicts with existing filenames} + +\comindex{Extraction Blacklist} + +When using {\tt Extraction Library}, the names of the extracted files +directly depends from the names of the \Coq\ files. It may happen that +these filenames are in conflict with already existing files, +either in the standard library of the target language or in other +code that is meant to be linked with the extracted code. +For instance the module {\tt List} exists both in \Coq\ and in Ocaml. +It is possible to instruct the extraction not to use particular filenames. + +\begin{description} +\item{\tt Extraction Blacklist \ident \ldots \ident.} ~\par + Instruct the extraction to avoid using these names as filenames + for extracted code. +\item{\tt Print Extraction Blacklist.} ~\par + Show the current list of filenames the extraction should avoid. +\item{\tt Reset Extraction Blacklist.} ~\par + Allow the extraction to use any filename. +\end{description} + +For Ocaml, a typical use of these commands is +{\tt Extraction Blacklist String List}. + +\asection{Differences between \Coq\ and ML type systems} + + +Due to differences between \Coq\ and ML type systems, +some extracted programs are not directly typable in ML. +We now solve this problem (at least in Ocaml) by adding +when needed some unsafe casting {\tt Obj.magic}, which give +a generic type {\tt 'a} to any term. + +For example, here are two kinds of problem that can occur: + +\begin{itemize} + \item If some part of the program is {\em very} polymorphic, there + may be no ML type for it. In that case the extraction to ML works + all right but the generated code may be refused by the ML + type-checker. A very well known example is the {\em distr-pair} + function: +\begin{verbatim} +Definition dp := + fun (A B:Set)(x:A)(y:B)(f:forall C:Set, C->C) => (f A x, f B y). +\end{verbatim} + +In Ocaml, for instance, the direct extracted term would be: + +\begin{verbatim} +let dp x y f = Pair((f () x),(f () y)) +\end{verbatim} + +and would have type: +\begin{verbatim} +dp : 'a -> 'a -> (unit -> 'a -> 'b) -> ('b,'b) prod +\end{verbatim} + +which is not its original type, but a restriction. + +We now produce the following correct version: +\begin{verbatim} +let dp x y f = Pair ((Obj.magic f () x), (Obj.magic f () y)) +\end{verbatim} + + \item Some definitions of \Coq\ may have no counterpart in ML. This + happens when there is a quantification over types inside the type + of a constructor; for example: +\begin{verbatim} +Inductive anything : Set := dummy : forall A:Set, A -> anything. +\end{verbatim} + +which corresponds to the definition of an ML dynamic type. +In Ocaml, we must cast any argument of the constructor dummy. + +\end{itemize} + +Even with those unsafe castings, you should never get error like +``segmentation fault''. In fact even if your program may seem +ill-typed to the Ocaml type-checker, it can't go wrong: it comes +from a Coq well-typed terms, so for example inductives will always +have the correct number of arguments, etc. + +More details about the correctness of the extracted programs can be +found in \cite{Let02}. + +We have to say, though, that in most ``realistic'' programs, these +problems do not occur. For example all the programs of Coq library are +accepted by Caml type-checker without any {\tt Obj.magic} (see examples below). + + + +\asection{Some examples} + +We present here two examples of extractions, taken from the +\Coq\ Standard Library. We choose \ocaml\ as target language, +but all can be done in the other dialects with slight modifications. +We then indicate where to find other examples and tests of Extraction. + +\asubsection{A detailed example: Euclidean division} + +The file {\tt Euclid} contains the proof of Euclidean division +(theorem {\tt eucl\_dev}). The natural numbers defined in the example +files are unary integers defined by two constructors $O$ and $S$: +\begin{coq_example*} +Inductive nat : Set := + | O : nat + | S : nat -> nat. +\end{coq_example*} + +This module contains a theorem {\tt eucl\_dev}, whose type is: +\begin{verbatim} +forall b:nat, b > 0 -> forall a:nat, diveucl a b +\end{verbatim} +where {\tt diveucl} is a type for the pair of the quotient and the +modulo, plus some logical assertions that disappear during extraction. +We can now extract this program to \ocaml: + +\begin{coq_eval} +Reset Initial. +\end{coq_eval} +\begin{coq_example} +Require Import Euclid Wf_nat. +Extraction Inline gt_wf_rec lt_wf_rec induction_ltof2. +Recursive Extraction eucl_dev. +\end{coq_example} + +The inlining of {\tt gt\_wf\_rec} and others is not +mandatory. It only enhances readability of extracted code. +You can then copy-paste the output to a file {\tt euclid.ml} or let +\Coq\ do it for you with the following command: + +\begin{coq_example} +Extraction "euclid" eucl_dev. +\end{coq_example} + +Let us play the resulting program: + +\begin{verbatim} +# #use "euclid.ml";; +type nat = O | S of nat +type sumbool = Left | Right +val minus : nat -> nat -> nat = <fun> +val le_lt_dec : nat -> nat -> sumbool = <fun> +val le_gt_dec : nat -> nat -> sumbool = <fun> +type diveucl = Divex of nat * nat +val eucl_dev : nat -> nat -> diveucl = <fun> +# eucl_dev (S (S O)) (S (S (S (S (S O)))));; +- : diveucl = Divex (S (S O), S O) +\end{verbatim} +It is easier to test on \ocaml\ integers: +\begin{verbatim} +# let rec nat_of_int = function 0 -> O | n -> S (nat_of_int (n-1));; +val i2n : int -> nat = <fun> +# let rec int_of_nat = function O -> 0 | S p -> 1+(int_of_nat p);; +val n2i : nat -> int = <fun> +# let div a b = + let Divex (q,r) = eucl_dev (nat_of_int b) (nat_of_int a) + in (int_of_nat q, int_of_nat r);; +val div : int -> int -> int * int = <fun> +# div 173 15;; +- : int * int = (11, 8) +\end{verbatim} + +Note that these {\tt nat\_of\_int} and {\tt int\_of\_nat} are now +available via a mere {\tt Require Import ExtrOcamlIntConv} and then +adding these functions to the list of functions to extract. This file +{\tt ExtrOcamlIntConv.v} and some others in {\tt plugins/extraction/} +are meant to help building concrete program via extraction. + +\asubsection{Extraction's horror museum} + +Some pathological examples of extraction are grouped in the file +{\tt test-suite/success/extraction.v} of the sources of \Coq. + +\asubsection{Users' Contributions} + + Several of the \Coq\ Users' Contributions use extraction to produce + certified programs. In particular the following ones have an automatic + extraction test (just run {\tt make} in those directories): + + \begin{itemize} + \item Bordeaux/Additions + \item Bordeaux/EXCEPTIONS + \item Bordeaux/SearchTrees + \item Dyade/BDDS + \item Lannion + \item Lyon/CIRCUITS + \item Lyon/FIRING-SQUAD + \item Marseille/CIRCUITS + \item Muenchen/Higman + \item Nancy/FOUnify + \item Rocq/ARITH/Chinese + \item Rocq/COC + \item Rocq/GRAPHS + \item Rocq/HIGMAN + \item Sophia-Antipolis/Stalmarck + \item Suresnes/BDD + \end{itemize} + + Lannion, Rocq/HIGMAN and Lyon/CIRCUITS are a bit particular. They are + examples of developments where {\tt Obj.magic} are needed. + This is probably due to an heavy use of impredicativity. + After compilation those two examples run nonetheless, + thanks to the correction of the extraction~\cite{Let02}. + +% $Id: Extraction.tex 13153 2010-06-15 16:09:43Z letouzey $ + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "Reference-Manual" +%%% End: |