summaryrefslogtreecommitdiff
path: root/doc/refman/Extraction.tex
diff options
context:
space:
mode:
Diffstat (limited to 'doc/refman/Extraction.tex')
-rw-r--r--doc/refman/Extraction.tex551
1 files changed, 551 insertions, 0 deletions
diff --git a/doc/refman/Extraction.tex b/doc/refman/Extraction.tex
new file mode 100644
index 00000000..af5d4049
--- /dev/null
+++ b/doc/refman/Extraction.tex
@@ -0,0 +1,551 @@
+\achapter{Extraction of programs in Objective Caml and Haskell}
+\label{Extraction}
+\aauthor{Jean-Christophe Filliātre and Pierre Letouzey}
+\index{Extraction}
+
+We present here the \Coq\ extraction commands, used to build certified
+and relatively efficient functional programs, extracting them from
+either \Coq\ functions or \Coq\ proofs of specifications. The
+functional languages available as output are currently \ocaml{},
+\textsc{Haskell} and \textsc{Scheme}. In the following, ``ML'' will
+be used (abusively) to refer to any of the three.
+
+\paragraph{Differences with old versions.}
+The current extraction mechanism is new for version 7.0 of {\Coq}.
+In particular, the \FW\ toplevel used as an intermediate step between
+\Coq\ and ML has been withdrawn. It is also not possible
+any more to import ML objects in this \FW\ toplevel.
+The current mechanism also differs from
+the one in previous versions of \Coq: there is no more
+an explicit toplevel for the language (formerly called \textsc{Fml}).
+
+\asection{Generating ML code}
+\comindex{Extraction}
+\comindex{Recursive Extraction}
+\comindex{Extraction Module}
+\comindex{Recursive Extraction Module}
+
+The next two commands are meant to be used for rapid preview of
+extraction. They both display extracted term(s) inside \Coq.
+
+\begin{description}
+\item {\tt Extraction \qualid.} ~\par
+ Extracts one constant or module in the \Coq\ toplevel.
+
+\item {\tt Recursive Extraction \qualid$_1$ \dots\ \qualid$_n$.} ~\par
+ Recursive extraction of all the globals (or modules) \qualid$_1$ \dots\
+ \qualid$_n$ and all their dependencies in the \Coq\ toplevel.
+\end{description}
+
+%% TODO error messages
+
+All the following commands produce real ML files. User can choose to produce
+one monolithic file or one file per \Coq\ library.
+
+\begin{description}
+\item {\tt Extraction "{\em file}"}
+ \qualid$_1$ \dots\ \qualid$_n$. ~\par
+ Recursive extraction of all the globals (or modules) \qualid$_1$ \dots\
+ \qualid$_n$ and all their dependencies in one monolithic file {\em file}.
+ Global and local identifiers are renamed according to the chosen ML
+ language to fulfill its syntactic conventions, keeping original
+ names as much as possible.
+
+\item {\tt Extraction Library} \ident. ~\par
+ Extraction of the whole \Coq\ library {\tt\ident.v} to an ML module
+ {\tt\ident.ml}. In case of name clash, identifiers are here renamed
+ using prefixes \verb!coq_! or \verb!Coq_! to ensure a
+ session-independent renaming.
+
+\item {\tt Recursive Extraction Library} \ident. ~\par
+ Extraction of the \Coq\ library {\tt\ident.v} and all other modules
+ {\tt\ident.v} depends on.
+\end{description}
+
+The list of globals \qualid$_i$ does not need to be
+exhaustive: it is automatically completed into a complete and minimal
+environment.
+
+\asection{Extraction options}
+
+\asubsection{Setting the target language}
+\comindex{Extraction Language}
+
+The ability to fix target language is the first and more important
+of the extraction options. Default is Ocaml.
+\begin{description}
+\item {\tt Extraction Language Ocaml}.
+\item {\tt Extraction Language Haskell}.
+\item {\tt Extraction Language Scheme}.
+\end{description}
+
+\asubsection{Inlining and optimizations}
+
+Since Objective Caml is a strict language, the extracted
+code has to be optimized in order to be efficient (for instance, when
+using induction principles we do not want to compute all the recursive
+calls but only the needed ones). So the extraction mechanism provides
+an automatic optimization routine that will be
+called each time the user want to generate Ocaml programs. Essentially,
+it performs constants inlining and reductions. Therefore some
+constants may not appear in resulting monolithic Ocaml program.
+In the case of modular extraction, even if some inlining is done, the
+inlined constant are nevertheless printed, to ensure
+session-independent programs.
+
+Concerning Haskell, such optimizations are less useful because of
+lazyness. We still make some optimizations, for example in order to
+produce more readable code.
+
+All these optimizations are controled by the following \Coq\ options:
+
+\begin{description}
+
+\item \comindex{Set Extraction Optimize}
+{\tt Set Extraction Optimize.}
+
+\item \comindex{Unset Extraction Optimize}
+{\tt Unset Extraction Optimize.}
+
+Default is Set. This control all optimizations made on the ML terms
+(mostly reduction of dummy beta/iota redexes, but also simplifications on
+Cases, etc). Put this option to Unset if you want a ML term as close as
+possible to the Coq term.
+
+\item \comindex{Set Extraction AutoInline}
+{\tt Set Extraction AutoInline.}
+
+\item \comindex{Unset Extraction AutoInline}
+{\tt Unset Extraction AutoInline.}
+
+Default is Set, so by default, the extraction mechanism feels free to
+inline the bodies of some defined constants, according to some heuristics
+like size of bodies, useness of some arguments, etc. Those heuristics are
+not always perfect, you may want to disable this feature, do it by Unset.
+
+\item \comindex{Extraction Inline}
+{\tt Extraction Inline} \qualid$_1$ \dots\ \qualid$_n$.
+
+\item \comindex{Extraction NoInline}
+{\tt Extraction NoInline} \qualid$_1$ \dots\ \qualid$_n$.
+
+In addition to the automatic inline feature, you can now tell precisely to
+inline some more constants by the {\tt Extraction Inline} command. Conversely,
+you can forbid the automatic inlining of some specific constants by
+the {\tt Extraction NoInline} command.
+Those two commands enable a precise control of what is inlined and what is not.
+
+\item \comindex{Print Extraction Inline}
+{\tt Print Extraction Inline}.
+
+Prints the current state of the table recording the custom inlinings
+declared by the two previous commands.
+
+\item \comindex{Reset Extraction Inline}
+{\tt Reset Extraction Inline}.
+
+Puts the table recording the custom inlinings back to empty.
+
+\end{description}
+
+
+\paragraph{Inlining and printing of a constant declaration.}
+
+A user can explicitly ask for a constant to be extracted by two means:
+\begin{itemize}
+\item by mentioning it on the extraction command line
+\item by extracting the whole \Coq\ module of this constant.
+\end{itemize}
+In both cases, the declaration of this constant will be present in the
+produced file.
+But this same constant may or may not be inlined in the following
+terms, depending on the automatic/custom inlining mechanism.
+
+
+For the constants non-explicitly required but needed for dependency
+reasons, there are two cases:
+\begin{itemize}
+\item If an inlining decision is taken, whether automatically or not,
+all occurrences of this constant are replaced by its extracted body, and
+this constant is not declared in the generated file.
+\item If no inlining decision is taken, the constant is normally
+ declared in the produced file.
+\end{itemize}
+
+\asubsection{Extra elimination of useless arguments}
+
+\begin{description}
+\item \comindex{Extraction Implicit}
+ {\tt Extraction Implicit} \qualid\ [ \ident$_1$ \dots\ \ident$_n$ ].
+
+This experimental command allows to declare some arguments of
+\qualid\ as implicit, i.e. useless in extracted code and hence to
+be removed by extraction. Here \qualid\ can be any function or
+inductive constructor, and \ident$_i$ are the names of the concerned
+arguments. In fact, an argument can also be referred by a number
+indicating its position, starting from 1. When an actual extraction
+takes place, an error is raised if the {\tt Extraction Implicit}
+declarations cannot be honored, that is if any of the implicited
+variables still occurs in the final code. This declaration of useless
+arguments is independent but complementary to the main elimination
+principles of extraction (logical parts and types).
+\end{description}
+
+\asubsection{Realizing axioms}\label{extraction:axioms}
+
+Extraction will fail if it encounters an informative
+axiom not realized (see Section~\ref{extraction:axioms}).
+A warning will be issued if it encounters an logical axiom, to remind
+user that inconsistent logical axioms may lead to incorrect or
+non-terminating extracted terms.
+
+It is possible to assume some axioms while developing a proof. Since
+these axioms can be any kind of proposition or object or type, they may
+perfectly well have some computational content. But a program must be
+a closed term, and of course the system cannot guess the program which
+realizes an axiom. Therefore, it is possible to tell the system
+what ML term corresponds to a given axiom.
+
+\comindex{Extract Constant}
+\begin{description}
+\item{\tt Extract Constant \qualid\ => \str.} ~\par
+ Give an ML extraction for the given constant.
+ The \str\ may be an identifier or a quoted string.
+\item{\tt Extract Inlined Constant \qualid\ => \str.} ~\par
+ Same as the previous one, except that the given ML terms will
+ be inlined everywhere instead of being declared via a let.
+\end{description}
+
+Note that the {\tt Extract Inlined Constant} command is sugar
+for an {\tt Extract Constant} followed by a {\tt Extraction Inline}.
+Hence a {\tt Reset Extraction Inline} will have an effect on the
+realized and inlined axiom.
+
+Of course, it is the responsibility of the user to ensure that the ML
+terms given to realize the axioms do have the expected types. In
+fact, the strings containing realizing code are just copied in the
+extracted files. The extraction recognizes whether the realized axiom
+should become a ML type constant or a ML object declaration.
+
+\Example
+\begin{coq_example}
+Axiom X:Set.
+Axiom x:X.
+Extract Constant X => "int".
+Extract Constant x => "0".
+\end{coq_example}
+
+Notice that in the case of type scheme axiom (i.e. whose type is an
+arity, that is a sequence of product finished by a sort), then some type
+variables has to be given. The syntax is then:
+
+\begin{description}
+\item{\tt Extract Constant \qualid\ \str$_1$ \ldots \str$_n$ => \str.} ~\par
+\end{description}
+
+The number of type variables is checked by the system.
+
+\Example
+\begin{coq_example}
+Axiom Y : Set -> Set -> Set.
+Extract Constant Y "'a" "'b" => " 'a*'b ".
+\end{coq_example}
+
+Realizing an axiom via {\tt Extract Constant} is only useful in the
+case of an informative axiom (of sort Type or Set). A logical axiom
+have no computational content and hence will not appears in extracted
+terms. But a warning is nonetheless issued if extraction encounters a
+logical axiom. This warning reminds user that inconsistent logical
+axioms may lead to incorrect or non-terminating extracted terms.
+
+If an informative axiom has not been realized before an extraction, a
+warning is also issued and the definition of the axiom is filled with
+an exception labeled {\tt AXIOM TO BE REALIZED}. The user must then
+search these exceptions inside the extracted file and replace them by
+real code.
+
+\comindex{Extract Inductive}
+
+The system also provides a mechanism to specify ML terms for inductive
+types and constructors. For instance, the user may want to use the ML
+native boolean type instead of \Coq\ one. The syntax is the following:
+
+\begin{description}
+\item{\tt Extract Inductive \qualid\ => \str\ [ \str\ \dots \str\ ]\
+{\it optstring}.} ~\par
+ Give an ML extraction for the given inductive type. You must specify
+ extractions for the type itself (first \str) and all its
+ constructors (between square brackets). If given, the final optional
+ string should contain a function emulating pattern-matching over this
+ inductive type. If this optional string is not given, the ML
+ extraction must be an ML inductive datatype, and the native
+ pattern-matching of the language will be used.
+\end{description}
+
+For an inductive type with $k$ constructor, the function used to
+emulate the match should expect $(k+1)$ arguments, first the $k$
+branches in functional form, and then the inductive element to
+destruct. For instance, the match branch \verb$| S n => foo$ gives the
+functional form \verb$(fun n -> foo)$. Note that a constructor with no
+argument is considered to have one unit argument, in order to block
+early evaluation of the branch: \verb$| O => bar$ leads to the functional
+form \verb$(fun () -> bar)$. For instance, when extracting {\tt nat}
+into {\tt int}, the code to provide has type:
+{\tt (unit->'a)->(int->'a)->int->'a}.
+
+As for {\tt Extract Inductive}, this command should be used with care:
+\begin{itemize}
+\item The ML code provided by the user is currently \emph{not} checked at all by
+ extraction, even for syntax errors.
+
+\item Extracting an inductive type to a pre-existing ML inductive type
+is quite sound. But extracting to a general type (by providing an
+ad-hoc pattern-matching) will often \emph{not} be fully rigorously
+correct. For instance, when extracting {\tt nat} to Ocaml's {\tt
+int}, it is theoretically possible to build {\tt nat} values that are
+larger than Ocaml's {\tt max\_int}. It is the user's responsability to
+be sure that no overflow or other bad events occur in practice.
+
+\item Translating an inductive type to an ML type does \emph{not}
+magically improve the asymptotic complexity of functions, even if the
+ML type is an efficient representation. For instance, when extracting
+{\tt nat} to Ocaml's {\tt int}, the function {\tt mult} stays
+quadratic. It might be interesting to associate this translation with
+some specific {\tt Extract Constant} when primitive counterparts exist.
+\end{itemize}
+
+\Example
+Typical examples are the following:
+\begin{coq_example}
+Extract Inductive unit => "unit" [ "()" ].
+Extract Inductive bool => "bool" [ "true" "false" ].
+Extract Inductive sumbool => "bool" [ "true" "false" ].
+\end{coq_example}
+
+If an inductive constructor or type has arity 2 and the corresponding
+string is enclosed by parenthesis, then the rest of the string is used
+as infix constructor or type.
+\begin{coq_example}
+Extract Inductive list => "list" [ "[]" "(::)" ].
+Extract Inductive prod => "(*)" [ "(,)" ].
+\end{coq_example}
+
+As an example of translation to a non-inductive datatype, let's turn
+{\tt nat} into Ocaml's {\tt int} (see caveat above):
+\begin{coq_example}
+Extract Inductive nat => int [ "0" "succ" ]
+ "(fun fO fS n => if n=0 then fO () else fS (n-1))".
+\end{coq_example}
+
+\asubsection{Avoiding conflicts with existing filenames}
+
+\comindex{Extraction Blacklist}
+
+When using {\tt Extraction Library}, the names of the extracted files
+directly depends from the names of the \Coq\ files. It may happen that
+these filenames are in conflict with already existing files,
+either in the standard library of the target language or in other
+code that is meant to be linked with the extracted code.
+For instance the module {\tt List} exists both in \Coq\ and in Ocaml.
+It is possible to instruct the extraction not to use particular filenames.
+
+\begin{description}
+\item{\tt Extraction Blacklist \ident \ldots \ident.} ~\par
+ Instruct the extraction to avoid using these names as filenames
+ for extracted code.
+\item{\tt Print Extraction Blacklist.} ~\par
+ Show the current list of filenames the extraction should avoid.
+\item{\tt Reset Extraction Blacklist.} ~\par
+ Allow the extraction to use any filename.
+\end{description}
+
+For Ocaml, a typical use of these commands is
+{\tt Extraction Blacklist String List}.
+
+\asection{Differences between \Coq\ and ML type systems}
+
+
+Due to differences between \Coq\ and ML type systems,
+some extracted programs are not directly typable in ML.
+We now solve this problem (at least in Ocaml) by adding
+when needed some unsafe casting {\tt Obj.magic}, which give
+a generic type {\tt 'a} to any term.
+
+For example, here are two kinds of problem that can occur:
+
+\begin{itemize}
+ \item If some part of the program is {\em very} polymorphic, there
+ may be no ML type for it. In that case the extraction to ML works
+ all right but the generated code may be refused by the ML
+ type-checker. A very well known example is the {\em distr-pair}
+ function:
+\begin{verbatim}
+Definition dp :=
+ fun (A B:Set)(x:A)(y:B)(f:forall C:Set, C->C) => (f A x, f B y).
+\end{verbatim}
+
+In Ocaml, for instance, the direct extracted term would be:
+
+\begin{verbatim}
+let dp x y f = Pair((f () x),(f () y))
+\end{verbatim}
+
+and would have type:
+\begin{verbatim}
+dp : 'a -> 'a -> (unit -> 'a -> 'b) -> ('b,'b) prod
+\end{verbatim}
+
+which is not its original type, but a restriction.
+
+We now produce the following correct version:
+\begin{verbatim}
+let dp x y f = Pair ((Obj.magic f () x), (Obj.magic f () y))
+\end{verbatim}
+
+ \item Some definitions of \Coq\ may have no counterpart in ML. This
+ happens when there is a quantification over types inside the type
+ of a constructor; for example:
+\begin{verbatim}
+Inductive anything : Set := dummy : forall A:Set, A -> anything.
+\end{verbatim}
+
+which corresponds to the definition of an ML dynamic type.
+In Ocaml, we must cast any argument of the constructor dummy.
+
+\end{itemize}
+
+Even with those unsafe castings, you should never get error like
+``segmentation fault''. In fact even if your program may seem
+ill-typed to the Ocaml type-checker, it can't go wrong: it comes
+from a Coq well-typed terms, so for example inductives will always
+have the correct number of arguments, etc.
+
+More details about the correctness of the extracted programs can be
+found in \cite{Let02}.
+
+We have to say, though, that in most ``realistic'' programs, these
+problems do not occur. For example all the programs of Coq library are
+accepted by Caml type-checker without any {\tt Obj.magic} (see examples below).
+
+
+
+\asection{Some examples}
+
+We present here two examples of extractions, taken from the
+\Coq\ Standard Library. We choose \ocaml\ as target language,
+but all can be done in the other dialects with slight modifications.
+We then indicate where to find other examples and tests of Extraction.
+
+\asubsection{A detailed example: Euclidean division}
+
+The file {\tt Euclid} contains the proof of Euclidean division
+(theorem {\tt eucl\_dev}). The natural numbers defined in the example
+files are unary integers defined by two constructors $O$ and $S$:
+\begin{coq_example*}
+Inductive nat : Set :=
+ | O : nat
+ | S : nat -> nat.
+\end{coq_example*}
+
+This module contains a theorem {\tt eucl\_dev}, whose type is:
+\begin{verbatim}
+forall b:nat, b > 0 -> forall a:nat, diveucl a b
+\end{verbatim}
+where {\tt diveucl} is a type for the pair of the quotient and the
+modulo, plus some logical assertions that disappear during extraction.
+We can now extract this program to \ocaml:
+
+\begin{coq_eval}
+Reset Initial.
+\end{coq_eval}
+\begin{coq_example}
+Require Import Euclid Wf_nat.
+Extraction Inline gt_wf_rec lt_wf_rec induction_ltof2.
+Recursive Extraction eucl_dev.
+\end{coq_example}
+
+The inlining of {\tt gt\_wf\_rec} and others is not
+mandatory. It only enhances readability of extracted code.
+You can then copy-paste the output to a file {\tt euclid.ml} or let
+\Coq\ do it for you with the following command:
+
+\begin{coq_example}
+Extraction "euclid" eucl_dev.
+\end{coq_example}
+
+Let us play the resulting program:
+
+\begin{verbatim}
+# #use "euclid.ml";;
+type nat = O | S of nat
+type sumbool = Left | Right
+val minus : nat -> nat -> nat = <fun>
+val le_lt_dec : nat -> nat -> sumbool = <fun>
+val le_gt_dec : nat -> nat -> sumbool = <fun>
+type diveucl = Divex of nat * nat
+val eucl_dev : nat -> nat -> diveucl = <fun>
+# eucl_dev (S (S O)) (S (S (S (S (S O)))));;
+- : diveucl = Divex (S (S O), S O)
+\end{verbatim}
+It is easier to test on \ocaml\ integers:
+\begin{verbatim}
+# let rec nat_of_int = function 0 -> O | n -> S (nat_of_int (n-1));;
+val i2n : int -> nat = <fun>
+# let rec int_of_nat = function O -> 0 | S p -> 1+(int_of_nat p);;
+val n2i : nat -> int = <fun>
+# let div a b =
+ let Divex (q,r) = eucl_dev (nat_of_int b) (nat_of_int a)
+ in (int_of_nat q, int_of_nat r);;
+val div : int -> int -> int * int = <fun>
+# div 173 15;;
+- : int * int = (11, 8)
+\end{verbatim}
+
+Note that these {\tt nat\_of\_int} and {\tt int\_of\_nat} are now
+available via a mere {\tt Require Import ExtrOcamlIntConv} and then
+adding these functions to the list of functions to extract. This file
+{\tt ExtrOcamlIntConv.v} and some others in {\tt plugins/extraction/}
+are meant to help building concrete program via extraction.
+
+\asubsection{Extraction's horror museum}
+
+Some pathological examples of extraction are grouped in the file
+{\tt test-suite/success/extraction.v} of the sources of \Coq.
+
+\asubsection{Users' Contributions}
+
+ Several of the \Coq\ Users' Contributions use extraction to produce
+ certified programs. In particular the following ones have an automatic
+ extraction test (just run {\tt make} in those directories):
+
+ \begin{itemize}
+ \item Bordeaux/Additions
+ \item Bordeaux/EXCEPTIONS
+ \item Bordeaux/SearchTrees
+ \item Dyade/BDDS
+ \item Lannion
+ \item Lyon/CIRCUITS
+ \item Lyon/FIRING-SQUAD
+ \item Marseille/CIRCUITS
+ \item Muenchen/Higman
+ \item Nancy/FOUnify
+ \item Rocq/ARITH/Chinese
+ \item Rocq/COC
+ \item Rocq/GRAPHS
+ \item Rocq/HIGMAN
+ \item Sophia-Antipolis/Stalmarck
+ \item Suresnes/BDD
+ \end{itemize}
+
+ Lannion, Rocq/HIGMAN and Lyon/CIRCUITS are a bit particular. They are
+ examples of developments where {\tt Obj.magic} are needed.
+ This is probably due to an heavy use of impredicativity.
+ After compilation those two examples run nonetheless,
+ thanks to the correction of the extraction~\cite{Let02}.
+
+% $Id: Extraction.tex 13153 2010-06-15 16:09:43Z letouzey $
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: "Reference-Manual"
+%%% End: