\achapter{Extraction of programs in Objective Caml and Haskell}
\label{Extraction}
\aauthor{Jean-Christophe Filliātre and Pierre Letouzey}
\index{Extraction}

We present here the \Coq\ extraction commands, used to build certified
and relatively efficient functional programs, extracting them from
either \Coq\ functions or \Coq\ proofs of specifications. The
functional languages available as output are currently \ocaml{},
\textsc{Haskell} and \textsc{Scheme}.  In the following, ``ML'' will
be used (abusively) to refer to any of the three.

\paragraph{Differences with old versions.}
The current extraction mechanism is new for version 7.0 of {\Coq}.
In particular, the \FW\ toplevel used as an intermediate step between 
\Coq\ and ML has been withdrawn.  It is also not possible 
any more to import ML objects in this \FW\ toplevel.
The current mechanism also differs from
the one in previous versions of \Coq: there is no more
an explicit toplevel for the language (formerly called \textsc{Fml}). 

\asection{Generating ML code}
\comindex{Extraction}
\comindex{Recursive Extraction}
\comindex{Extraction Module}
\comindex{Recursive Extraction Module}

The next two commands are meant to be used for rapid preview of
extraction. They both display extracted term(s) inside \Coq.

\begin{description}
\item {\tt Extraction \qualid.} ~\par
  Extracts one constant or module in the \Coq\ toplevel.

\item {\tt Recursive Extraction  \qualid$_1$ \dots\ \qualid$_n$.} ~\par
  Recursive extraction of all the globals (or modules) \qualid$_1$ \dots\
  \qualid$_n$ and all their dependencies in the \Coq\ toplevel.
\end{description}

%% TODO error messages

All the following commands produce real ML files. User can choose to produce
one monolithic file or one file per \Coq\ library. 

\begin{description}
\item {\tt Extraction "{\em file}"}  
      \qualid$_1$ \dots\ \qualid$_n$. ~\par
  Recursive extraction of all the globals (or modules) \qualid$_1$ \dots\
  \qualid$_n$ and all their dependencies in one monolithic file {\em file}.
  Global and local identifiers are renamed according to the chosen ML
  language to fulfill its syntactic conventions, keeping original
  names as much as possible.
  
\item {\tt Extraction Library} \ident. ~\par 
  Extraction of the whole \Coq\ library {\tt\ident.v} to an ML module
  {\tt\ident.ml}.  In case of name clash, identifiers are here renamed
  using prefixes \verb!coq_!  or \verb!Coq_! to ensure a
  session-independent renaming.

\item {\tt Recursive Extraction Library} \ident. ~\par
  Extraction of the \Coq\ library {\tt\ident.v} and all other modules 
  {\tt\ident.v} depends on. 
\end{description}

The list of globals \qualid$_i$ does not need to be
exhaustive: it is automatically completed into a complete and minimal
environment. 

\asection{Extraction options}

\asubsection{Setting the target language}
\comindex{Extraction Language}

The ability to fix target language is the first and more important
of the extraction options. Default is Ocaml.
\begin{description}
\item {\tt Extraction Language Ocaml}.
\item {\tt Extraction Language Haskell}.
\item {\tt Extraction Language Scheme}.
\end{description}

\asubsection{Inlining and optimizations}

Since Objective Caml is a strict language, the extracted
code has to be optimized in order to be efficient (for instance, when
using induction principles we do not want to compute all the recursive
calls but only the needed ones). So the extraction mechanism provides
an automatic optimization routine that will be
called each time the user want to generate Ocaml programs. Essentially,
it performs constants inlining and reductions.  Therefore some
constants may not appear in resulting monolithic Ocaml program.
In the case of modular extraction, even if some inlining is done, the
inlined constant are nevertheless printed, to ensure
session-independent programs.

Concerning Haskell, such optimizations are less useful because of
lazyness. We still make some optimizations, for example in order to
produce more readable code. 

All these optimizations are controled by the following \Coq\ options: 

\begin{description}

\item \comindex{Set Extraction Optimize}
{\tt Set Extraction Optimize.}

\item \comindex{Unset Extraction Optimize}
{\tt Unset Extraction Optimize.}

Default is Set. This control all optimizations made on the ML terms 
(mostly reduction of dummy beta/iota redexes, but also simplifications on
Cases, etc). Put this option to Unset if you want a ML term as close as 
possible to the Coq term.

\item \comindex{Set Extraction AutoInline}
{\tt Set Extraction AutoInline.} 

\item \comindex{Unset Extraction AutoInline}
{\tt Unset Extraction AutoInline.} 

Default is Set, so by default, the extraction mechanism feels free to 
inline the bodies of some defined constants, according to some heuristics 
like size of bodies, useness of some arguments, etc. Those heuristics are 
not always perfect, you may want to disable this feature, do it by Unset. 

\item \comindex{Extraction Inline}
{\tt Extraction Inline} \qualid$_1$ \dots\ \qualid$_n$. 

\item \comindex{Extraction NoInline}
{\tt Extraction NoInline} \qualid$_1$ \dots\ \qualid$_n$. 

In addition to the automatic inline feature, you can now tell precisely to 
inline some more constants by the {\tt Extraction Inline} command. Conversely, 
you can forbid the automatic inlining of some specific constants by
the {\tt Extraction NoInline} command.
Those two commands enable a precise control of what is inlined and what is not. 

\item \comindex{Print Extraction Inline}
{\tt Print Extraction Inline}. 

Prints the current state of the table recording the custom inlinings 
declared by the two previous commands. 

\item \comindex{Reset Extraction Inline}
{\tt Reset Extraction Inline}. 

Puts the table recording the custom inlinings back to empty. 

\end{description}


\paragraph{Inlining and printing of a constant declaration.}

A user can explicitly ask for a constant to be extracted by two means:
\begin{itemize}
\item by mentioning it on the extraction command line
\item by extracting the whole \Coq\ module of this constant.
\end{itemize}
In both cases, the declaration of this constant will be present in the
produced file. 
But this same constant may or may not be inlined in the following
terms, depending on the automatic/custom inlining mechanism.  


For the constants non-explicitly required but needed for dependency
reasons, there are two cases: 
\begin{itemize}
\item If an inlining decision is taken, whether automatically or not,
all occurrences of this constant are replaced by its extracted body, and
this constant is not declared in the generated file.
\item If no inlining decision is taken, the constant is normally
  declared in the produced file. 
\end{itemize}

\asubsection{Extra elimination of useless arguments}

\begin{description}
\item \comindex{Extraction Implicit}
 {\tt Extraction Implicit} \qualid\ [ \ident$_1$ \dots\ \ident$_n$ ].

This experimental command allows to declare some arguments of
\qualid\ as implicit, i.e. useless in extracted code and hence to
be removed by extraction. Here \qualid\ can be any function or
inductive constructor, and \ident$_i$ are the names of the concerned
arguments. In fact, an argument can also be referred by a number
indicating its position, starting from 1. When an actual extraction
takes place, an error is raised if the {\tt Extraction Implicit}
declarations cannot be honored, that is if any of the implicited
variables still occurs in the final code. This declaration of useless
arguments is independent but complementary to the main elimination
principles of extraction (logical parts and types).
\end{description}

\asubsection{Realizing axioms}\label{extraction:axioms}

Extraction will fail if it encounters an informative
axiom not realized (see Section~\ref{extraction:axioms}). 
A warning will be issued if it encounters an logical axiom, to remind 
user that inconsistent logical axioms may lead to incorrect or
non-terminating extracted terms. 

It is possible to assume some axioms while developing a proof. Since
these axioms can be any kind of proposition or object or type, they may
perfectly well have some computational content. But a program must be
a closed term, and of course the system cannot guess the program which
realizes an axiom.  Therefore, it is possible to tell the system
what ML term corresponds to a given axiom. 

\comindex{Extract Constant}
\begin{description}
\item{\tt Extract Constant \qualid\ => \str.} ~\par
  Give an ML extraction for the given constant.
  The \str\ may be an identifier or a quoted string.
\item{\tt Extract Inlined Constant \qualid\ => \str.} ~\par
  Same as the previous one, except that the given ML terms will
  be inlined everywhere instead of being declared via a let.
\end{description}

Note that the {\tt Extract Inlined Constant} command is sugar
for an {\tt Extract Constant} followed by a {\tt Extraction Inline}. 
Hence a {\tt Reset Extraction Inline} will have an effect on the
realized and inlined axiom.

Of course, it is the responsibility of the user to ensure that the ML
terms given to realize the axioms do have the expected types.  In
fact, the strings containing realizing code are just copied in the
extracted files. The extraction recognizes whether the realized axiom
should become a ML type constant or a ML object declaration.

\Example
\begin{coq_example}
Axiom X:Set.
Axiom x:X.
Extract Constant X => "int".
Extract Constant x => "0".
\end{coq_example}   

Notice that in the case of type scheme axiom (i.e. whose type is an
arity, that is a sequence of product finished by a sort), then some type
variables has to be given. The syntax is then: 

\begin{description}
\item{\tt Extract Constant \qualid\ \str$_1$ \ldots \str$_n$ => \str.} ~\par
\end{description}

The number of type variables is checked by the system. 

\Example
\begin{coq_example}
Axiom Y : Set -> Set -> Set.
Extract Constant Y "'a" "'b" => " 'a*'b ".
\end{coq_example}

Realizing an axiom via {\tt Extract Constant} is only useful in the
case of an informative axiom (of sort Type or Set). A logical axiom
have no computational content and hence will not appears in extracted
terms. But a warning is nonetheless issued if extraction encounters a
logical axiom. This warning reminds user that inconsistent logical
axioms may lead to incorrect or non-terminating extracted terms.

If an informative axiom has not been realized before an extraction, a
warning is also issued and the definition of the axiom is filled with
an exception labeled {\tt AXIOM TO BE REALIZED}. The user must then
search these exceptions inside the extracted file and replace them by
real code.

\comindex{Extract Inductive} 

The system also provides a mechanism to specify ML terms for inductive
types and constructors.  For instance, the user may want to use the ML
native boolean type instead of \Coq\ one.  The syntax is the following:

\begin{description}
\item{\tt Extract Inductive \qualid\ => \str\ [ \str\ \dots \str\ ]\
{\it optstring}.} ~\par
  Give an ML extraction for the given inductive type. You must specify
  extractions for the type itself (first \str) and all its
  constructors (between square brackets). If given, the final optional
  string should contain a function emulating pattern-matching over this
  inductive type. If this optional string is not given, the ML
  extraction must be an ML inductive datatype, and the native
  pattern-matching of the language will be used.
\end{description}

For an inductive type with $k$ constructor, the function used to
emulate the match should expect $(k+1)$ arguments, first the $k$
branches in functional form, and then the inductive element to
destruct. For instance, the match branch \verb$| S n => foo$ gives the
functional form \verb$(fun n -> foo)$. Note that a constructor with no
argument is considered to have one unit argument, in order to block
early evaluation of the branch: \verb$| O => bar$ leads to the functional
form \verb$(fun () -> bar)$. For instance, when extracting {\tt nat}
into {\tt int}, the code to provide has type:
{\tt (unit->'a)->(int->'a)->int->'a}.
    
As for {\tt Extract Inductive}, this command should be used with care:
\begin{itemize}
\item The ML code provided by the user is currently \emph{not} checked at all by
  extraction, even for syntax errors.

\item Extracting an inductive type to a pre-existing ML inductive type
is quite sound. But extracting to a general type (by providing an
ad-hoc pattern-matching) will often \emph{not} be fully rigorously
correct.  For instance, when extracting {\tt nat} to Ocaml's {\tt
int}, it is theoretically possible to build {\tt nat} values that are
larger than Ocaml's {\tt max\_int}. It is the user's responsability to
be sure that no overflow or other bad events occur in practice.

\item Translating an inductive type to an ML type does \emph{not}
magically improve the asymptotic complexity of functions, even if the
ML type is an efficient representation. For instance, when extracting
{\tt nat} to Ocaml's {\tt int}, the function {\tt mult} stays
quadratic. It might be interesting to associate this translation with
some specific {\tt Extract Constant} when primitive counterparts exist.
\end{itemize}

\Example
Typical examples are the following:
\begin{coq_example}
Extract Inductive unit => "unit" [ "()" ].
Extract Inductive bool => "bool" [ "true" "false" ].
Extract Inductive sumbool => "bool" [ "true" "false" ].
\end{coq_example}

If an inductive constructor or type has arity 2 and the corresponding 
string is enclosed by parenthesis, then the rest of the string is used
as infix constructor or type. 
\begin{coq_example}
Extract Inductive list => "list" [ "[]" "(::)" ].
Extract Inductive prod => "(*)"  [ "(,)" ].
\end{coq_example}

As an example of translation to a non-inductive datatype, let's turn
{\tt nat} into Ocaml's {\tt int} (see caveat above):
\begin{coq_example}
Extract Inductive nat => int [ "0" "succ" ]
 "(fun fO fS n -> if n=0 then fO () else fS (n-1))".
\end{coq_example}

\asubsection{Avoiding conflicts with existing filenames}

\comindex{Extraction Blacklist}

When using {\tt Extraction Library}, the names of the extracted files
directly depends from the names of the \Coq\ files. It may happen that
these filenames are in conflict with already existing files, 
either in the standard library of the target language or in other
code that is meant to be linked with the extracted code. 
For instance the module {\tt List} exists both in \Coq\ and in Ocaml.
It is possible to instruct the extraction not to use particular filenames.

\begin{description}
\item{\tt Extraction Blacklist \ident \ldots \ident.} ~\par
  Instruct the extraction to avoid using these names as filenames
  for extracted code. 
\item{\tt Print Extraction Blacklist.} ~\par
  Show the current list of filenames the extraction should avoid.
\item{\tt Reset Extraction Blacklist.} ~\par
  Allow the extraction to use any filename.
\end{description}

For Ocaml, a typical use of these commands is
{\tt Extraction Blacklist String List}.

\asection{Differences between \Coq\ and ML type systems}


Due to differences between \Coq\ and ML type systems, 
some extracted programs are not directly typable in ML. 
We now solve this problem (at least in Ocaml) by adding 
when needed some unsafe casting {\tt Obj.magic}, which give
a generic type {\tt 'a} to any term.

For example, here are two kinds of problem that can occur:

\begin{itemize}
  \item If some part of the program is {\em very} polymorphic, there
    may be no ML type for it. In that case the extraction to ML works
    all right but the generated code may be refused by the ML
    type-checker. A very well known example is the {\em distr-pair}
    function:
\begin{verbatim}
Definition dp := 
 fun (A B:Set)(x:A)(y:B)(f:forall C:Set, C->C) => (f A x, f B y).
\end{verbatim}

In Ocaml, for instance, the direct extracted term would be:

\begin{verbatim}
let dp x y f = Pair((f () x),(f () y))
\end{verbatim}

and would have type:
\begin{verbatim}
dp : 'a -> 'a -> (unit -> 'a -> 'b) -> ('b,'b) prod
\end{verbatim}

which is not its original type, but a restriction.

We now produce the following correct version:
\begin{verbatim}
let dp x y f = Pair ((Obj.magic f () x), (Obj.magic f () y))
\end{verbatim}

  \item Some definitions of \Coq\ may have no counterpart in ML. This
    happens when there is a quantification over types inside the type
    of a constructor; for example:
\begin{verbatim}
Inductive anything : Set := dummy : forall A:Set, A -> anything.
\end{verbatim}

which corresponds to the definition of an ML dynamic type.
In Ocaml, we must cast any argument of the constructor dummy.

\end{itemize}

Even with those unsafe castings, you should never get error like
``segmentation fault''. In fact even if your program may seem
ill-typed to the Ocaml type-checker, it can't go wrong: it comes 
from a Coq well-typed terms, so for example inductives will always 
have the correct number of arguments, etc. 

More details about the correctness of the extracted programs can be 
found in \cite{Let02}.

We have to say, though, that in most ``realistic'' programs, these
problems do not occur. For example all the programs of Coq library are
accepted by Caml type-checker without any {\tt Obj.magic} (see examples below).


\asection{Some examples}

We present here two examples of extractions, taken from the 
\Coq\ Standard Library. We choose \ocaml\ as target language, 
but all can be done in the other dialects with slight modifications.
We then indicate where to find other examples and tests of Extraction.

\asubsection{A detailed example: Euclidean division}

The file {\tt Euclid} contains the proof of Euclidean division
(theorem {\tt eucl\_dev}). The natural numbers defined in the example
files are unary integers defined by two constructors $O$ and $S$:
\begin{coq_example*}
Inductive nat : Set :=
  | O : nat
  | S : nat -> nat.
\end{coq_example*}

This module contains a theorem {\tt eucl\_dev}, whose type is:
\begin{verbatim}
forall b:nat, b > 0 -> forall a:nat, diveucl a b
\end{verbatim}
where {\tt diveucl} is a type for the pair of the quotient and the
modulo, plus some logical assertions that disappear during extraction.
We can now extract this program to \ocaml:

\begin{coq_eval}
Reset Initial.
\end{coq_eval}
\begin{coq_example}
Require Import Euclid Wf_nat.
Extraction Inline gt_wf_rec lt_wf_rec induction_ltof2.
Recursive Extraction eucl_dev.
\end{coq_example}

The inlining of {\tt gt\_wf\_rec} and others is not
mandatory. It only enhances readability of extracted code.
You can then copy-paste the output to a file {\tt euclid.ml} or let 
\Coq\ do it for you with the following command: 

\begin{verbatim}
Extraction "euclid" eucl_dev.
\end{verbatim}

Let us play the resulting program:

\begin{verbatim}
# #use "euclid.ml";;
type nat = O | S of nat
type sumbool = Left | Right
val minus : nat -> nat -> nat = <fun>
val le_lt_dec : nat -> nat -> sumbool = <fun>
val le_gt_dec : nat -> nat -> sumbool = <fun>
type diveucl = Divex of nat * nat
val eucl_dev : nat -> nat -> diveucl = <fun>
# eucl_dev (S (S O)) (S (S (S (S (S O)))));;
- : diveucl = Divex (S (S O), S O)
\end{verbatim}
It is easier to test on \ocaml\ integers:
\begin{verbatim}
# let rec nat_of_int = function 0 -> O | n -> S (nat_of_int (n-1));;
val i2n : int -> nat = <fun>
# let rec int_of_nat = function O -> 0 | S p -> 1+(int_of_nat p);;
val n2i : nat -> int = <fun>
# let div a b = 
     let Divex (q,r) = eucl_dev (nat_of_int b) (nat_of_int a)
     in (int_of_nat q, int_of_nat r);;
val div : int -> int -> int * int = <fun>
# div 173 15;;
- : int * int = (11, 8)
\end{verbatim}

Note that these {\tt nat\_of\_int} and {\tt int\_of\_nat} are now
available via a mere {\tt Require Import ExtrOcamlIntConv} and then
adding these functions to the list of functions to extract. This file
{\tt ExtrOcamlIntConv.v} and some others in {\tt plugins/extraction/}
are meant to help building concrete program via extraction.

\asubsection{Extraction's horror museum}

Some pathological examples of extraction are grouped in the file
{\tt test-suite/success/extraction.v} of the sources of \Coq.

\asubsection{Users' Contributions}

 Several of the \Coq\ Users' Contributions use extraction to produce 
 certified programs. In particular the following ones have an automatic 
 extraction test (just run {\tt make} in those directories): 

 \begin{itemize}
 \item Bordeaux/Additions
 \item Bordeaux/EXCEPTIONS
 \item Bordeaux/SearchTrees
 \item Dyade/BDDS
 \item Lannion
 \item Lyon/CIRCUITS
 \item Lyon/FIRING-SQUAD
 \item Marseille/CIRCUITS
 \item Muenchen/Higman
 \item Nancy/FOUnify
 \item Rocq/ARITH/Chinese
 \item Rocq/COC
 \item Rocq/GRAPHS
 \item Rocq/HIGMAN
 \item Sophia-Antipolis/Stalmarck
 \item Suresnes/BDD
 \end{itemize}

 Lannion, Rocq/HIGMAN and Lyon/CIRCUITS are a bit particular. They are 
 examples of developments where {\tt Obj.magic} are needed.
 This is probably due to an heavy use of impredicativity.
 After compilation those two examples run nonetheless,
 thanks to the correction of the extraction~\cite{Let02}. 

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "Reference-Manual"
%%% End: