
\achapter{Execution of extracted programs in Caml and Haskell}
\label{CamlHaskellExtraction}
\aauthor{Benjamin Werner and Jean-Christophe Filli\^atre}
\index{Extraction}

It is possible to use \Coq\ to build certified and relatively
efficient programs, extracting them from the proofs of their
specifications. The extracted objects are terms of \FW, and can be 
obtained at the \Coq\ toplevel with the command {\tt Extraction}
(see \ref{Extraction}).

We present here a \Coq\ module, {\tt Extraction}, which translates the
extracted terms to ML dialects, namely Caml Light, Objective Caml and
Haskell. In the following, we avoid referring to a particular dialect
whenever possible, and ``ML'' refers to any of the target dialects.

Effective programs are built in an \FW\ toplevel (actually the \Coq\
toplevel) which contains the extracted objects and into which ML
objects can be imported. Indeed, in order to instantiate and realize
\Coq\ type and term variables, ML objects can be imported into the
\FW\ toplevel as inductive types or axioms.

\Rem
The current mechanism of extraction of effective programs
from \Coq\ proofs differs slightly from the one in the versions of
\Coq\ prior to V5.8. In those versions, there was an
explicit toplevel for the language {\sf Fml}. Moreover, it was not
possible to import ML objects into this {\sf Fml} toplevel.


%\section{Extraction facilities}
%
%(* TO DO *)
%

\medskip
In the first part of this document we describe the commands of the
{\tt Extraction} module, and we give some examples in the second part.


\asection{The {\tt Extraction} module}
\label{Extraction}

This section explains how to import ML objects, to realize axioms and
finally to generate ML code from the extracted programs of \FW.

These features do not belong to the core system, and appear as an
independent module called {\tt Extraction.v} (which is compiled during the
installation of the system). So the first thing to do is to load this
module:

\begin{coq_example*}
Require Extraction.
\end{coq_example*}

\asubsection{Generating executable ML code}
\comindex{Write Caml File}
\comindex{Write CamlLight File}
\comindex{Write Haskell File}
The \Coq\ commands to generate ML code are:
\begin{center}\begin{tabular}{ll}
  {\tt Write Caml File "\str" [ \ident$_1$ \dots\ \ident$_n$ ] 
       {\it options}.}
    & ({\em for Objective Caml\/}) \\
  {\tt Write CamlLight File "\str" [ \ident$_1$ \dots\ \ident$_n$ ]
       {\it options}.} & \\
  {\tt Write Haskell File "\str" [ \ident$_1$ \dots\ \ident$_n$ ]
       {\it options}.} & \\
\end{tabular}\end{center}
where \str\ is the name given to the file to be produced (the suffix
{\tt .ml} is added if necessary), and \ident$_1$ \dots\ \ident$_n$ the
names of the constants to be extracted. This list does not need to be
exhaustive: it is automatically completed into a complete and minimal
environment. Remaining axioms are translated into exceptions, and a
warning is printed in that case. In particular, this is the case
for {\tt False\_rec}. (We will see below how to realize axioms.)
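As an illustration of the translation of remaining axioms, the
following Objective Caml sketch shows the kind of definition one may
expect for an axiom {\tt inf\_total} that was never realized. The
exception name {\tt Unrealized\_axiom} is hypothetical, and the exact
form of the generated code may differ.

\begin{verbatim}
(* Hypothetical exception: the name used by the generated code
   may differ. *)
exception Unrealized_axiom of string

(* An axiom inf_total that was never realized with Link: the name is
   defined, so the file compiles, but any call raises at run time. *)
let inf_total _ _ = raise (Unrealized_axiom "inf_total")
\end{verbatim}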

\paragraph{Optimizations.} 
Since Caml Light and Objective Caml are strict languages, the extracted
code has to be optimized in order to be efficient (for instance, when
using induction principles we do not want to compute all the recursive
calls but only the needed ones). So an optimization routine is
called each time the user generates Caml programs. Essentially,
it performs constant expansions and reductions. Therefore some
constants will not appear in the resulting Caml program (a warning is
printed for each such constant). To avoid this, just put the constant
name in the list \ident$_1$ \dots\ \ident$_n$ above and it will not
be expanded. Moreover, three options allow the user to control the
expansion strategy:
\begin{description}
  \item[\texttt{noopt}] : specifies not to do any optimization.
  \item[\texttt{exact}] : specifies to extract exactly the given
    objects (no recursivity).
  \item[\texttt{expand [ \ident$_1$ \dots\ \ident$_n$ ]}] : 
    forces the expansion of the constants \ident$_1$ \dots\ \ident$_n$
    (when it is possible).
\end{description}
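The following Objective Caml sketch illustrates why expansion matters
in a strict language. The names {\tt nat\_rec}, {\tt is\_zero\_rec}
and {\tt is\_zero} are illustrative and written by hand, not output of
{\tt Write Caml File}: the recursor computes its recursive call before
using it, whereas the expanded and reduced version only performs the
needed case analysis.

\begin{verbatim}
(* Unary naturals, as in the Coq examples of this chapter. *)
type nat = O | S of nat

(* A recursor in the style of nat_rec: the recursive result is passed
   as an argument, so a strict language computes it before calling fs. *)
let rec nat_rec f0 fs n =
  match n with
  | O -> f0
  | S p -> fs p (nat_rec f0 fs p)

(* is_zero written with the recursor: under strict evaluation it walks
   the whole number although only the outermost constructor matters. *)
let is_zero_rec n = nat_rec true (fun _ _ -> false) n

(* The same function after expanding nat_rec and reducing: only the
   needed case analysis remains, and it runs in constant time. *)
let is_zero n =
  match n with
  | O -> true
  | S _ -> false
\end{verbatim}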


\asubsection{Realizing axioms}
\comindex{Link}

It is possible to assume some axioms while developing a proof. Since
these axioms can be any kind of proposition or object type, they may
perfectly well have some computational content. But a program must be
a closed term, and of course the system cannot guess the program which
realizes an axiom. Therefore, one can tell the system
which program (actually an \FW\ term) corresponds to a given \Coq\
axiom. This is done with the command {\tt Link}, whose syntax is
$$\mbox{\tt Link \ident\ := Fwterm.}$$
where \ident\ is the name of the axiom to realize and Fwterm\ the
term which realizes it. The system checks that this term has the same
type as the axiom \ident, and returns an error if not.  This command
attaches a body to an axiom, and can be seen as a transformation of an
axiom into a constant.

These semantical attachments have to be done {\em before} generating
the ML code. All type variables must be realized, and term variables
which are not realized will be translated into exceptions. 

\Example Let us illustrate this feature on a small
example. Assume that you have a type variable {\tt A} of type \Set:

\begin{coq_example*}
Parameter A : Set.
\end{coq_example*}

and that your specification proof assumes that there is an order
relation {\em inf} over that type (which has no computational
content), and that this relation is total and decidable:

\begin{coq_example*}
Parameter inf : A -> A -> Prop.
Axiom inf_total : (x,y:A) {(inf x y)}+{(inf y x)}.
\end{coq_example*}

Now suppose that we want to use this specification proof on natural
numbers; this means {\tt A} has to be instantiated by {\tt nat}
and the axiom {\tt inf\_total} will be realized, for instance, using the
order relation {\tt le} on that type and the decidability lemma 
{\tt le\_lt\_dec}. Here is how to proceed:

\begin{coq_example*}
Require Compare_dec.
\end{coq_example*}
\begin{coq_example}
Link A := nat.
Link inf_total := le_lt_dec.
\end{coq_example}
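For intuition about what the realized axiom computes, here is an
Objective Caml sketch of a decision procedure with the behaviour of
{\tt le\_lt\_dec}. It is written by hand rather than extracted from
{\tt Compare\_dec}; the constructor names {\tt Left} and {\tt Right}
are assumptions (Objective Caml requires capitalized constructors),
and proofs are erased, so the two constructors of {\tt sumbool} carry
no arguments.

\begin{verbatim}
(* Unary naturals and the erased two-constructor type sumbool. *)
type nat = O | S of nat
type sumbool = Left | Right

(* Decides n <= m (Left) or m < n (Right) by peeling off successors. *)
let rec le_lt_dec n m =
  match n, m with
  | O, _ -> Left
  | S _, O -> Right
  | S n', S m' -> le_lt_dec n' m'

(* After Link inf_total := le_lt_dec, calls to inf_total run this code. *)
let inf_total = le_lt_dec
\end{verbatim}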

\Warning There is no rollback on the command {\tt Link}, that
is, semantical attachments are not forgotten when doing a {\tt Reset}
or a {\tt Restore State} command. This will be corrected in a later
version.

\asubsection{Importing ML objects}
In order to realize axioms and to instantiate programs on real data
types, like {\tt int}, {\tt string}, \dots\ or more complicated data
structures, one wants to import existing ML objects into the \FW\
environment. The system provides this through the commands
{\tt ML Import Constant} and {\tt ML Import Inductive}.
The first one imports an ML
object as a new axiom and the second one adds a new inductive
definition corresponding to an ML inductive type.

\paragraph{Warning.}
In the case of Caml dialects, the system would be able to check the
correctness of the imported objects by looking into the interface
files of Caml modules ({\tt .mli} files), but this feature is not yet
implemented. So one must be careful when declaring the types of the
imported objects.

\paragraph{Caml names.}
When referencing a Caml object, you can use strings instead of
identifiers. Therefore you can use the double
underscore notation \verb!module__name! (Caml Light objects)
or the dot notation \verb!module.name! (Objective Caml objects)
to specify the module in which the object lies.


\asubsection{Importing inductive types}
\comindex{ML Import Inductive}

The \Coq\ command to import an ML inductive type is:
$$\mbox{\tt ML Import Inductive \ident\ [\ident$_1$ \dots\ \ident$_n$] == %
{\em <Inductive Definition>}.}$$
where \ident\ is the name of the ML type, \ident$_1$ \dots\
\ident$_n$ the names of its constructors, and {\tt\em<Inductive
  Definition>} the corresponding \Coq\ inductive definition
(see \ref{Inductive} in the Reference Manual for the syntax of
inductive definitions).

This command inserts the {\tt\em<Inductive Definition>} in the \FW\
environment, without elimination principles. From that moment, it is
possible to use that type like any other \FW\ object, and in
particular to use it to realize axioms. The names \ident\ \ident$_1$
\dots\ \ident$_n$ may be different from the names given in the
inductive definition, in order to avoid clashes with previous
constants; the original names are restored when generating the ML code.

\noindent One can also import mutual inductive types with the command:
$$\begin{array}{rl}
  \mbox{\tt ML Import Inductive} &
              \mbox{\tt\ident$_1$ [\ident$^1_1$ \dots\ \ident$^1_{n_1}$]} \\
    & \dots \\
    & \mbox{\tt\ident$_k$ [\ident$^k_1$ \dots\ \ident$^k_{n_k}$]} \\
    & \qquad \mbox{\tt== {\em<Mutual Inductive Definition>}.}
  \end{array}$$ %$$

\begin{Examples}
\item Let us show for instance how to import the
  type {\tt bool} of Caml Light booleans:

\begin{coq_example}
ML Import Inductive bool [ true false ] ==
         Inductive BOOL : Set := TRUE  : BOOL
                               | FALSE : BOOL.
\end{coq_example}

Here we changed the names because the type {\tt bool} is already
defined in the initial state of \Coq.

  \item Assuming that one has defined the mutual inductive types {\tt
tree} and {\tt forest} in a Caml Light module, one can import them
with the command:

\begin{coq_example}
ML Import Inductive tree [node] forest [empty cons] ==
    Mutual [A:Set] Inductive 
       tree : Set := node : A -> (forest A) -> (tree A)
    with
       forest : Set := empty : (forest A) 
                    | cons : (tree A) -> (forest A) -> (forest A).
\end{coq_example}

  \item One can import the polymorphic type of Caml Light lists with
the command:
\begin{coq_example}
ML Import Inductive list [nil cons] == 
    Inductive list [A:Set] : Set := nil : (list A)
                                 | cons : A->(list A)->(list A).
\end{coq_example}

\Rem One would have to redefine {\tt nil} and {\tt cons} at
the top of one's program, because these constructors have no name in Caml Light.
\end{Examples}
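For reference, a Caml module defining the mutual types {\tt tree} and
{\tt forest} could look as follows in Objective Caml. This is a sketch
assuming capitalized constructors ({\tt Node}, {\tt Empty},
{\tt Cons}), as Objective Caml requires, in which case these are
presumably the names to give between brackets. The functions
{\tt tree\_size} and {\tt forest\_size} are hypothetical and only show
that such types are traversed with mutually recursive functions.

\begin{verbatim}
(* Mutual polymorphic types matching the Coq definition above. *)
type 'a tree = Node of 'a * 'a forest
and 'a forest = Empty | Cons of 'a tree * 'a forest

(* Counting nodes needs two mutually recursive functions, one per type. *)
let rec tree_size (Node (_, f)) = 1 + forest_size f
and forest_size = function
  | Empty -> 0
  | Cons (t, f) -> tree_size t + forest_size f
\end{verbatim}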

\asubsection{Importing terms and abstract types}
\comindex{ML Import Constant}

The other command to import an ML object is:
$$\mbox{\tt ML Import Constant \ident$_{ML}$\ == \ident\ : Fwterm.}$$
where \ident$_{ML}$\ is the name of the ML object and Fwterm\ its type in
\FW. This command defines an axiom in \FW\ of name \ident\ and type
Fwterm.

\Example To import the type {\tt int} of Caml Light
integers, and the $<$ binary relation on this type, just do
\begin{coq_example}
ML Import Constant int == int : Set.
ML Import Constant lt_int == lt_int : int -> int -> BOOL.
\end{coq_example}
assuming that the Caml Light type {\tt bool} is already imported (with the
name {\tt BOOL}, as above).
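The ML side of these two imports might look like the following
Objective Caml sketch, where {\tt lt\_int} is a hypothetical function
assumed to be defined in the user's own module. The function
{\tt min\_of}, also hypothetical, shows how a \Coq\ term that matches
on {\tt TRUE} and {\tt FALSE} would read once the original names
{\tt true} and {\tt false} are restored in the generated code.

\begin{verbatim}
(* Hypothetical ML comparison imported above as lt_int. *)
let lt_int (x : int) (y : int) : bool = x < y

(* A term such as
     [x,y:int]Cases (lt_int x y) of TRUE => x | FALSE => y end
   once TRUE and FALSE are restored to the ML names true and false. *)
let min_of x y = match lt_int x y with true -> x | false -> y
\end{verbatim}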


\asubsection{Direct use of ML\ objects}
\comindex{Extract Constant}
\comindex{Extract Inductive}

Sometimes the user does not want to extract \Coq\ objects to new ML code
but wants to use already existing ML objects. For instance, this is the
case for the booleans, which already exist in ML: the user does not want
to extract the \Coq\ inductive type \texttt{bool} to a new type for
booleans, but wants to use the primitive booleans of ML.

The command \texttt{Extract} fulfills this requirement.
It allows the user to declare constants and inductive types which will not be
extracted but replaced by ML objects. The syntax is the following:
$$
\begin{tabular}{l}
  \mbox{\tt Extract Constant \ident\ => \ident'.} \\
  \mbox{\tt Extract Inductive \ident\ 
            => \ident' [ \ident'$_1$ \dots\ \ident'$_n$ ].} 
\end{tabular}
$$ %$$
where \ident\/ is the name of the \Coq\ object and the primed identifiers
are the names of the corresponding ML objects (the names between brackets
are the names of the constructors).
Mutually recursive types are declared one by one, in any order.

\Example
Typical examples are the following:
\begin{coq_example}
Extract Inductive unit => unit [ "()" ].
Extract Inductive bool => bool [ true false ].
Extract Inductive sumbool => bool [ true false ].
\end{coq_example}
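The following Objective Caml sketch contrasts the two situations for
{\tt sumbool}. It assumes, as the bracketed list suggests, that
constructors are matched in order, so that {\tt left} becomes
{\tt true} and {\tt right} becomes {\tt false}. The functions
{\tt pick\_sb}, {\tt pick\_b} and {\tt bool\_of\_sumbool} are
hypothetical.

\begin{verbatim}
(* Without Extract Inductive, sumbool becomes a fresh two-constructor
   type, shown here with capitalized names. *)
type sumbool = Left | Right
let pick_sb d x y = match d with Left -> x | Right -> y

(* With Extract Inductive sumbool => bool [ true false ], the same
   code works directly on the primitive ML booleans. *)
let pick_b d x y = match d with true -> x | false -> y

(* The correspondence between the two representations, in order. *)
let bool_of_sumbool = function Left -> true | Right -> false
\end{verbatim}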


\asubsection{Differences between \Coq\ and ML type systems}

\subsubsection{ML types that are not \FW\ types}

Some ML recursive types have no counterpart in the type system of
\Coq, like types using the record construction, or non positive types
like
\begin{verbatim}
# type T = C of T->T;;
\end{verbatim}
In that case, you cannot import those types as inductive types, and
the only way to proceed is to import them as abstract types (with {\tt ML
Import Constant}) together with the corresponding building and
destructuring functions (also with {\tt ML Import Constant}).
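A minimal Objective Caml sketch of this approach for the type {\tt T}
above: the type is kept abstract behind a signature, and a building
function {\tt make} and a destructuring function {\tt unmake} are
exported. The module name {\tt T} and both function names are
hypothetical; these are the objects one would then import, using the
dot notation for the names.

\begin{verbatim}
(* The non-positive type stays abstract outside the module; only the
   building and destructuring functions are visible. *)
module T : sig
  type t
  val make : (t -> t) -> t
  val unmake : t -> t -> t
end = struct
  type t = C of (t -> t)
  let make f = C f
  let unmake (C f) = f
end
\end{verbatim}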


\subsubsection{Programs that are not ML-typable}

Conversely, some extracted programs in \FW\ are not typable in
ML. There are in fact two cases which can be problematic:
\begin{itemize}
  \item If some part of the program is {\em very} polymorphic, there
    may be no ML type for it. In that case the extraction to ML
    succeeds, but the generated code may be rejected by the ML
    type-checker. A well-known example is the {\em distr-pair}
    function:
$$\mbox{\tt
Definition dp := [A,B:Set][x:A][y:B][f:(C:Set)C->C](f A x,f B y).
}$$
In Caml Light, for instance, the extracted term is 
\verb!let dp x y f = pair((f x),(f y))!  and has type
$$\mbox{\tt
dp : 'a -> 'a -> ('a -> 'b) -> ('b,'b) prod
}$$
which is not its original type, but a restriction.

  \item Some definitions of \FW\ may have no counterpart in ML. This
    happens when there is a quantification over types inside the type
    of a constructor; for example:
$$\mbox{\tt
Inductive anything : Set := dummy : (A:Set)A->anything.
}$$
which corresponds to the definition of ML dynamics.
\end{itemize}

The first case is not too problematic: it is still possible to run the
programs by switching off the type-checker during compilation. Unless
you misused the semantical attachment facilities, you should never get
any message like ``segmentation fault'' for which the extracted code
would be to blame. To switch off the Caml type-checker, use the
function {\tt obj\_\_magic}, which gives the type {\tt 'a} to any
object; this, however, requires small changes to the extracted code by hand.
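In Objective Caml the counterpart of {\tt obj\_\_magic} is
{\tt Obj.magic}. The following sketch shows the kind of hand edit
meant here for the {\em distr-pair} example, assuming the extracted
function has the shape given above; {\tt dp\_magic} is a hypothetical
name. The edit is safe only because, in \FW, {\tt f} has type
{\tt (C:Set)C->C} and therefore returns a value of the type of its
argument.

\begin{verbatim}
(* The extracted dp: f is lambda-bound, hence monomorphic, so ML infers
   'a -> 'a -> ('a -> 'b) -> 'b * 'b and rejects dp 1 "a" (fun z -> z). *)
let dp x y f = (f x, f y)

(* Hand-edited version: each use of f goes through Obj.magic, so x and y
   may have different types, as in the original FW type of dp. *)
let dp_magic x y f = (Obj.magic f x, Obj.magic f y)
\end{verbatim}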

The second case is fatal. If some inductive type cannot be translated
to ML, one has to change the proof (or possibly ``cheat'' by
some low-level manipulations which we will not describe here).

We have to say, though, that in most ``realistic'' programs, these
problems do not occur. For example, all the programs of the library are
accepted by the Caml type-checker except {\tt Higman.v}\footnote{Should
  you obtain a program that is not ML-typable from an example of your own,
  we would be interested in seeing it; so please mail us the example at
  {\em coq@pauillac.inria.fr}}.


\asection{Some examples}

We present here a few examples of extraction, taken from the {\tt
theories} library of \Coq\ (in the {\tt PROGRAMS} directory). We
chose Caml Light as the target language, but all of them can be done
in the other dialects with slight modifications.


\asubsection{Euclidean division}

The file {\tt Euclid\_prog} contains the proof of Euclidean division
(theorem {\tt eucl\_dev}). The natural numbers defined in the example
files are unary integers defined by two constructors $O$ and $S$:
\begin{coq_example*}
Inductive nat : Set := O : nat | S : nat -> nat.
\end{coq_example*}

To use the proof, we begin by loading the module {\tt Extraction} and the
file into the \Coq\ environment:

\begin{coq_eval}
Reset Initial.
AddPath "../theories/DEMOS/PROGRAMS".
\end{coq_eval}
\begin{coq_example*}
Require Extraction.
Require Euclid_prog.
\end{coq_example*}

This module contains a theorem {\tt eucl\_dev}, and its extracted term
is of type {\tt (b:nat)(a:nat) (di\-veucl a b)}, where {\tt diveucl} is a
type for the pair of the quotient and the remainder.
We can now extract this program to Caml Light:

\begin{coq_example}
Write CamlLight File "euclid" [ eucl_dev ].
\end{coq_example}

This produces a file {\tt euclid.ml} containing all the necessary
definitions up to {\tt let eucl\_dev = ..}. Let us run the resulting program:

\begin{verbatim}
# include "euclid";;
# eucl_dev (S (S O)) (S (S (S (S (S O)))));;
- : diveucl = divex (S (S O), S O)
\end{verbatim}
It is easier to test on Caml Light integers:
\begin{verbatim}
# let rec nat_of = function 0 -> O
                          | n -> S (nat_of (pred n));;
# let rec int_of = function O   -> 0
                          | S p -> succ (int_of p);;
# let div a b = match eucl_dev (nat_of b) (nat_of a) with
                  divex(q,r) -> (int_of q, int_of r);;
div : int -> int -> int * int = <fun>
# div 173 15;;
- : int * int = 11, 8
\end{verbatim}
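Readers using Objective Caml can reproduce this session with the
following self-contained sketch. The body of {\tt eucl\_dev} is a
hand-written stand-in (division by repeated subtraction), not the code
produced by {\tt Write CamlLight File}; the constructor {\tt Divex} is
capitalized because Objective Caml requires it, and the helpers
{\tt le} and {\tt minus} are hypothetical.

\begin{verbatim}
(* Unary naturals and the result type: quotient and remainder. *)
type nat = O | S of nat
type diveucl = Divex of nat * nat

(* le n m decides n <= m; minus is truncated subtraction. *)
let rec le n m = match n, m with
  | O, _ -> true
  | S _, O -> false
  | S n', S m' -> le n' m'
let rec minus n m = match n, m with
  | _, O -> n
  | O, S _ -> O
  | S n', S m' -> minus n' m'

(* Stand-in for eucl_dev b a: divides a by b by repeated subtraction.
   The Coq theorem requires b > 0; with b = O this would not terminate. *)
let rec eucl_dev b a =
  if le b a then
    (match eucl_dev b (minus a b) with Divex (q, r) -> Divex (S q, r))
  else Divex (O, a)

(* Conversions (for n >= 0) and div, as in the Caml Light session. *)
let rec nat_of n = if n = 0 then O else S (nat_of (n - 1))
let rec int_of = function O -> 0 | S p -> 1 + int_of p
let div a b = match eucl_dev (nat_of b) (nat_of a) with
  | Divex (q, r) -> (int_of q, int_of r)
\end{verbatim}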

\asubsection{Heapsort}

Let us see a more complicated example. The file {\tt Heap\_prog.v}
contains the proof of an efficient list sorting algorithm described by
Bjerner. It is an adaptation of the well-known {\em heapsort}
algorithm to functional languages. We first load the files:

\begin{coq_eval}
Reset Initial.
\end{coq_eval}
\begin{coq_example*}
Require Extraction.
Require Heap_prog.
\end{coq_example*}

As we saw above, we have to instantiate or realize by hand some of
the \Coq\ variables, which are in this case the type of the elements
to sort ({\tt List\_Dom}, defined in {\tt List.v}) and the
decidability of the order relation ({\tt inf\_total}). We proceed as
in section \ref{Extraction}:

\begin{coq_example}
ML Import Constant int == int : Set.
Link List_Dom := int.
ML Import Inductive bool [ true false ] ==
         Inductive BOOL : Set := TRUE  : BOOL
                               | FALSE : BOOL.
ML Import Constant lt_int == lt_int : int->int->BOOL.
Link inf_total :=
         [x,y:int]Cases (lt_int x y) of
                            TRUE => left
                          | FALSE => right
                          end.
\end{coq_example}

Then we extract the Caml Light program

\begin{coq_example}
Write CamlLight File "heapsort" [ heapsort ].
\end{coq_example}
and test it 
\begin{verbatim}

# include "heapsort";;
# let rec listn = function 0 -> nil
                         | n -> cons(random__int 10000,listn (pred n));;
# heapsort (listn 10);;
- : list = cons (136, cons (760, cons (1512, cons (2776, cons (3064, 
cons (4536, cons (5768, cons (7560, cons (8856, cons (8952, nil))))))))))
\end{verbatim}

Some tests on longer lists (100000 elements) show that the program is
quite efficient for Caml code.
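To repeat such tests in Objective Caml, a small harness like the
following can be used. It is a sketch that assumes the sorting
function works on Objective Caml's native lists (the extracted
{\tt list} type would first have to be converted); {\tt listn},
{\tt is\_sorted} and {\tt check} are hypothetical names, and
{\tt check} compares the result with the library function
{\tt List.sort}.

\begin{verbatim}
(* listn k: k pseudo-random integers in [0, 10000), like listn above. *)
let rec listn k = if k = 0 then [] else Random.int 10000 :: listn (k - 1)

(* is_sorted l: true when l is in non-decreasing order. *)
let rec is_sorted = function
  | x :: (y :: _ as rest) -> x <= y && is_sorted rest
  | _ -> true

(* check sort k: sorts a random list of length k with the function
   under test and compares the result with the library sort. *)
let check sort k =
  let l = listn k in
  sort l = List.sort compare l
\end{verbatim}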

\asubsection{Balanced trees}

The file {\tt Avl\_prog.v} contains the proof of insertion in binary
balanced trees (AVL). Here we choose to instantiate such trees on the
type {\tt string} of Caml Light (for instance to get an efficient
dictionary); as above, we must realize the decidability of the order
relation. This gives the following commands:

\begin{coq_eval}
Reset Initial.
\end{coq_eval}
\begin{coq_example*}
Require Extraction.
Require Avl_prog.
\end{coq_example*}
\begin{coq_eval}
Pwd.
\end{coq_eval}
\begin{coq_example}
ML Import Constant string == string : Set.
ML Import Inductive bool [ true false ] ==
  Inductive BOOL : Set := TRUE  : BOOL
                        | FALSE : BOOL.
ML Import Constant lt_string == lt_string : string->string->BOOL.
Link a := string.
Link inf_dec :=
  [x,y:string]Cases (lt_string x y) of
                TRUE => left
              | FALSE => right
              end.
Write CamlLight File "avl" [rot_d rot_g rot_gd insert].
\end{coq_example}

Notice that we do not want the constants {\tt rot\_d}, {\tt rot\_g}
and {\tt rot\_gd} to be expanded in the function {\tt insert}, and
that is why we added them to the list of required functions. This makes
the resulting program clearer, even if it becomes less efficient.

Let us insert random words in an initially empty tree to check that it
remains balanced:
\begin{verbatim}
% camllight
# include "avl";;
# let add a t = match insert a t with
                 h_eq x -> x
               | h_plus x -> x ;;
# let rdmw () = let s = create_string 5 in
                for i = 0 to 4 do
                  set_nth_char s i (char_of_int (97+random__int 26))
                done ; s ;;
# let rec built = function 0 -> nil
                         | n -> add (rdmw()) (built (pred n));;
# built 10;;
- : abe = node ("ogccy", node ("gmygy", node ("cwqug", node ("cjyrc", nil, ...


# let rec height = function
    nil -> 0
  | node(_,t1,t2,_) -> 1+(max (height t1) (height t2)) ;;
# let rec check = function
    nil -> true
  | node(_,a1,a2,_) -> 
         let t1 = height a1 and t2 = height a2 in
         if abs(t1-t2)>1 then false else (check a1) & (check a2) ;;

# check (built 100);;
- : bool = true
\end{verbatim}
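The function {\tt check} above recomputes the height of every subtree
at each level. The following Objective Caml sketch performs the same
test in a single traversal by returning the height only when the
subtree is balanced. The type {\tt avl}, its constructors and the
function names are hypothetical, modelled on the four-field
{\tt node} of the session above, whose last field is ignored here.

\begin{verbatim}
(* Hypothetical tree type: key, left and right subtrees, extra field. *)
type 'a avl = Nil | Node of 'a * 'a avl * 'a avl * int

(* height_if_balanced t: Some h if, at every node of t, the heights of
   the two subtrees differ by at most one; None otherwise. *)
let rec height_if_balanced = function
  | Nil -> Some 0
  | Node (_, l, r, _) ->
    (match height_if_balanced l, height_if_balanced r with
     | Some hl, Some hr when abs (hl - hr) <= 1 -> Some (1 + max hl hr)
     | _ -> None)

let is_balanced t = height_if_balanced t <> None
\end{verbatim}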


\section{Bugs}

There are surely still bugs in the {\tt Extraction} module.
You can send your bug reports directly to Jean-Christophe Filli\^atre
(at \textsf{Jean-Christophe.Filliatre$@$lri.fr}) or to the \Coq\
mailing list (at \textsf{coq$@$pauillac.inria.fr}).

% $Id$