\chapter{Syntax extensions and interpretation scopes} \label{Addoc-syntax} In this chapter, we introduce advanced commands to modify the way {\Coq} parses and prints objects, i.e. the translations between the concrete and internal representations of terms and commands. The main commands are {\tt Notation} and {\tt Infix} which are described in section \ref{Notation}. It also happens that the same symbolic notation is expected in different contexts. To achieve this form of overloading, {\Coq} offers a notion of interpretation scope. This is described in section \ref{scopes}. \Rem The commands {\tt Grammar}, {\tt Syntax} and {\tt Distfix} which were present for a while in {\Coq} are no longer available from {\Coq} version 8.0. The underlying AST structure is also no longer available. The functionalities of the command {\tt Syntactic Definition} are still available, see section \ref{Abbreviations}. \section{Notations} \label{Notation} \comindex{Notation} \subsection{Basic notations} A {\em notation} is a symbolic abbreviation denoting some term or term pattern. A typical notation is the use of the infix symbol \verb=/\= to denote the logical conjunction (\texttt{and}). Such a notation is declared by \begin{coq_example*} Notation "A /\ B" := (and A B). \end{coq_example*} The expression \texttt{(and A B)} is the abbreviated term and the string \verb="A /\ B"= (called a {\em notation}) tells how it is symbolically written. A notation is always surrounded by double quotes (excepted when the abbreviation is a single ident, see \ref{Abbreviations}). The notation is composed of {\em tokens} separated by spaces. Identifiers in the string (such as \texttt{A} and \texttt{B}) are the {\em parameters} of the notation. They must occur at least once each in the denoted term. The other elements of the string (such as \verb=/\=) are the {\em symbols}. An identifier can be used as a symbol but it must be surrounded by simple quotes to avoid the confusion with a parameter. Similarly, every symbol of at least 3 characters and starting with a simple quote must be quoted (then it starts by two single quotes). Here is an example. \begin{coq_example*} Notation "'IF' c1 'then' c2 'else' c3" := (IF_then_else c1 c2 c3). \end{coq_example*} %TODO quote the identifier when not in front, not a keyword, as in "x 'U' y" ? \subsection{Precedences and associativity} \index{Precedences} \index{Associativity} Mixing different symbolic notations in a same text may cause serious parsing ambiguity. To deal with the ambiguity of notations, {\Coq} uses precedence levels ranging from 0 to 100 (plus two extra levels numbered 200 and 250) and associativity rules. Consider for example the new notation \begin{coq_example*} Notation "A \/ B" := (or A B). \end{coq_example*} Clearly, an expression such as {\tt (A:Prop)True \verb=/\= A \verb=\/= A \verb=\/= False} is ambiguous. To tell the {\Coq} parser how to interpret the expression, a priority between the symbols \verb=/\= and \verb=\/= has to be given. Assume for instance that we want conjunction to bind more than disjunction. This is expressed by assigning a precedence level to each notation, knowing that a lower level binds more than a higher level. Hence the level for disjunction must be higher than the level for conjunction. Since connectives are the less tight articulation points of a text, it is reasonable to choose levels not so far from the higher level which is 100, for example 85 for disjunction and 80 for conjunction\footnote{which are the levels effectively chosen in the current implementation of {\Coq}}. Similarly, an associativity is needed to decide whether {\tt True \verb=/\= False \verb=/\= False} defaults to {\tt True \verb=/\= (False \verb=/\= False)} (right associativity) or to {\tt (True \verb=/\= False) \verb=/\= False} (left associativity). We may even consider that the expression is not well-formed and that parentheses are mandatory (this is a ``no associativity'')\footnote{ {\Coq} accepts notations declared as no associative but the parser on which {\Coq} is built, namely {\camlpppp}, currently does not implement the no-associativity and replace it by a left associativity; hence it is the same for {\Coq}: no-associativity is in fact left associativity}. We don't know of a special convention of the associativity of disjunction and conjunction, let's apply for instance a right associativity (which is the choice of {\Coq}). Precedence levels and associativity rules of notations have to be given between parentheses in a list of modifiers that the \texttt{Notation} command understands. Here is how the previous examples refine. \begin{coq_example*} Notation "A /\ B" (and A B) (at level 80, right associativity). Notation "A \/ B" (or A B) (at level 85, right associativity). \end{coq_example*} By default, a notation is considered non associative, but the precedence level is mandatory (except for special cases whose level is canonical). The level is either a number or the mention {\tt next level} whose meaning is obvious. The list of levels already assigned is on figure~\ref{coq-symbols}. Notice that notations at level 250 are necessarily parsed surrounded by parentheses. \begin{figure} \label{coq-symbols} \begin{center} \begin{tabular}{|l|l|} \hline notation & precedence/associativity \\ \hline \verb$"_ , _"$ & 250 \\ \verb$exists _ : _ | _$ & 200 \\ \verb$exists _ | _$ & 200 \\ \verb$exists2 _ : _ | _ & _$ & 200 \\ \verb$exists2 _ | _ & _$ & 200 \\ \verb$"_ <-> _"$ & 95 \\ \verb$"_ -> _"$ & 90\, R (primitive) \\ \verb$"_ \/ _"$ & 85\, R \\ \verb$"_ /\ _"$ & 80\, R \\ \verb$"~ _"$ & 75\, R \\ \verb$"_ = _"$ & 70 \\ \verb$"_ = _ :> _"$ & 70 \\ \verb$"_ = _ = _"$ & 70 \\ \verb$"_ <> _"$ & 70 \\ \verb$"_ <> _ :> _"$ & 70 \\ \verb$"_ <= _"$ & 70 \\ \verb$"_ < _"$ & 70 \\ \verb$"_ > _"$ & 70 \\ \verb$"_ >= _"$ & 70 \\ \verb$"_ <= _ <= _"$ & 70 \\ \verb$"_ <= _ < _"$ & 70 \\ \verb$"_ < _ <= _"$ & 70 \\ \verb$"_ < _ < _"$ & 70 \\ \verb$"_ + _"$ & 50\, L \\ \verb$"_ - _"$ & 50\, L \\ \verb$"_ * _"$ & 40\, L \\ \verb$"_ / _"$ & 40\, L \\ \verb$"- _"$ & 35\, R \\ \verb$"/ _"$ & 35\, R \\ \verb$"_ ^ _"$ & 30\, L \\ \verb$"{ _ } + { _ }"$ & 0 \\ \verb$"_ + { _ }"$ & 50\, L \\ \verb$"{ _ : _ | _ }"$ & 0 \\ \verb$"{ _ : _ | _ & _ }"$ & 0 \\ \verb$"{ _ : _ & _ }"$ & 0 \\ \verb$"{ _ : _ & _ & _ }"$ & 0 \\ \hline \end{tabular} \end{center} \caption{Notations defined in the initial state of {\Coq}} \end{figure} \subsection{Complex notations} Notations can be made from arbitraly complex symbols. One can for instance define prefix notations. \begin{coq_example*} Notation "~ x" := (not x) (at level 75, right associativity). \end{coq_example*} One can also define notations for incomplete terms, with the hole expected to be inferred at typing time. \begin{coq_example*} Notation "x = y" := (@eq ? x y) (at level 70, no associativity). \end{coq_example*} One can define {\em closed} notations whose both sides are symbols. In this case, the default precedence level for inner subexpression is 200. \begin{coq_example*} Notation "{ A } + { B }" := (sumbool A B) (at level 0). \end{coq_example*} One can also define notations for binders. \begin{coq_example*} Notation "{ x : A | P }" := (sig A (fun x => P)) (at level 0). \end{coq_example*} \subsection{Simple factorisation rules} {\Coq} extensible parsing is performed by Camlp4 which is essentially a LL1 parser. Hence, some care has to be taken not to hide already existing rules by new rules. Some simple left factorisation work has to be done. Here is an example. \begin{coq_example*} Notation "x < y" := (lt x y) (at level 70). Notation "x < y < z" := (x < y /\ y < z) (at level 70). \end{coq_example*} In order to factorise the left part of the rules, the subexpression referred by {\tt y} has to be at the same level in both rules. However the default behaviour puts {\tt y} at the next level below 70 in the first rule (no associativity is the default), and at the level 200 in the second rule (level 200 is the default for inner expressions). To fix this, we need to force the parsing level of {\tt y}, as follows. \begin{coq_example*} Notation "x < y" := (lt x y) (at level 70). Notation "x < y < z" := (x < y /\ y < z) (at level 70, y at next level). \end{coq_example*} For the sake of factorisation with {\Coq} predefined rules, simple rules have to be observed for notations starting with a symbol: e.g. rules starting with ``\{'' or ``('' should be put at level 0. The list of {\Coq} predefined notations is given on figure~\ref{coq-symbols}. \subsection{Displaying symbolic notations} The command \texttt{Notation} has an effect both on the {\Coq} parser and on the {\Coq} printer. For example: \begin{coq_example} Check (and True True). \end{coq_example} However, printing, especially pretty-printing, requires more care than parsing. We may want specific indentations, line breaks, alignment if on several lines, etc. The default printing of notations is very rudimentary. For printing a notation, a {\em formatting box} is opened in such a way that if the notation and its arguments cannot fit on a single line, a line break is inserted before the symbols of the notation and the arguments on the next lines are aligned with the argument on the first line. A first, simple control that a user can have on the printing of a notation is the insertion of spaces at some places of the notation. This is performed by adding extra spaces between the symbols and parameters: each extra space (other than the single space needed to separate the components) is interpreted as a space to be inserted by the printer. Here is an example showing how to add spaces around the bar of the notation. \begin{coq_example} Notation "{{ x : A | P }}" := (sig (fun x : A => P)) (at level 0, x at level 99). Check (sig (fun x : nat => x=x)). \end{coq_example} The second, more powerful control on printing is by using the {\tt format} modifier. Here is an example \begin{small} \begin{coq_example} Notation "'If' c1 'then' c2 'else' c3" := (IF_then_else c1 c2 c3) (at level 200, right associativity, format "'[v ' 'If' c1 '/' '[' 'then' c2 ']' '/' '[' 'else' c3 ']' ']'"). \end{coq_example} \end{small} A {\em format} is an extension of the string denoting the notation with the possible following elements delimited by single quotes: \begin{itemize} \item extra spaces are translated into simple spaces \item tokens of the form \verb='/ '= are translated into breaking point, in case a line break occurs, an indentation of the number of spaces after the ``\verb=/='' is applied (2 spaces in the given example) \item token of the form \verb='//'= force writing on a new line \item well-bracketed pairs of tokens of the form \verb='[ '= and \verb=']'= are translated into printing boxes; in case a line break occurs, an extra indentation of the number of spaces given after the ``\verb=[='' is applied (4 spaces in the example) \item well-bracketed pairs of tokens of the form \verb='[hv '= and \verb=']'= are translated into horizontal-orelse-vertical printing boxes; if the content of the box does not fit on a single line, then every breaking point forces a newline and an extra indentation of the number of spaces given after the ``\verb=[='' is applied at the beginning of each newline (3 spaces in the example) \item well-bracketed pairs of tokens of the form \verb='[v '= and \verb=']'= are translated into vertical printing boxes; every breaking point forces a newline, even if the line is large enough to display the whole content of the box, and an extra indentation of the number of spaces given after the ``\verb=[='' is applied at the beginning of each newline \end{itemize} Thus, for the previous example, we get %\footnote{The ``@'' is here to shunt %the notation "'IF' A 'then' B 'else' C" which is defined in {\Coq} %initial state}: Notations do not survive the end of sections. No typing of the denoted expression is performed at definition time. Type-checking is done only at the time of use of the notation. \begin{coq_example} Check (IF_then_else (IF_then_else True False True) (IF_then_else True False True) (IF_then_else True False True)). \end{coq_example} \Rem Sometimes, a notation is expected only for the parser. %(e.g. because %the underlying parser of {\Coq}, namely {\camlpppp}, is LL1 and some extra %rules are needed to circumvent the absence of factorisation). To do so, the option {\em only parsing} is allowed in the list of modifiers of \texttt{Notation}. \subsection{The \texttt{Infix} command} The \texttt{Infix} command is a shortening for declaring notations of infix symbols. Its syntax is \medskip \noindent\texttt{Infix} "{\em symbolentry}" {\qualid} {\tt (} \nelist{\em modifier}{,} {\tt )}. \medskip and it is equivalent to \medskip \noindent\texttt{Notation "x {\em symbolentry} y" := {\qualid} x y ( \nelist{\em modifier}{,} )}. \medskip where {\tt x} and {\tt y} are fresh names distinct from {\qualid}. Here is an example. \begin{coq_example*} Infix "/\\" and (at level 6, right associativity). \end{coq_example*} \subsection{Reserving notations} A given notation may be used in different contexts. {\Coq} expects all uses of the notation to be defined at the same precedence and with the same associativity. To avoid giving the precedence and associativity every time, it is possible to declare a parsing rule in advance without giving its interpretation. Here is an example from the initial state of {\Coq}. \begin{coq_example} Reserved Notation "x = y" (at level 70, no associativity). \end{coq_example} Reserving a notation is also useful for simultaneously defined an inductive type or a recursive constant and a notation for it. \Rem The notations mentionned on table~\ref{coq-symbols} are reserved. Hence their precedence and associativity cannot be changed. \subsection{Simultaneous definition of terms and notations} \subsection{Displaying informations about notations} % Set/Unset Printing Notation \subsection{Locating notations} \comindex{Locate} \label{LocateSymbol} To know to which notations a given symbol belongs to, use the command \bigskip {\tt Locate {\symbolentry}} \bigskip where symbol is any (composite) symbol surrounded by quotes. To locate a particular notation, use a string where the variables of the notation are replaced by ``\_''. \Example \begin{coq_example} Locate "exists". Locate "'exists' _ | _". \end{coq_example} \SeeAlso Section \ref{Locate}. \section{Interpretation scopes} \label{scopes} % Introduction \subsection{Notations in scope} \subsection{Activation of interpretation scopes} % Open (Local) Scope % Close (Local) Scope \subsection{Interpretation of numerals} \subsection{Interpretation scopes of arguments} \subsection{Interpretation scopes used in the standard library of {\Coq}} % nat_scope % N_scope % Z_scope % positive_scope % bool_scope % list_scope \subsection{Displaying informations about scopes} % Print Visibility name_opt % Print Scope name % Print Scopes \section{Abbreviations} \index{Abbreviations} \label{Abbreviations} An {\em abbreviation} is a name denoting a (presumably) more complex expression. An abbreviation is a special form of notation with no parameter and only one symbol which is an identifier. This identifier is given with no quotes around. Example: \begin{coq_example*} Notation List := (list nat). \end{coq_example*} An abbreviation expects no precedence nor associativity, since it can always be put at the lower level of atomic expressions, and associativity is irrelevant. Abbreviations are used as much as possible by the {\Coq} printers unless the modifier \verb=(only parsing)= is given. Abbreviations are bound to an absolute name like for an ordinary definition, and can be referred by partially qualified names too. Abbreviations are syntactic in the sense that they are bound to expressions which are not typed at the time of the definition of the abbreviation but at the time it is used. Especially, abbreviation can be bound to terms with holes (i.e. with ``\_''). \Example \begin{coq_eval} Set Strict Implicit. Reset Initial. \end{coq_eval} \begin{coq_example} Definition explicit_id (A:Set) (a:A) := a. Notation id := (explicit_id _). Check (id 0). \end{coq_example} Abbreviations do not survive the end of sections. No typing of the denoted expression is performed at definition time. Type-checking is done only at the time of use of the abbreviation. \Rem \index{Syntactic Definition} % For compatibility Abbreviations are similar to the {\em syntactic definitions} available in versions of {\Coq} prior to version 8.0, except that abbreviations are used for printing (unless the modifier \verb=(only parsing)= is given) while syntactic definitions were not. \section{Summary} \paragraph{Persistence of notations} Notations do not survive the end of sections. They survive modules unless the command {\tt Notation Local} is used instead of {\tt Notation}. \paragraph{Syntax of notations} The different syntactic variants of the command \texttt{Notation} are the following \begin{itemize} \item \texttt{Notation} {\str} \texttt{:=} {\term}. \item \texttt{Notation} {\ident} \texttt{:=} {\term}. \item \texttt{Notation} {\str} \texttt{:=} {\term} {\tt (} \nelist{\em modifier}{,} {\tt )}. \end{itemize} where a {\em modifier} is one of \texttt{at level $n$}, \texttt{ at next level}, \texttt{ no associativity}, \texttt{ right associativity}, \texttt{ left associativity}, \texttt{ \nelist{\ident}{,} at level $n$}, \texttt{ \nelist{\ident}{,} at next level}, \texttt{ only parsing} or \texttt{ format {\str}}. \Rem No typing of the denoted expression is performed at definition time. Type-checking is done only at the time of use of the notation. \Rem Many examples of {\tt Notation} may be found in the files composing the initial state of {\Coq} (see directory {\tt \$COQLIB/theories/Init}). %\section{Symbolic interpretation scopes} % %TODO \iffalse %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \chapter{syntax extensions} \label{Addoc-syntax} In this chapter, we introduce advanced commands to modify the way {\Coq} parses and prints objects, i.e. the translations between the concrete and internal representations of terms and commands. As in most compilers, there is an intermediate structure called {\em Abstract Syntax Tree} (AST). Parsing a term is done in two steps\footnote{ We omit the lexing step, which simply translates a character stream into a token stream. If this translation fails, this is a \emph{Lexical error}. }: \begin{enumerate} \item An AST is build from the input (a stream of tokens), following grammar rules. This step consists in deciding whether the input belongs to the language or not. If it is, the parser transforms the expression into an AST. If not, this is a syntax error. An expression belongs to the language if there exists a sequence of grammar rules that recognizes it. This task is delegated to {\camlpppp}. See the Reference Manual~\cite{ddr98} for details on the parsing technology. The transformation to AST is performed by executing successively the {\sl actions} bound to these rules. \item The AST is translated into the internal representation of commands and terms. At this point, we detect unbound variables and determine the exact section-path of every global value. Then, the term may be typed, computed,~\ldots \end{enumerate} The printing process is the reverse: commands or terms are first translated into AST's, and then the pretty-printer translates this AST into a printing orders stream, according to printing rules. In {\Coq}, only the translations between AST's and the concrete representation are extendable. One can modify the set of grammar and printing rules, but one cannot change the way AST's are interpreted in the internal level. In the following section, we describe the syntax of AST expressions, involved in both parsing and printing. The next two sections deal with extendable grammars and pretty-printers. \section{Abstract syntax trees (AST)} \index{AST} \index{Abstract syntax tree} \label{astsyntax} The AST expressions are conceptually divided into two classes: \emph{constructive expressions} (those that can be used in parsing rules) and \emph{destructive expressions} (those that can be used in pretty printing rules). In the following we give the concrete syntax of expressions and some examples of their usage. The BNF grammar {\sl ast} in Fig.~\ref{astexpr} defines the syntax of both constructive and destructive expressions. The lexical conventions are the same as in section~\ref{lexical}. Let us first describe the features common to constructive and destructive expressions. \begin{figure} \begin{center} \begin{tabular}{|rclr|} \hline {\sl ast} & ::= & {\sl meta} & (metavariable) \\ & $|$ & {\ident} & (variable)\\ & $|$ & {\integer} & (positive integer)\\ & $|$ & {\sl string} & (quoted string) \\ & $|$ & {\sl path} & (section-path) \\ & $|$ & \verb+{+ {\ident} \verb+}+ & (identifier) \\ & $|$ & \verb+[+ {\sl name} \verb+]+ {\sl ast} & (abstraction) \\ & $|$ & \verb+(+ {\ident}~~\sequence{{\sl ast}}{} \verb+)+ & (application node)\\ & $|$ & \verb+(+ {\sl special-tok}~~{\sl meta} \verb+)+ & (special-operator) \\ & $|$ & \verb+'+ {\sl ast} & (quoted ast) \\ & $|$ & {\sl quotation} & \\ {\sl meta} & ::= & \verb+$+{\ident} & \\ %$ {\sl path} & ::= & \nelist{{\tt\#}{\ident}}{} \verb+.+ {\sl kind} & \\ {\sl kind} & ::= & \verb+cci+ $~|~$\verb+fw+ $~|~$ \verb+obj+ & \\ {\sl name} & ::= & \verb+<>+ ~$|$~ {\ident} ~$\mid$~ {\sl meta} & \\ {\sl special-tok} & ::= & \verb+$LIST+~$|$~\verb+$VAR+~$|$~\verb+$NUM+~$|$~% \verb+$STR+~$|$~\verb+$PATH+~$|$~\verb+$ID+ & \\ {\sl quotation}& ::= & \verb+<<+ ~{\sl meta-constr}~ \verb+>>+ & \\ & $|$ & \verb+<:constr:<+ ~{\sl meta-constr}~ \verb+>>+ & \\ & $|$ & \verb+<:vernac:<+ ~{\sl meta-vernac}~ \verb+>>+ & \\ & $|$ & \verb+<:tactic:<+ ~{\sl meta-tactic}~ \verb+>>+ & \\ \hline \end{tabular} \end{center} \caption{Syntax of AST expressions}\label{astexpr} \end{figure} \subsubsection{Atomic AST} An atomic AST can be either a variable, a natural number, a quoted string, a section path or an identifier. They are the basic components of an AST. \subsubsection{Metavariable} \index{Metavariable} Metavariables are used to perform substitutions in constructive expressions: they are replaced by their value in a given environment. They are also involved in the pattern matching operation: metavariables in destructive patterns create new bindings in the environment. As we will see later, metavariables may denote an AST or an AST list (when used with the \verb+$LIST+ special token). %$ So, we introduce two types of variables: \verb+ast+ and \verb+ast list+. The type of variables is checked statically: an expression referring to undefined metavariables, or using metavariables with an inappropriate type, will be rejected. \subsubsection{Application node} Note that the AST syntax is rather general, since application nodes may be labelled by an arbitrary identifier (but not a metavariable), and operators have no fixed arity. This enables the extensibility of the system. Nevertheless there are some application nodes that have some special meaning for the system. They are build on \verb+(+{\sl special-tok}~{\sl meta}\verb+)+, and cannot be confused with regular nodes since {\sl special-tok} begins with a \verb+$+. %$ There is a description of these nodes below. \subsubsection{Abstraction} The equality on AST's is the $\alpha$-conversion, i.e. two AST's are equal if only they are the same up to renaming of bound variables (thus, \verb+[x]x+ is equal to \verb+[y]y+). This makes the difference between variables and identifiers clear: the former may be bound by abstractions, whereas identifiers cannot be bound. To illustrate this, \verb+[x]x+ and \verb+[y]y+ are equal and \verb+[x]{x}+ is equal to \verb+[y]{x}+, but not to \verb+[y]{y}+. The binding structure of AST is used to represent the binders in the terms of {\Coq}: the product \verb+(x:$A)$B+ is mapped to the AST \verb+(PROD $A [x]$B)+, whereas the non dependent product \verb+$A->$B+ is mapped to \verb+(PROD $A [<>]$B)+ (\verb+[<>]t+ is an anonymous abstraction). Metavariables can appear in abstractions. In that case, the value of the metavariable must be a variable (or a list of variables). If not, a run-time error is raised. \subsubsection{Quoted AST} \index{Quoted AST} The \verb+'+$t$ construction means that the AST $t$ should not be interpreted at all. The main effect is that metavariables occurring in it cannot be substituted or considered as binding in patterns. \subsubsection{Quotations} \index{Quotations} The non terminal symbols {\sl meta-constr}, {\sl meta-vernac} and {\sl meta-tactic} stand, respectively, for the syntax of CIC terms, vernacular phrases and tactics. The prefix {\sl meta-} is just to emphasize that the expression may refer to metavariables. Indeed, if the AST to generate corresponds to a term that already has a syntax, one can call a grammar to parse it and to return the AST result. For instance, \verb+<<(eq ? $f $g)>>+ denotes the AST which is the application (in the sense of CIC) of the constant {\tt eq} to three arguments. It is coded as an AST node labelled {\tt APPLIST} with four arguments. This term is parsable by \verb+constr:constr+ grammar. This grammar is invoked on this term to generate an AST by putting the term between ``\verb+<<+'' and ``\verb+>>+''. We can also invoke the initial grammars of several other predefined entries (see section~\ref{predefined-grammars} for a description of these grammars). \begin{itemize} \item \verb|<:constr:< t >>| parses {\tt t} with {\tt constr:constr} grammar(terms of CIC). \item \verb|<:vernac:< t >>| parses {\tt t} with {\tt vernac:vernac} grammar (vernacular commands). \item \verb|<:tactic:< t >>| parses {\tt t} with {\tt tactic:tactic} grammar (tactic expressions). \item \verb|<< t >>| parses {\tt t} with the default quotation (that is, {\tt constr:constr}). It is the same as \verb|<:constr:< t >>|. \end{itemize} \Warning One cannot invoke other grammars than those described. \subsubsection{Special operators in constructive expressions} The expressions \verb+($LIST $x)+ injects the AST list variable {\tt\$x} in an AST position. For example, an application node is composed of an identifier followed by a list of AST's that are glued together. Each of these expressions must denote an AST. If we want to insert an AST list, one has to use the \verb+$LIST+ operator. Assume the variable \verb+$idl+ is bound to the list \verb+[x y z]+, the expression \verb+(Intros ($LIST $idl) a b c)+ will build the AST \verb+(Intros x y z a b c)+. Note that \verb+$LIST+ does not occur in the result. %$ % Ameliorer le style! Since we know the type of variables, the \verb+$LIST+ %$ is not really necessary. We enforce this annotation to stress on the fact that the variable will be substituted by an arbitrary number of AST's. The other special operators ({\tt\$VAR}, {\tt\$NUM}, {\tt\$STR}, {\tt\$PATH} and {\tt\$ID}) are forbidden. \subsubsection{Special operators in destructive expressions (AST patterns)} \label{patternsyntax} \index{AST patterns} % pas encore implante (serait utile pour afficher les LAMBDALIST et % PRODLIST). %\item \verb+($SLAM $l body)+ is used to denote an abstraction where % the elements of the list variable \verb+$l+ are the variables %%$ % simultaneously abstracted in \verb+body+. % The production to recognize a lambda abstraction of the form % $[x_1,\ldots,x_n:T]body$ use the operator \verb+$SLAM+: A pattern is an AST expression, in which some metavariables can appear. In a given environment a pattern matches any AST which is equal (w.r.t $\alpha$-conversion) to the value of the pattern in an extension of the current environment. The result of the matching is precisely this extended environment. This definition allows non-linear patterns (i.e. patterns in which a variable occurs several times). For instance, the pattern \verb+(PAIR $x $x)+ matches any AST which is a node labelled {\tt PAIR} applied to two identical arguments, and binds this argument to {\tt\$x}. If {\tt\$x} was already bound, the arguments must also be equal to the current value of {\tt\$x}. The ``wildcard pattern'' \verb+$_+ is not a regular metavariable: it matches any term, but does not bind any variable. The pattern \verb+(PAIR $_ $_)+ matches any {\tt PAIR} node applied to two arguments. The {\tt\$LIST} operator still introduces list variables. Typically, when a metavariable appears as argument of an application, one has to say if it must match \emph{one} argument (binding an AST variable), or \emph{all} the arguments (binding a list variable). Let us consider the patterns \verb+(Intros $id)+ and \verb+(Intros ($LIST $idl))+. The former matches nodes with \emph{exactly} one argument, which is bound in the AST variable {\tt\$id}. On the other hand, the latter pattern matches any AST node labelled {\tt Intros}, and it binds the \emph{list} of its arguments to the list variable {\tt\$idl}. The {\tt\$LIST} pattern must be the last item of a list pattern, because it would make the pattern matching operation more complicated and less efficient. The pattern \verb+(Intros ($LIST $idl) $lastid)+ is not accepted. The other special operators allows checking what kind of leaf we are destructing: \begin{itemize} \item{\tt\$VAR} matches only variables \item{\tt\$NUM} matches natural numbers \item{\tt\$STR} matches quoted strings \item{\tt\$PATH} matches section-paths \item{\tt\$ID} matches identifiers \end{itemize} \noindent For instance, the pattern \verb+(DO ($NUM $n) $tc)+ matches \verb+(DO 5 (Intro))+, and creates the bindings ({\tt\$n},5) and ({\tt\$tc},\verb+(Intro)+). The pattern matching would fail on \verb+(DO "5" (Intro))+. \section{Extendable grammars} \label{Grammar} \index{Extendable Grammars} \comindex{Grammar} Grammar rules can be added with the {\tt Grammar} command. This command is just an interface towards {\camlpppp}, providing the semantic actions so that they build the expected AST. A simple grammar command has the following syntax: \index{Grammar@{\tt Grammar}} \begin{center} \texttt{Grammar}~\textsl{entry}~\textsl{nonterminal}~\texttt{:=}~% \textsl{rule-name}~\textsl{LMP}~\verb+->+~\textsl{action}~\texttt{.} \end{center} The components have the following meaning: \begin{itemize} \item a grammar name: defined by a parser entry and a non-terminal. Non-terminals are packed in an \emph{entry} (also called universe). One can have two non-terminals of the same name if they are in different entries. A non-terminal can have the same name as its entry. \item a rule (sometimes called production), formed by a name, a left member of production and an action, which generalizes constructive expressions. \end{itemize} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{figure} \begin{center} \begin{tabular}{|rcl|} \hline {\sl grammar} & ::= & \verb+Grammar+~{\sl entry}~\nelist{{\sl gram-entry}}{with} \\ {\sl entry}& ::= & {\ident} \\ {\sl gram-entry} & ::= & {\sl rule-name}~\zeroone{{\tt :}~{\sl entry-type}}~\verb+:=+~% % [{\sl assoc}]~ \sequence{{\sl production}}{|} \\ {\sl entry-type} & ::= & \verb+ast+~$|$~\verb+ast list+~$|$~\verb+constr+ ~$|$~\verb+tactic+~$|$~\verb+vernac+ \\ %{\sl assoc} & ::= & \verb+LEFTA+~$|$~\verb+RIGHTA+~$|$~\verb+NONA+ \\ % parler de l'associativite ? % Cela n'a pas vraiment de sens si toutes les re`gles sont dans un % meme niveau pour Camlp4. {\sl production} & ::= & {\sl rule-name}~\verb+[+~\sequence{{\sl prod-item}}{}~\verb+] ->+ ~{\sl action}\\ {\sl rule-name} & ::= & {\sl ident} \\ {\sl prod-item} & ::= & {\sl string} \\ & | & \zeroone{{\sl entry}~{\tt :}}~{\sl entry-name}~% \zeroone{{\tt (}~{\sl meta}~{\tt )}} \\ {\sl action} & ::= & \verb+[+~\sequence{{\sl ast-quote}}{}~\verb+]+ \\ & | & \verb+let+~{\sl pattern}~\verb+=+~{\sl action}~% \verb+in+~{\sl action} \\ & | & {\tt case}~{\sl action}~\zeroone{{\tt :}~{\sl entry-type}}~{\tt of}~% \sequence{{\sl case}}{|}~{\tt esac} \\ {\sl case} & ::= & \sequence{{\sl pattern}}{}~\verb+->+~{\sl action} \\ {\sl pattern} & ::= & {\sl ast} \\ \hline \end{tabular} \end{center} \caption{Syntax of the grammar extension command}\label{grammarsyntax} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% The exact syntax of the {\tt Grammar} command is defined in Fig.~\ref{grammarsyntax} where non terminal {\sl ast-quote} is one of {\tt ast}, {\tt constr}, {\tt tactic} or {\tt vernac}, depending on the entry type. It is possible to extend a grammar with several rules at once. $$ \begin{array}{rcc} \verb+Grammar+~~entry~~nonterminal & \verb+:=+ & production_1 \\ & \verb+|+ & \vdots \\ & \verb+|+ & production_n \verb+.+ \\ \end{array} $$ Productions are entered in reverse order (i.e. $production_n$ before $production_1$), so that the first rules have priority over the last ones. The set of rules can be read as an usual pattern matching. \noindent Also, we can extend several grammars of a given universe at the same time. The order of non-terminals does not matter since they extend different grammars. $$ \begin{array}{rcccc} \verb+Grammar+ & entry & nonterminal_1 & \verb+:=+ & production_1^1 \\ && & \verb+|+ & \vdots \\ && & \verb+|+ & production_{n_1}^1 \\ & \verb+with+ & \vdots && \\ & \verb+with+ & nonterminal_p & \verb+:=+ & production_1^p \\ && & \verb+|+ & \vdots \\ && & \verb+|+ & production_{n_p}^p \verb+.+ \\ \end{array} $$ \subsection{Grammar entries} \index{Grammar entries} \label{predefined-grammars} Let us describe the four predefined entries. Each of them (except {\tt prim}) possesses an initial grammar for starting the parsing process. \begin{itemize} \item \verb+prim+ : it is the entry of the primitive grammars. Most of them cannot be defined by the extendable grammar mechanism. They are encoded inside the system. The entry contains the following non-terminals: \begin{itemize} \item \verb+var+ : variable grammar. Parses an identifier and builds an AST which is a variable. \item \verb+ident+ : identifier grammar. Parses an identifier and builds an AST which is an identifier such as \verb+{x}+. \item \verb+number+ : number grammar. Parses a positive integer. \item \verb+string+ : string grammar. Parses a quoted string. \item \verb+path+ : section path grammar. \item \verb+ast+ : AST grammar. \item \verb+astpat+ : AST pattern grammar. \item \verb+astact+ : action grammar. \end{itemize} The primitive grammars are used as the other grammars; for instance the variables of terms are parsed by \verb+prim:var($id)+. \item \verb+constr+ : it is the term entry. It allows to have a pretty syntax for terms. Its initial grammar is {\tt constr:constr}. This entry contains several non-terminals, among them {\tt constr0} to {\tt constr10} which stratify the terms according to priority levels (\verb+0+ to \verb+10+). These priority levels allow us also to specify the order of associativity of operators. % Ce serait bien si on etait capables de donner une table avec les % niveaux de priorite de chaque operateur... \item \verb+vernac+ : it is the vernacular command entry, with {\tt vernac vernac} as initial grammar. Thanks to it, the developers can define the syntax of new commands they add to the system. As to users, they can change the syntax of the predefined vernacular commands. \item \verb+tactic+ : it is the tactic entry with {\tt tactics:tactic} as initial grammar. This entry allows to define the syntax of new tactics or redefine the syntax of existing tactics. % See chapter~\ref{WritingTactics} about user-defined tactics % for more details. \end{itemize} The user can define new entries and new non-terminals, using the grammar extension command. A grammar does not have to be explicitly defined. But the grammars in the left member of rules must all be defined, possibly by the current grammar command. It may be convenient to define an empty grammar, just so that it may be called by other grammars, and extend this empty grammar later. Assume that the {\tt constr:constr13} does not exist. The next command defines it with zero productions. \begin{coq_example*} Grammar constr constr13 := . \end{coq_example*} The grammars of new entries do not have an initial grammar. To use them, they must be called (directly or indirectly) by grammars of predefined entries. We give an example of a (direct) call of the grammar {\tt newentry:nonterm} by {\tt constr:constr}. This following rule allows to use the syntax \verb+a&b+ for the conjunction \verb+a/\b+. Note that since we extend a rule of universe \verb+constr+, the command quotation is used on the right-hand side of the second rule. \begin{coq_example} Grammar newentry nonterm := ampersand [ "&" constr:constr($c) ] -> [$c]. Grammar constr constr := new_and [ constr8($a) newentry:nonterm($b) ] -> [$a/\$b]. \end{coq_example} \subsection{Left member of productions (LMP)} A LMP is composed of a combination of terminals (enclosed between double quotes) and grammar calls specifying the entry. It is enclosed between ``\verb+[+'' and ``\verb+]+''. The empty LMP, represented by \verb+[ ]+, corresponds to $\epsilon$ in formal language theory. A grammar call is done by \verb+entry:nonterminal($id)+ where: \begin{itemize} \item \verb+entry+ and \verb+nonterminal+ specifies the entry of the grammar, and the non-terminal. \item \verb+$id+ is a metavariable that will receive the AST or AST list resulting from the call to the grammar. \end{itemize} The elements \verb+entry+ and \verb+$id+ are optional. The grammar entry can be omitted if it is the same as the entry of the non-terminal we are extending. Also, \verb+$id+ is omitted if we do not want to get back the AST result. Thus a grammar call can be reduced to a non-terminal. Each terminal must contain exactly one token. This token does not need to be already defined. If not, it will be automatically added. Nevertheless, any string cannot be a token (e.g. blanks should not appear in tokens since parsing would then depend on indentation). We introduce the notion of \emph{valid token}, as a sequence, without blanks, of characters taken from the following list: \begin{center} \verb"< > / \ - + = ; , | ! @ # % ^ & * ( ) ? : ~ $ _ ` ' a..z A..Z 0..9" \end{center} that do not start with a character from \begin{center} \verb!$ _ a..z A..Z ' 0..9! \end{center} When an LMP is used in the parsing process of an expression, it is analyzed from left to right. Every token met in the LMP should correspond to the current token of the expression. As for the grammars calls, they are performed in order to recognize parts of the initial expression. \Warning Unlike destructive expressions, if a variable appears several times in the LMP, the last binding hides the previous ones. Comparison can be performed only in the actions. \firstexample \example{Defining a syntax for inequality} The rule below allows us to use the syntax \verb+t1#t2+ for the term \verb+~t1=t2+. \begin{coq_example} Grammar constr constr1 := not_eq [ constr0($a) "#" constr0($b) ] -> [ ~$a=$b ]. \end{coq_example} The level $1$ of the grammar of terms is extended with one rule named \texttt{not\_eq}. When this rule is selected, its LMP calls the grammar \verb+constr:constr0+. This grammar recognizes a term that it binds to the metavariable \verb+$a+. Then it meets the token ``\verb+#+'' and finally it calls the grammar \verb+constr:constr0+. This grammar returns the recognized term in \verb+$b+. The action constructs the term \verb+~$a=$b+. For instance, let us give the statement of the symmetry of \verb+#+: \begin{coq_example} Goal (A:Set)(a,b:A) a#b -> b#a. \end{coq_example} This shows that the system understood the grammar extension. Nonetheless, since no special printing command was given, the goal is displayed using the usual syntax for negation and equality. One can force \verb+~a=b+ to be printed \verb=a#b= by giving pretty-printing rules. This is explained in section~\ref{Syntax}. \Warning Metavariables are identifiers preceded by the ``\verb+$+'' symbol. They cannot be replaced by identifiers. For instance, if we enter a rule with identifiers and not metavariables, the identifiers are assumed to be global names (what raises a warning if no global name is denoted by these identifiers). % Test failure \begin{coq_example} Grammar constr constr1 := not_eq [ constr0($a) "#" constr0($b) ] -> [~(a=b)]. \end{coq_example} \begin{coq_eval} Abort. \end{coq_eval} \example{Redefining vernac commands} Thanks to the following rule, ``{\tt |- term.}'' will have the same effect as ``{\tt Goal term.}''. \begin{coq_example} Grammar vernac vernac := thesis [ "|" "-" constr:constr($term) "." ] -> [Goal $term.]. \end{coq_example} \noindent This rule allows putting blanks between the bar and the dash, as in \begin{coq_example} | - (A:Prop)A->A. \end{coq_example} \begin{coq_eval} Reset Initial. \end{coq_eval} \noindent Assuming the previous rule has not been entered, we can forbid blanks with a rule that declares ``\verb+|-+'' as a single token: \begin{coq_eval} (********** The following is not correct and should produce **********) (*************** Syntax error: illegal begin of vernac ***************) (* Just to adjust the prompt: *) Set Printing Depth 50. \end{coq_eval} \begin{coq_example} Grammar vernac vernac := thesis [ "|-" constr:constr($term) "." ] -> [Goal $term.]. | - (A:Prop)A->A. \end{coq_example} \noindent If both rules were entered, we would have three tokens \verb+|+, \verb+-+ and \verb+|-+. The lexical ambiguity on the string \verb+|-+ is solved according to the longest match rule (see lexical conventions page~\pageref{lexical}), i.e. \verb+|-+ would be one single token. To enforce the use of the first rule, a blank must be inserted between the bar and the dash\footnote{It turns out that "|-" is already a token defined for other purposes, then the first rule cannot parse "|- (A:Prop)A->A" and indeed requires the insertion of a blank}. \Rem The vernac commands should always be terminated by a period. When a syntax error is detected, the top-level discards its input until it reaches a period token, and then resumes parsing. \example{Redefining tactics} We can give names to repetitive tactic sequences. Thus in this example ``{\tt IntSp}'' will correspond to the tactic {\tt Intros} followed by {\tt Split}. \begin{coq_example} Grammar tactic simple_tactic := intros_split [ "IntSp" ] -> [Intros; Split]. \end{coq_example} Let us check that this works. \begin{coq_example} Goal (A,B:Prop)A/\B -> B/\A. IntSp. \end{coq_example} \begin{coq_eval} Abort. \end{coq_eval} Note that the same result can be obtained in a simpler way with {\tt Tactic Definition} (see chapter~\ref{TacticLanguage}). \example{Priority, left and right associativity of operators} The disjunction has a higher priority than conjunction. Thus \verb+A/\B\/C+ will be parsed as \verb+(A/\B)\/C+ and not as \verb+A/\(B\/C)+. The priority is done by putting the rule for the disjunction in a higher level than that of conjunction: conjunction is defined in the non-terminal {\tt constr6} and disjunction in {\tt constr7} (see file {\tt Logic.v} in the library). Notice that the character ``\verb+\+'' must be doubled (see lexical conventions for quoted strings on page~\pageref{lexical}). \begin{coq_example*} Grammar constr constr6 := and [ constr5($c1) "/\\" constr6($c2) ] -> [(and $c1 $c2)]. Grammar constr constr7 := or [ constr6($c1) "\\/" constr7($c2) ] -> [(or $c1 $c2)]. \end{coq_example*} Thus conjunction and disjunction associate to the right since in both cases the priority of the right term (resp. {\tt constr6} and {\tt constr7}) is higher than the priority of the left term (resp. {\tt constr5} and {\tt constr6}). The left member of a conjunction cannot be itself a conjunction, unless you enclose it inside parenthesis. The left associativity is done by calling recursively the non-terminal. {\camlpppp} deals with this recursion by first trying the non-left-recursive rules. Here is an example taken from the standard library, defining a syntax for the addition on integers: \begin{coq_example*} Grammar znatural expr := expr_plus [ expr($p) "+" expr($c) ] -> [(Zplus $p $c)]. \end{coq_example*} \subsection{Actions} \label{GramAction} \index{Grammar Actions} Every rule should generate an AST corresponding to the syntactic construction that it recognizes. This generation is done by an action. Thus every rule is associated to an action. The syntax has been defined in Fig.~\ref{grammarsyntax}. We give some examples. \subsubsection{Simple actions} A simple action is an AST enclosed between ``\verb+[+'' and ``\verb+]+''. It simply builds the AST by interpreting it as a constructive expression in the environment defined by the LMP. This case has been illustrated in all the previous examples. We will later see that grammars can also return AST lists. \subsubsection{Local definitions} When an action should generate a big term, we can use \texttt{let}~\textsl{pattern}~\texttt{=}~\textsl{action}$_1$~% \texttt{in}~\textsl{action}$_2$\index{let@{\tt let}} expressions to construct it progressively. The action \textsl{action}$_1$ is first computed, then it is matched against \textsl{pattern} which may bind metavariables, and the result is the evaluation of \textsl{action}$_2$ in this new context. \example{} \noindent From the syntax \verb|t1*+t2|, we generate the term {\tt (plus (plus t1 t2) (mult t1 t2))}. \begin{coq_example} Grammar constr constr1 := mult_plus [ constr0($a) "*" "+" constr0($b) ] -> let $p1=[(plus $a $b)] in let $p2=[(mult $a $b)] in [(plus $p1 $p2)]. \end{coq_example} Let us give an example with this syntax: \begin{coq_example} Goal (O*+O)=O. \end{coq_example} \begin{coq_eval} Abort. \end{coq_eval} \subsubsection{Conditional actions} We recall the syntax of conditional actions: \begin{center} \texttt{case}~\textsl{action}~\texttt{of}~% \textsl{pattern}$_1$~\verb+->+~\textsl{action}$_1$~% \texttt{|}~$\cdots$~\texttt{|}~% \textsl{pattern}$_n$~\verb+->+~\textsl{action}$_n$~% \texttt{esac} \end{center}\index{case@@{\tt case}} The action to execute is chosen according to the value of \textsl{action}. The matching is performed from left to right. The selected action is the one associated to the first pattern that matches the value of \textsl{action}. This matching operation will bind the metavariables appearing in the selected pattern. The pattern matching does need being exhaustive, and no warning is emitted. When the pattern matching fails a message reports in which grammar rule the failure happened. \example{Overloading the ``$+$'' operator} The internal representation of an expression such as {\tt A+B} depends on the shape of {\tt A} and {\tt B}: \begin{itemize} \item \verb/{P}+{Q}/ uses {\tt sumbool} \item otherwise, \verb/A+{Q}/ uses {\tt sumor} \item otherwise, \verb/A+B/ uses {\tt sum}. \end{itemize} The trick is to build a temporary AST: \verb/{A}/ generates the node \verb/(SQUASH A)/. When we parse \verb/A+B/, we remove the {\tt SQUASH} in {\tt A} and {\tt B}: \begin{coq_example*} Grammar constr constr1: ast := squash [ "{" lconstr($lc) "}" ] -> [(SQUASH $lc)]. Grammar constr lassoc_constr4 := squash_sum [ lassoc_constr4($c1) "+" lassoc_constr4($c2) ] -> case [$c2] of (SQUASH $T2) -> case [$c1] of (SQUASH $T1) -> [(sumbool $T1 $T2)] | $_ -> [(sumor $c1 $T2)] esac | $_ -> [(sum $c1 $c2)] esac. \end{coq_example*} The first rule is casted with type ast, because the produced term cannot be reached by the input syntax. On the other hand, the second command has (implicit) type \verb+constr+, so the right hand side is parsed with the term parser. \noindent The problem is that sometimes, the intermediate {\tt SQUASH} node cannot re-shaped, then we have a very specific error: \begin{coq_eval} (********** The following is not correct and should produce **********) (************** Error: Ill-formed specification **********************) (* Just to adjust the prompt: *) Set Printing Depth 50. \end{coq_eval} \begin{coq_example} Check {True}. \end{coq_example} \example{Comparisons and non-linear patterns} The patterns may be non-linear: when an already bound metavariable appears in a pattern, the value yielded by the pattern matching must be equal, up to renaming of bound variables, to the current value. Note that this does not apply to the wildcard \verb+$_+. For example, we can compare two arguments: \begin{coq_example} Grammar constr constr10 := refl_equals [ constr9($c1) "||" constr9($c2) ] -> case [$c1] of $c2 -> [(refl_equal ? $c2)] esac. Check ([x:nat]x || [y:nat]y). \end{coq_example} \noindent The metavariable \verb+$c1+ is bound to \verb+[x:nat]x+ and \verb+$c2+ to \verb+[y:nat]y+. Since these two values are equal, the pattern matching succeeds. It fails when the two terms are not equal: % Test failure \begin{coq_eval} (********** The following is not correct and should produce **********) (************* Error: ... Grammar case failure ... ******************) (* Just to adjust the prompt: *) Set Printing Depth 50. \end{coq_eval} \begin{coq_example} Check ([x:nat]x || [z:bool]z). \end{coq_example} \subsection{Grammars of type {\tt ast list}} Assume we want to define an non-terminal {\tt ne\_identarg\_list} that parses an non-empty list of identifiers. If the grammars could only return AST's, we would have to define it this way: \begin{coq_example*} Grammar tactic my_ne_ident_list : ast := ident_list_cons [ identarg($id) my_ne_ident_list($l) ] -> case [$l] of (IDENTS ($LIST $idl)) -> [(IDENTS $id ($LIST $idl))] esac | ident_list_single [ identarg($id) ] -> [(IDENTS $id)]. \end{coq_example*} But it would be inefficient: every time an identifier is read, we remove the ``boxing'' operator {\tt IDENTS}, and put it back once the identifier is inserted in the list. To avoid these awkward trick, we allow grammars to return AST lists. Hence grammars have a type ({\tt ast} or {\tt ast list}), just like AST's do. Type-checking can be done statically. The simple actions can produce lists by putting a list of constructive expressions one beside the other. As usual, the {\tt\$LIST} operator allows to inject AST list variables. \begin{coq_example*} Grammar tactic ne_identarg_list : ast list := ne_idl_cons [ identarg($id) ne_identarg_list($idl) ] -> [ $id ($LIST $idl) ] | ne_idl_single [ identarg($id) ] -> [ $id ]. \end{coq_example*} %$ Note that the grammar type must be recalled in every extension command, or else the system could not discriminate between a single AST and an AST list with only one item. If omitted, the default type depends on the universe name. The following command fails because the non-terminal {\tt ne\_identarg\_list} is already defined with type {\tt ast list} but the {\tt Grammar} command header assumes its type is {\tt ast}. \begin{coq_eval} (********** The following is not correct and should produce **********) (****** Error: ne_identarg_list already exists with another type *****) (* Just to adjust the prompt: *) Set Printing Depth 50. \end{coq_eval} % Test failure \begin{coq_example} Grammar tactic ne_identarg_list := list_two [ identarg($id1) identarg($id2) ] -> [ $id1 ; $id2 ]. \end{coq_example} All rules of a same grammar must have the same type. For instance, the following rule is refused because the \verb+constr:constr1+ grammar has been already defined with type {\tt Ast}, and cannot be extended with a rule returning AST lists. % Test failure \begin{coq_eval} (********** The following is not correct and should produce **********) (********* Error: ']' expected after [default_action_parser] *********) (* Just to adjust the prompt: *) Set Printing Depth 50. \end{coq_eval} \begin{coq_example} Grammar constr constr1 := carret_list [ constr0($c1) "^" constr0($c2)] -> [ $c1 $c2 ]. \end{coq_example} \subsection{Limitations} The extendable grammar mechanism have four serious limitations. The first two are inherited from {\camlpppp}. \begin{itemize} \item Grammar rules are factorized syntactically: {\camlpppp} does not try to expand non-terminals to detect further factorizations. The user must perform the factorization himself. \item The grammar is not checked to be \emph{LL(1)}\index{LL(1)} when adding a rule. If it is not LL(1), the parsing may fail on an input recognized by the grammar, because selecting the appropriate rule may require looking several tokens ahead. {\camlpppp} always selects the most recent rule (and all those that factorize with it) accepting the current token. \item There is no command to remove a grammar rule. However there is a trick to do it. It is sufficient to execute the ``{\tt Reset}'' command on a constant defined before the rule we want to remove. Thus we retrieve the state before the definition of the constant, then without the grammar rule. This trick does not apply to grammar extensions done in {\ocaml}. \item Grammar rules defined inside a section are automatically removed after the end of this section: they are available only inside it. \end{itemize} The command {\tt Print Grammar} prints the rules of a grammar. It is displayed by {\camlpppp}. So, the actions are not printed, and the recursive calls are printed \verb+SELF+\index{SELF@{\tt SELF}}. It is sometimes useful if the user wants to understand why parsing fails, or why a factorization was not done as expected.\index{Print Grammar@{\tt Print Grammar}} \begin{coq_example} Print Grammar constr constr8. \end{coq_example} \subsubsection{Getting round the lack of factorization} The first limitation may require a non-trivial work, and may lead to ugly grammars, hardly extendable. Sometimes, we can use a trick to avoid these troubles. The problem arises in the {\gallina} syntax, to make {\camlpppp} factorize the rules for application and product. The natural grammar would be: % Test failure \begin{coq_eval} (********** The following is not correct and should produce **********) (******** Syntax error: ')' expected after [Constr.constr10] *********) \end{coq_eval} \begin{coq_example} Grammar constr constr0 : ast := parenthesis [ "(" constr10($c) ")" ] -> [$c] | product [ "(" prim:var($id) ":" constr($c1) ")" constr0($c2) ] -> [(PROD $c1 [$id]$c2)] with constr10 : ast := application [ constr9($c1) ne_constr_list($lc) ] -> [(APPLIST $c1 ($LIST $lc))] | inject_com91 [ constr9($c) ] -> [$c]. Check (x:nat)nat. \end{coq_example} \begin{coq_eval} Reset Initial. \end{coq_eval} But the factorization does not work, thus the product rule is never selected since identifiers match the {\tt constr10} grammar. The trick is to parse the ident as a {\tt constr10} and check \emph{a posteriori} that the term is indeed an identifier: \begin{coq_example} Grammar constr constr0 : ast := product [ "(" constr10($c) ":" constr($c1) ")" constr0($c2) ] -> [(PROD $c1 [$c]$c2)]. Check (x:nat)nat. \end{coq_example} %$ \noindent We could have checked it explicitly with a {\tt case} in the right-hand side of the rule, but the error message in the following example would not be as relevant: % Test failure \begin{coq_eval} (********** The following is not correct and should produce **********) (******* Error: This expression should be a simple identifier ********) (* Just to adjust the prompt: *) Set Printing Depth 50. \end{coq_eval} \begin{coq_example} Check (S O:nat)nat. \end{coq_example} \noindent This trick is not similar to the {\tt SQUASH} node in which we could not detect the error while parsing. Here, the error pops out when trying to build an abstraction of {\tt\$c2} over the value of {\tt\$c}. Since it is not bound to a variable, the right-hand side of the product grammar rule fails. \section{Writing your own pretty printing rules} \label{Syntax} \comindex{Syntax} There is a mechanism for extending the vernacular's printer by adding, in the interactive toplevel, new printing rules. The printing rules are stored into a table and will be recovered at the moment of the printing by the vernacular's printer. The user can print new constants, tactics and vernacular phrases with his desired syntax. The printing rules for new constants should be written {\em after} the definition of the constants. The rules should be outside a section if the user wants them to be exported. The printing rules corresponding to the heart of the system (primitive tactics, terms and the vernacular language) are defined, respectively, in the files {\tt PPTactic.v} and {\tt PPConstr.v} (in the directory {\tt syntax}). These files are automatically loaded in the initial state. The user is not expected to modify these files unless he dislikes the way primitive things are printed, in which case he will have to compile the system after doing the modifications. When the system fails to find a suitable printing rule, a tag \verb+#GENTERM+\index{GENTERM@{\tt\#GENTERM}} appears in the message. In the following we give some examples showing how to write the printing rules for the non-terminal and terminal symbols of a grammar. We will test them frequently by inspecting the error messages. Then, we give the grammar of printing rules and a description of its semantics. \subsection{The Printing Rules} \subsubsection{The printing of non terminals} The printing is the inverse process of parsing. While a grammar rule maps an input stream of characters into an AST, a printing rule maps an AST into an output stream of printing orders. So given a certain grammar rule, the printing rule is generally obtained by inverting the grammar rule. Like grammar rules, it is possible to define several rules at the same time. The exact syntax for complex rules is described in~\ref{syntaxsyntax}. A simple printing rule is of the form: \begin{center} \verb+Syntax+ ~{\sl universe} ~\verb+level+ ~{\sl precedence}~\verb+:+~{\sl name} ~\verb+[+~{\sl pattern}~\verb+] -> [+~{\sl printing-orders}~\verb+].+ \end{center} where : \begin{itemize} \item {\it universe} is an identifier denoting the universe of the AST to be printed. They have the same meaning as grammar universes. The vernac universe has no equivalent in pretty-printing since vernac phrases are never printed by the system. Error messages are reported by re-displaying what the user typed in. \item {\it precedence} is positive integer indicating the precedence of the rule. In general the precedence for tactics is 0. The universe of terms is implicitly stratified by the hierarchy of the parsing rules. We have non terminals \textit{constr0}, \textit{constr1}, \ldots, \textit{constr10}. The idea is that objects parsed with the non terminal $constr_i$ have precedence $i$. In most of the cases we fix the precedence of the printing rules for commands to be the same number of the non terminal with which it is parsed. A precedence may also be a triple of integers. The triples are ordered in lexicographic order, and the level $n$ is equal to {\tt [$n~0~0$]}. \item {\it name} is the name of the printing rule. A rule is identified by both its universe and name, if there are two rules with both the same name and universe, then the last one overrides the former. \item {\it pattern} is a pattern that matches the AST to be printed. The syntax of patterns is dependent on the universe of the AST to be printed (e.g. patterns are parsed as {\tt constr} if the universe is {\tt constr}, etc), and a quotation can be used to escape the default parser associated to this universe. A description of the syntax of patterns is given in section \ref{patternsyntax}. \item {\it printing-orders} is the sequence of orders indicating the concrete layout of the printer. \end{itemize} \firstexample \example{Syntax for user-defined tactics.} The first usage of the \texttt{Syntax} command might be the printing order for a user-defined tactic: \begin{coq_example*} Declare ML Module "eqdecide". Syntax tactic level 0: Compare_PP [(Compare $com1 $com2)] -> ["Compare" [1 2] $com1 [1 2] $com2]. \end{coq_example*} % meme pas vrai If such a printing rule is not given, a disgraceful \verb+#GENTERM+ will appear when typing \texttt{Show Script} or \texttt{Save}. For a tactic macro defined by a \texttt{Tactic Definition} command, a printing rule is automatically generated so the user don't have to write one. \example{Defining the syntax for new constants.} Let's define the constant \verb+Xor+ in Coq: \begin{coq_eval} Reset Initial. \end{coq_eval} \begin{coq_example*} Definition Xor := [A,B:Prop] A/\~B \/ ~A/\B. \end{coq_example*} Given this definition, we want to use the syntax of \verb+A X B+ to denote \verb+(Xor A B)+. To do that we give the grammar rule: \begin{coq_example} Grammar constr constr7 := Xor [ constr6($c1) "X" constr7($c2) ] -> [(Xor $c1 $c2)]. \end{coq_example} Note that the operator is associative to the right. Now \verb+True X False+ is well parsed: \begin{coq_example} Goal True X False. \end{coq_example} To have it well printed we extend the printer: \begin{coq_example} Syntax constr level 7: Pxor [(Xor $t1 $t2)] -> [ $t1:L " X " $t2:E ]. \end{coq_example} and now we have the desired syntax: \begin{coq_example} Show. \end{coq_example} Let's comment the rule: \begin{itemize} \item \verb+constr+ is the universe of the printing rule. \item \verb+7+ is the rule's precedence and it is the same one than the parsing production (constr7). \item \verb+Pxor+ is the name of the printing rule. \item \verb+(Xor $t1 $t2)+ is the pattern of the term to be printed. Between \verb+<< >>+ we are allowed to use the syntax of arbitrary AST instead of terms. Metavariables may occur in the pattern but preceded by \verb+$+. \item \verb+$t1:L " X " $t2:E+ are the printing orders, it tells to print the value of \verb+$t1+ then the symbol \verb+X+ and then the value of \verb+$t2+. The \verb+L+ in the little box \verb+$t1:L+ indicates not to put parentheses around the value of \verb+$t1+ if its precedence is {\bf less} than the rule's one. An \verb+E+ instead of the \verb+L+ would mean not to put parentheses around the value of \verb+$t1+ if its the precedence is {\bf less or equal} than the rule's one. The associativity of the operator can be expressed in the following way: \verb&$t1:L " X " $t2:E& associates the operator to the right. \verb&$t1:E " X " $t2:L& associates to the left. \verb&$t1:L " X " $t2:L& is non-associative. \end{itemize} Note that while grammar rules are related by the name of non-terminals (such as {\tt constr6} and {\tt constr7}) printing rules are isolated. The {\tt Pxor} rule tells how to print an {\tt Xor} expression but not how to print its subterms. The printer looks up recursively the rules for the values of \verb+$t1+ and \verb+$t2+. The selection of the printing rules is strictly determined by the structure of the AST to be printed. This could have been defined with the {\tt Infix} command. \example{Forcing to parenthesize a new syntactic construction} You can force to parenthesize a new syntactic construction by fixing the precedence of its printing rule to a number greater than 9. For example a possible printing rule for the {\tt Xor} connector in the prefix notation would be: \begin{coq_example*} Syntax constr level 10: ex_imp [(Xor $t1 $t2)] -> [ "X " $t1:L " " $t2:L ]. \end{coq_example*} No explicit parentheses are contained in the rule, nevertheless, when using the connector, the parentheses are automatically written: \begin{coq_example} Show. \end{coq_example} A precedence higher than 9 ensures that the AST value will be parenthesized by default in either the empty context or if it occurs in a context where the instructions are of the form \verb+$t:L+ or \verb+$t:E+. \example{Dealing with list patterns in the syntax rules} The following productions extend the parser to recognize a tactic called \verb+MyIntros+ that receives a list of identifiers as argument as the primitive \verb+Intros+ tactic does: \begin{coq_example*} Grammar tactic simple_tactic: ast := my_intros [ "MyIntros" ne_identarg_list($idl) ] -> [(MyIntrosWith ($LIST $idl))]. \end{coq_example*} To define the printing rule for \verb+MyIntros+ it is necessary to define the printing rule for the non terminal \verb+ne_identarg_list+. In grammar productions the dependency between the non terminals is explicit. This is not the case for printing rules, where the dependency between the rules is determined by the structure of the pattern. So, the way to make explicit the relation between printing rules is by adding structure to the patterns. \begin{coq_example} Syntax tactic level 0: myintroswith [<<(MyIntrosWith ($LIST $L))>>] -> [ "MyIntros " (NEIDENTARGLIST ($LIST $L)) ]. \end{coq_example} This rule says to print the string \verb+MyIntros+ and then to print the value of \\ \verb+(NEIDENTARGLIST ($LIST $L))+. \begin{coq_example} Syntax tactic level 0: ne_identarg_list_cons [<<(NEIDENTARGLIST $id ($LIST $l))>>] -> [ $id " " (NEIDENTARGLIST ($LIST $l)) ] | ne_identarg_list_single [<<(NEIDENTARGLIST $id)>>] -> [ $id ]. \end{coq_example} The first rule says how to print a non-empty list, while the second one says how to print the list with exactly one element. Note that the pattern structure of the binding in the first rule ensures its use in a recursive way. Like the order of grammar productions, the order of printing rules \emph{does matter}. In case of two rules whose patterns superpose each other the {\bf last} rule is always chosen. In the example, if the last two rules were written in the inverse order the printing will not work, because only the rule {\sl ne\_identarg\_list\_cons} would be recursively retrieved and there is no rule for the empty list. Other possibilities would have been to write a rule for the empty list instead of the {\sl ne\_identarg\_list\_single} rule, \begin{coq_example} Syntax tactic level 0: ne_identarg_list_nil [<<(NEIDENTARGLIST)>>] -> [ ]. \end{coq_example} This rule indicates to do nothing in case of the empty list. In this case there is no superposition between patterns (no critical pairs) and the order is not relevant. But a useless space would be printed after the last identifier. \example{Defining constants with arbitrary number of arguments} Sometimes the constants we define may have an arbitrary number of arguments, the typical case are polymorphic functions. Let's consider for example the functional composition operator. The following rule extends the parser: \begin{coq_example*} Definition explicit_comp := [A,B,C:Set][f:A->B][g:B->C][a:A](g (f a)). Grammar constr constr6 := expl_comp [constr5($c1) "o" constr6($c2) ] -> [(explicit_comp ? ? ? $c1 $c2)]. \end{coq_example*} Our first idea is to write the printing rule just by ``inverting'' the production: \begin{coq_example} Syntax constr level 6: expl_comp [(explicit_comp ? ? ? $f $g)] -> [ $f:L "o" $g:L ]. \end{coq_example} This rule is not correct: \verb+?+ is an ordinary AST (indeed, it is the AST \verb|(XTRA "ISEVAR")|), and does not behave as the ``wildcard'' pattern \verb|$_|. Here is a correct version of this rule: \begin{coq_example*} Syntax constr level 6: expl_comp [(explicit_comp $_ $_ $_ $f $g)] -> [ $f:L "o" $g:L ]. \end{coq_example*} Let's test the printing rule: % 2nd test should fail \begin{coq_example} Definition Id := [A:Set][x:A]x. Check (Id nat) o (Id nat). Check ((Id nat)o(Id nat) O). \end{coq_example} In the first case the rule was used, while in the second one the system failed to match the pattern of the rule with the AST of \verb+((Id nat)o(Id nat) O)+. Internally the AST of this term is the same as the AST of the term \verb+(explicit_comp nat nat nat (Id nat) (Id nat) O)+. When the system retrieves our rule it tries to match an application of six arguments with an application of five arguments (the AST of \verb+(explicit_comp $_ $_ $_ $f $g)+). %$ Then, the matching fails and the term is printed using the rule for application. Note that the idea of adding a new rule for \verb+explicit_comp+ for the case of six arguments does not solve the problem, because of the polymorphism, we can always build a term with one argument more. The rules for application deal with the problem of having an arbitrary number of arguments by using list patterns. Let's see these rules: \begin{coq_eval} Definition foo := O. \end{coq_eval} \begin{coq_example*} Syntax constr level 10: app [<<(APPLIST $H ($LIST $T))>>] -> [ [ $H:E (APPTAIL ($LIST $T)):E ] ] | apptailcons [<<(APPTAIL $H ($LIST $T))>>] -> [ [1 1] $H:L (APPTAIL ($LIST $T)):E ] | apptailnil [<<(APPTAIL)>>] -> [ ]. \end{coq_example*} \begin{coq_eval} Reset foo. \end{coq_eval} The first rule prints the operator of the application, and the second prints the list of its arguments. Then, one solution to our problem is to specialize the first rule of the application to the cases where the operator is \verb+explicit_comp+ and the list pattern has {\bf at least} five arguments: \begin{coq_example*} Syntax constr level 10: expl_comp [<<(APPLIST <> $_ $_ $_ $f $g ($LIST $l))>>] -> [ [ $f:L "o" $g:L (APPTAIL ($LIST $l)):E ] ]. \end{coq_example*} %$ Now we can see that this rule works for any application of the operator: \begin{coq_example} Check ((Id nat) o (Id nat) O). Check ((Id nat->nat) o (Id nat->nat) [x:nat]x O). \end{coq_example} In the examples presented by now, the rules have no information about how to deal with indentation, break points and spaces, the printer will write everything in the same line without spaces. To indicate the concrete layout of the patterns, there's a simple language of printing instructions that will be described in the following section. \subsubsection{The printing of terminals} The user is not expected to write the printing rules for terminals, this is done automatically. Primitive printing is done for identifiers, strings, paths, numbers. For example : \begin{coq_example*} Grammar vernac vernac: ast := mycd [ "MyCd" prim:string($dir) "." ] -> [(MYCD $dir)]. Syntax vernac level 0: mycd [<<(MYCD $dir)>>] -> [ "MyCd " $dir ]. \end{coq_example*} There is no more need to encapsulate the \verb+$dir+ meta-variable with the \verb+$PRIM+ or the \verb+$STR+ operator as in the version 6.1. However, the pattern \verb+(MYCD ($STR $dir))+ would be safer, because the rule would not be selected to print an ill-formed AST. The name of default primitive printer is the {\ocaml} function {\tt print\_token}. If the user wants a particular terminal to be printed by another printer, he may specify it in the right part of the rule. Example: % Pas encore possible! $num est un id, pas un entier. %Syntax tactic level 0 : % do_pp [<>] % -> [ "Do " $num:"my_printer" [1 1] $tactic ]. \begin{coq_example*} Syntax tactic level 0 : do_pp [<<(DO ($NUM $num) $tactic)>>] -> [ "Do " $num:"my_printer" [1 1] $tactic ]. \end{coq_example*} The printer \textit{my\_printer} must have been installed as shown below. \subsubsection{Primitive printers} Writing and installing primitive pretty-printers requires to have the sources of the system like writing tactics. A primitive pretty-printer is an \ocaml\ function of type \begin{verbatim} Esyntax.std_printer -> CoqAst.t -> Pp.std_ppcmds \end{verbatim} The first argument is the global printer, it can be called for example by the specific printer to print subterms. The second argument is the AST to print, and the result is a stream of printing orders like : \begin{itemize} \item \verb+'sTR"+\textit{string}\verb+"+ to print the string \textit{string} \item \verb+'bRK +\textit{num1 num2} that has the same semantics than \verb+[+ \textit{num1 num2} \verb+]+ in the print rules. \item \verb+'sPC+ to leave a blank space \item \verb+'iNT+ $n$ to print the integer $n$ \item \ldots \end{itemize} There is also commands to make boxes (\verb+h+ or \verb+hv+, described in file {\tt lib/pp.mli}). Once the printer is written, it must be registered by the command : \begin{verbatim} Esyntax.Ppprim.add ("name",my_printer);; \end{verbatim} \noindent Then, in the toplevel, after having loaded the right {\ocaml} module, it can be used in the right hand side of printing orders using the syntax \verb+$truc:"name"+. The real name and the registered name of a pretty-printer does not need to be the same. However, it can be nice and simple to give the same name. \subsection{Syntax for pretty printing rules} \label{syntaxsyntax} This section describes the syntax for printing rules. The metalanguage conventions are the same as those specified for the definition of the {\sl pattern}'s syntax in section \ref{patternsyntax}. The grammar of printing rules is the following: \begin{center} \begin{tabular}{|rcl|} \hline {\sl printing-rule} & ::= & \verb+Syntax+~{\ident}~~\nelist{{\sl level}}{;} \\ {\sl level} & ::= & \verb+level+~{\sl precedence}~\verb+:+ ~\nelist{{\sl rule}}{|} \\ {\sl precedence} & ::= & {\integer} ~$|$~ \verb+[+~\integer~\integer~\integer~\verb+]+ \\ {\sl rule} & ::= & {\sl ident}~\verb+[+~{\sl pattern}~\verb+] -> [+% ~\sequence{{\sl printing-order}}{}~\verb+]+ \\ {\sl printing-order} & ::= & \verb+FNL+ \\ &$|$& {\sl string} \\ &$|$& \verb+[+~\integer~\integer~\verb+]+ \\ &$|$& \verb+[+~box~\sequence{{\sl printing-order}}{}~\verb+]+ \\ &$|$& {\sl ast}~\zeroone{{\tt :}~{\sl prim-printer}}~% \zeroone{{\tt :}~{\sl paren-rel}}\\ {\sl box} & ::= & \verb+<+~{\sl box-type}~\integer~\verb+>+\\ {\sl box-type} & ::= & ~\verb+hov+~$|$~\verb+hv+~$|$~\verb+v+~$|$~\verb+h+\\ {\sl paren-rel} & ::= & \verb+L+~$|$~\verb+E+ \\ {\sl prim-printer} & ::= & {\sl string} \\ {\sl pattern} & ::= & {\sl ast-quot} ~$|$~\verb+<<+ {\sl ast} \verb+>>+ \\ \hline \end{tabular} \end{center} Non-terminal {\sl ast-quot} is the default quotation associated to the extended universe. Patterns not belonging to the input syntax can be given directly as AST using \verb+<< >>+. As already stated, the order of rules in a given level is relevant (the last ones override the previous ones). \subsubsection{Pretty grammar structures} The basic structure is the printing order sequence. Each order has a printing effect and they are sequentially executed. The orders can be: \begin{itemize} \item printing orders \item printing boxes \end{itemize} \paragraph{Printing orders} Printing orders can be of the form: \begin{itemize} \item \verb+"+{\sl string}\verb+"+ prints the {\sl string}. \item \verb+FNL+ force a new line. \item \texttt{\$t:}\textsl{paren-rel} or \texttt{\$t:}\textsl{prim-printer}\texttt{:}\textsl{paren-rel} {\sl ast} is used to build an AST in current context. The printer looks up the adequate printing rule and applies recursively this method. The optional field {\it prim-printer} is a string with the name primitive pretty-printer to call (The name is not the name of the {\ocaml} function, but the name given to {\tt Esyntax.Ppprim.add}). Recursion of the printing is determined by the pattern's structure. {\it paren-rel} is the following: \begin{tabular}{cl} \verb+L+ & if $t$ 's precedence is {\bf less} than the rule's one, then no parentheses \\ & around $t$ are written. \\ \verb+E+ & if $t$ 's precedence is {\bf less or equal} than the rule's one then no parentheses \\ & around $t$ are written. \\ {\it none} & {\bf never} write parentheses around $t$. \\\\ \end{tabular} \end{itemize} \paragraph{Printing boxes} The concept of formatting boxes is used to describe the concrete layout of patterns: a box may contain many objects which are orders or sub\-boxes sequences separated by break\-points; the box wraps around them an imaginary rectangle. \begin{enumerate} \item {\bf Box types} The type of boxes specifies the way the components of the box will be displayed and may be: \begin{itemize} \item \verb+h+ : to catenate objects horizontally. \item \verb+v+ : to catenate objects vertically. \item \verb+hv+ : to catenate objects as with an ``h box'' but an automatic vertical folding is applied when the horizontal composition does not fit into the width of the associated output device. \item \verb+hov+ : to catenate objects horizontally but if the horizontal composition does not fit, a vertical composition will be applied, trying to catenate horizontally as many objects as possible. \end{itemize} The type of the box can be followed by a {\it n} offset value, which is the offset added to the current indentation when breaking lines inside the box. \item {\bf Boxes syntax} A box is described by a sequence surrounded by \verb+[ ]+. The first element of the sequence is the box type: this type surrounded by the symbols \verb+< >+ is one of the words \verb+hov+, \verb+hv+, \verb+v+, \verb+v+ followed by an offset. The default offset is 0 and the default box type is \verb+h+. \item {\bf Break\-points} In order to specify where the pretty-printer is allowed to break, one of the following break-points may be used: \begin{itemize} \item \verb+[0 0]+ is a simple break-point, if the line is not broken here, no space is included (``Cut''). \item \verb+[1 0]+ if the line is not broken then a space is printed (``Spc''). \item \verb+[i j]+ if the line is broken, the value $j$ is added to the current indentation for the following line; otherwise $i$ blank spaces are inserted (``Brk''). \end{itemize} \noindent {\bf Examples :} It is interesting to test printing rules on ``small'' and ``large'' expressions in order to see how the break of lines and indentation are managed. Let's define two constants and make a \verb+Print+ of them to test the rules. Here are some examples of rules for our constant \verb+Xor+: \begin{coq_example*} Definition A := True X True. Definition B := True X True X True X True X True X True X True X True X True X True X True X True X True. \end{coq_example*} \begin{coq_example*} Syntax constr level 6: Pxor [(Xor $t1 $t2)] -> [ $t1:L " X " $t2:E ]. \end{coq_example*} This rule prints everything in the same line exceeding the line's width. % Avec coq-tex, le bord de l'e'cran n'est pas mate'rialise'. %\begin{coq_example} %Print B. %\end{coq_example} \begin{small} \begin{flushleft} \verb!Coq < Print B.!\\ \texttt{\textit{B~=~}}\\ \texttt{\textit{True~X~True~X~True~X~True~X~True~X~True~X~True~X~True~X~True~X~True~X~Tru}}\\ \texttt{\textit{e~X~True~X~True}}\\ \texttt{\textit{~~~~~:~Prop}}\\ \end{flushleft} \end{small} Let's add some break-points in order to force the printer to break the line before the operator: \begin{coq_example*} Syntax constr level 6: Pxor [(Xor $t1 $t2)] -> [ $t1:L [0 1] " X " $t2:E ]. \end{coq_example*} \begin{coq_example} Print B. \end{coq_example} The line was correctly broken but there is no indentation at all. To deal with indentation we use a printing box: \begin{coq_example*} Syntax constr level 6: Pxor [(Xor $t1 $t2)] -> [ [ $t1:L [0 1] " X " $t2:E ] ]. \end{coq_example*} With this rule the printing of A is correct, an the printing of B is indented. \begin{coq_example} Print B. \end{coq_example} If we had chosen the mode \verb+v+ instead of \verb+hov+ : \begin{coq_example*} Syntax constr level 6: Pxor [(Xor $t1 $t2)] -> [ [ $t1:L [0 1] " X " $t2:E ] ]. \end{coq_example*} We would have obtained a vertical presentation: \begin{coq_example} Print A. \end{coq_example} The difference between the presentation obtained with the \verb+hv+ and \verb+hov+ type box is not evident at first glance. Just for clarification purposes let's compare the result of this silly rule using an \verb+hv+ and a \verb+hov+ box type: \begin{coq_example*} Syntax constr level 6: Pxor [(Xor $t1 $t2)] -> [ [ "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" [0 0] "----------------------------------------" [0 0] "ZZZZZZZZZZZZZZZZ" ] ]. \end{coq_example*} \begin{coq_example} Print A. \end{coq_example} \begin{coq_example*} Syntax constr level 6: Pxor [(Xor $t1 $t2)] -> [ [ "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" [0 0] "----------------------------------------" [0 0] "ZZZZZZZZZZZZZZZZ" ] ]. \end{coq_example*} \begin{coq_example} Print A. \end{coq_example} In the first case, as the three strings to be printed do not fit in the line's width, a vertical presentation is applied. In the second case, a vertical presentation is applied, but as the last two strings fit in the line's width, they are printed in the same line. \end{enumerate} \subsection{Debugging the printing rules} By now, there is almost no semantic check of printing rules in the system. To find out where the problem is, there are two possibilities: to analyze the rules looking for the most common errors or to work in the toplevel tracing the ml code of the printer. When the system can't find the proper rule to print an Ast, it prints \verb+#GENTERM+ \textit{ast}. If you added no printing rule, it's probably a bug and you can send it to the \Coq\ team. \subsubsection{Most common errors} Here are some considerations that may help to get rid of simple errors: \begin{itemize} \item make sure that the rule you want to use is not defined in previously closed section. \item make sure that {\bf all} non-terminals of your grammar have their corresponding printing rules. \item make sure that the set of printing rules for a certain non terminal covers all the space of AST values for that non terminal. \item the order of the rules is important. If there are two rules whose patterns superpose (they have common instances) then it is always the most recent rule that will be retrieved. \item if there are two rules with the same name and universe the last one overrides the first one. The system always warns you about redefinition of rules. \end{itemize} \subsubsection{Tracing the {\ocaml} code of the printer} Some of the conditions presented above are not easy to verify when dealing with many rules. In that case tracing the code helps to understand what is happening. The printers are in the file {\tt src/typing/printer}. There you will find the functions: \begin{itemize} \item {\tt prterm} : the printer of constructions \item {\tt gentacpr} : the printer of tactics \end{itemize} These printers are defined in terms of a general printer {\tt genprint} (this function is located in {\tt src/parsing/esyntax.ml}) and by instantiating it with the adequate parameters. {\tt genprint} waits for: the universe to which this AST belongs ({\it tactic}, {\it constr}), a default printer, the precedence of the AST inherited from the caller rule and the AST to print. {\tt genprint} looks for a rule whose pattern matches the AST, and executes in order the printing orders associated to this rule. Subterms are printed by recursively calling the generic printer. If no rule matches the AST, the default printer is used. An AST of a universe may have subterms that belong to another universe. For instance, let $v$ be the AST of the tactic expression \verb+MyTactic O+. The function {\tt gentacpr} is called to print $v$. This function instantiates the general printer {\tt genprint} with the universe {\it tactic}. Note that $v$ has a subterm $c$ corresponding to the AST of \verb+O+ ($c$ belongs to the universe {\it constr}). {\tt genprint} will try recursively to print all subterms of $v$ as belonging to the same universe of $v$. If this is not possible, because the subterm belongs to another universe, then the default printer that was given as argument to {\tt genprint} is applied. The default printer is responsible for changing the universe in a proper way calling the suitable printer for $c$. \medskip\noindent {\bf Technical Remark.} In the file \verb+PPTactic.v+, there are some rules that do not arise from the inversion of a parsing rule. They are strongly related to the way the printing is implemented. \begin{coq_example*} Syntax tactic level 8: tactic_to_constr [<<(COMMAND $c)>>] -> [ $c:"constr":9 ] \end{coq_example*} As an AST of tactic may have subterms that are commands, these rules allow the printer of tactic to change the universe. The primitive printer {\tt command} is a special identifier used for this purpose. They are used in the code of the default printer that {\tt gentacpr} gives to {\tt genprint}. \fi % $Id$ %%% Local Variables: %%% mode: latex %%% TeX-master: "Reference-Manual" %%% End: