\chapter{Syntax extensions and interpretation scopes}
\label{Addoc-syntax}

In this chapter, we introduce advanced commands to modify the way
{\Coq} parses and prints objects, i.e. the translations between the
concrete and internal representations of terms and commands. The main
commands are {\tt Notation} and {\tt Infix} which are described in
section \ref{Notation}. It also happens that the same symbolic
notation is expected in different contexts. To achieve this form of
overloading, {\Coq} offers a notion of interpretation scope. This is
described in section \ref{scopes}.

\Rem The commands {\tt Grammar}, {\tt Syntax} and {\tt Distfix} which
were present for a while in {\Coq} are no longer available from {\Coq}
version 8.0. The underlying AST structure is also no longer available.
The functionalities of the command {\tt Syntactic Definition} are
still available, see section \ref{Abbreviations}.

\section{Notations}
\label{Notation}
\comindex{Notation}

\subsection{Basic notations}

A {\em notation} is a symbolic abbreviation denoting some term
or term pattern.

A typical notation is the use of the infix symbol \verb=/\= to denote
the logical conjunction (\texttt{and}). Such a notation is declared
by

\begin{coq_example*}
Notation "A /\ B" := (and A B).
\end{coq_example*}

The expression \texttt{(and A B)} is the abbreviated term and the
string \verb="A /\ B"= (called a {\em notation}) tells how it is
symbolically written.

A notation is always surrounded by double quotes (except when the
abbreviation is a single ident, see \ref{Abbreviations}). The
notation is composed of {\em tokens} separated by spaces. Identifiers
in the string (such as \texttt{A} and \texttt{B}) are the {\em
parameters} of the notation. They must occur at least once each in the
denoted term. The other elements of the string (such as \verb=/\=) are
the {\em symbols}.

An identifier can be used as a symbol but it must be surrounded by
single quotes to avoid confusion with a parameter.
Similarly, every symbol of at least 3 characters and starting with a
single quote must be quoted (then it starts with two single quotes). Here
is an example.

\begin{coq_example*}
Notation "'IF' c1 'then' c2 'else' c3" := (IF_then_else c1 c2 c3).
\end{coq_example*}

%TODO quote the identifier when not in front, not a keyword, as in "x 'U' y" ?

A notation binds a syntactic expression to a term. Unless the parser
and pretty-printer of {\Coq} already know how to deal with the
syntactic expression (see \ref{ReservedNotation}), explicit precedences and
associativity rules have to be given.

\subsection{Precedences and associativity}
\index{Precedences}
\index{Associativity}

Mixing different symbolic notations in the same text may cause serious
parsing ambiguity. To deal with the ambiguity of notations, {\Coq}
uses precedence levels ranging from 0 to 100 (plus one extra level
numbered 200) and associativity rules.

Consider for example the new notation

\begin{coq_example*}
Notation "A \/ B" := (or A B).
\end{coq_example*}

Clearly, an expression such as {\tt forall A:Prop, True \verb=/\= A \verb=\/=
A \verb=\/= False} is ambiguous. To tell the {\Coq} parser how to
interpret the expression, a priority between the symbols \verb=/\= and
\verb=\/= has to be given. Assume for instance that we want conjunction
to bind more tightly than disjunction. This is expressed by assigning a
precedence level to each notation, knowing that a lower level binds
more tightly than a higher level. Hence the level for disjunction must be
higher than the level for conjunction.

Since connectives are the least tight articulation points of a text, it
is reasonable to choose levels close to the highest level, which is
100, for example 85 for disjunction and 80 for
conjunction\footnote{which are the levels actually chosen in the
current implementation of {\Coq}}.
Similarly, an associativity is needed to decide whether {\tt True \verb=/\=
False \verb=/\= False} defaults to {\tt True \verb=/\= (False \verb=/\=
False)} (right associativity) or to {\tt (True \verb=/\= False)
\verb=/\= False} (left associativity). We may even consider that the
expression is not well-formed and that parentheses are mandatory (this
is a ``no associativity'')\footnote{ {\Coq} accepts notations declared
as non-associative but the parser on which {\Coq} is built, namely
{\camlpppp}, currently does not implement no-associativity and
replaces it by left associativity; hence it is the same for {\Coq}:
no-associativity is in fact left associativity}. We don't know of a
special convention for the associativity of disjunction and conjunction,
so let us apply for instance a right associativity (which is the choice
of {\Coq}).

Precedence levels and associativity rules of notations have to be
given between parentheses in a list of modifiers that the
\texttt{Notation} command understands. Here is how the previous
examples are refined.

\begin{coq_example*}
Notation "A /\ B" := (and A B) (at level 80, right associativity).
Notation "A \/ B" := (or A B) (at level 85, right associativity).
\end{coq_example*}

By default, a notation is considered non-associative, but the
precedence level is mandatory (except for special cases whose level is
canonical). The level is either a number or the mention {\tt next
level}, which designates the level immediately below the level of the
notation itself. The list of levels already assigned is given in
Figure~\ref{init-notations}.

\subsection{Complex notations}

Notations can be made from arbitrarily complex symbols. One can for
instance define prefix notations.

\begin{coq_example*}
Notation "~ x" := (not x) (at level 75, right associativity).
\end{coq_example*}

One can also define notations for incomplete terms, with the hole
expected to be inferred at typing time.

\begin{coq_example*}
Notation "x = y" := (@eq _ x y) (at level 70, no associativity).
\end{coq_example*}

One can define {\em closed} notations, both sides of which are symbols. In
this case, the default precedence level for the inner subexpression is 200.

\begin{coq_eval}
Set Printing Depth 50.
(********** The following is correct but produces **********)
(**** an incompatibility with the reserved notation ********)
\end{coq_eval}
\begin{coq_example*}
Notation "( x , y )" := (@pair _ _ x y) (at level 0).
\end{coq_example*}

One can also define notations for binders.

\begin{coq_eval}
Set Printing Depth 50.
(********** The following is correct but produces **********)
(**** an incompatibility with the reserved notation ********)
\end{coq_eval}
\begin{coq_example*}
Notation "{ x : A | P }" := (sig A (fun x => P)) (at level 0).
\end{coq_example*}

In the last case though, there is a conflict with the notation for
type casts. The notation for type casts, as shown by the command {\tt
Print Grammar constr}, is at level 100. To avoid \verb=x : A= being
parsed as a type cast, it is necessary to put {\tt x} at a level
below 100, typically 99. Hence, a correct definition is

\begin{coq_example*}
Notation "{ x : A | P }" := (sig A (fun x => P)) (at level 0, x at level 99).
\end{coq_example*}

%This change has retrospectively an effect on the notation for notation
%{\tt "{ A } + { B }"}. For the sake of factorisation, {\tt A} must be
%put at level 99 too, which gives
%
%\begin{coq_example*}
%Notation "{ A } + { B }" := (sumbool A B) (at level 0, A at level 99).
%\end{coq_example*}

See the next section for more about factorisation.

\subsection{Simple factorisation rules}

{\Coq}'s extensible parsing is performed by Camlp4, which is essentially an
LL1 parser. Hence, some care has to be taken not to hide already
existing rules by new rules. Some simple left factorisation work has
to be done. Here is an example.

\begin{coq_eval}
(********** The next rule for notation _ < _ < _ produces **********)
(*** Error: Notation _ < _ < _ is already defined at level 70 ...
***)
\end{coq_eval}
\begin{coq_example*}
Notation "x < y" := (lt x y) (at level 70).
Notation "x < y < z" := (x < y /\ y < z) (at level 70).
\end{coq_example*}

In order to factorise the left part of the rules, the subexpression
referred to by {\tt y} has to be at the same level in both rules. However
the default behavior puts {\tt y} at the next level below 70
in the first rule (no associativity is the default), and at the level
200 in the second rule (level 200 is the default for inner expressions).
To fix this, we need to force the parsing level of {\tt y},
as follows.

\begin{coq_example*}
Notation "x < y" := (lt x y) (at level 70).
Notation "x < y < z" := (x < y /\ y < z) (at level 70, y at next level).
\end{coq_example*}

For the sake of factorisation with {\Coq} predefined rules, simple
rules have to be observed for notations starting with a symbol:
e.g. rules starting with ``\{'' or ``('' should be put at level 0. The
list of {\Coq} predefined notations can be found in chapter \ref{Theories}.

The command to display the current state of the {\Coq} term parser is
\comindex{Print Grammar constr}

\begin{quote}
\tt Print Grammar constr.
\end{quote}

\subsection{Displaying symbolic notations}

The command \texttt{Notation} has an effect both on the {\Coq} parser and
on the {\Coq} printer. For example:

\begin{coq_example}
Check (and True True).
\end{coq_example}

However, printing, especially pretty-printing, requires
more care than parsing. We may want specific indentations,
line breaks, alignment if on several lines, etc.

The default printing of notations is very rudimentary. For printing a
notation, a {\em formatting box} is opened in such a way that if the
notation and its arguments cannot fit on a single line, a line break
is inserted before the symbols of the notation and the arguments on
the next lines are aligned with the argument on the first line.
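
The two-way effect of \texttt{Notation} and the default printing just
described can be observed on a small user-defined connective. The
following is only a sketch: the constant {\tt my\_xor} and the symbol
\verb=<+>= are hypothetical names chosen for this illustration and are
not part of {\Coq}; level 85 and right associativity are borrowed from
the declaration of \verb=\/= given earlier.

\begin{coq_example}
(* my_xor and <+> are illustrative names, not part of Coq *)
Definition my_xor (A B:Prop) : Prop := (A /\ ~ B) \/ (~ A /\ B).
Notation "A <+> B" := (my_xor A B) (at level 85, right associativity).
(* the term is entered in prefix form *)
Check (my_xor True False).
(* the term is entered through the notation *)
Check (True <+> False <+> True).
\end{coq_example}

Both forms are accepted by the parser, and the printer is expected to
display the first term using the new symbol.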
A first, simple control that a user can have on the printing of a
notation is the insertion of spaces at some places of the
notation. This is performed by adding extra spaces between the symbols
and parameters: each extra space (other than the single space needed
to separate the components) is interpreted as a space to be inserted
by the printer. Here is an example showing how to add spaces
around the bar of the notation.

\begin{coq_example}
Notation "{{ x : A  |  P }}" := (sig (fun x : A => P))
  (at level 0, x at level 99).
Check (sig (fun x : nat => x=x)).
\end{coq_example}

The second, more powerful control on printing is provided by the {\tt
format} modifier. Here is an example.

\begin{small}
\begin{coq_example}
Notation "'If' c1 'then' c2 'else' c3" := (IF_then_else c1 c2 c3)
(at level 200, right associativity, format
"'[v ' 'If' c1 '/' '[' 'then' c2 ']' '/' '[' 'else' c3 ']' ']'").
\end{coq_example}
\end{small}

A {\em format} is an extension of the string denoting the notation with
the following possible elements delimited by single quotes:

\begin{itemize}
\item extra spaces are translated into simple spaces
\item tokens of the form \verb='/ '= are translated into breaking points;
  in case a line break occurs, an indentation of the number of spaces
  after the ``\verb=/='' is applied (2 spaces in the given example)
\item tokens of the form \verb='//'= force writing on a new line
\item well-bracketed pairs of tokens of the form \verb='[ '= and \verb=']'=
  are translated into printing boxes; in case a line break occurs,
  an extra indentation of the number of spaces given after the ``\verb=[=''
  is applied (4 spaces in the example)
\item well-bracketed pairs of tokens of the form \verb='[hv '= and \verb=']'=
  are translated into horizontal-orelse-vertical printing boxes;
  if the content of the box does not fit on a single line, then every breaking
  point forces a newline and an extra indentation of the number of spaces
  given after the ``\verb=[='' is applied at the beginning of
each newline (3 spaces in the example)
\item well-bracketed pairs of tokens of the form \verb='[v '= and
  \verb=']'= are translated into vertical printing boxes; every
  breaking point forces a newline, even if the line is large enough to
  display the whole content of the box, and an extra indentation of the
  number of spaces given after the ``\verb=[='' is applied at the beginning
  of each newline
\end{itemize}

Thus, for the previous example, we get
%\footnote{The ``@'' is here to shunt
%the notation "'IF' A 'then' B 'else' C" which is defined in {\Coq}
%initial state}:

\begin{coq_example}
Check
 (IF_then_else (IF_then_else True False True)
   (IF_then_else True False True)
   (IF_then_else True False True)).
\end{coq_example}

Notations do not survive the end of sections. No typing of the denoted
expression is performed at definition time. Type-checking is done only
at the time of use of the notation.

\Rem
Sometimes, a notation is expected only for the parser.
%(e.g. because
%the underlying parser of {\Coq}, namely {\camlpppp}, is LL1 and some extra
%rules are needed to circumvent the absence of factorisation).
To do so, the option {\em only parsing} is allowed in the list of modifiers of
\texttt{Notation}.

\subsection{The \texttt{Infix} command
\comindex{Infix}}

The \texttt{Infix} command is a shorthand for declaring notations of
infix symbols. Its syntax is

\begin{quote}
\noindent\texttt{Infix "{\symbolentry}" :=} {\qualid} {\tt (} \nelist{\em modifier}{,} {\tt )}.
\end{quote}

and it is equivalent to

\begin{quote}
\noindent\texttt{Notation "x {\symbolentry} y" := ({\qualid} x y) (} \nelist{\em modifier}{,} {\tt )}.
\end{quote}

where {\tt x} and {\tt y} are fresh names distinct from {\qualid}. Here is an example.

\begin{coq_example*}
Infix "/\" := and (at level 80, right associativity).
\end{coq_example*}

\subsection{Reserving notations
\label{ReservedNotation}
\comindex{ReservedNotation}}

A given notation may be used in different contexts.
{\Coq} expects all uses of the notation to be defined at the same precedence
and with the same associativity. To avoid giving the precedence and
associativity every time, it is possible to declare a parsing rule in
advance without giving its interpretation. Here is an example from the
initial state of {\Coq}.

\begin{coq_example}
Reserved Notation "x = y" (at level 70, no associativity).
\end{coq_example}

Reserving a notation is also useful for simultaneously defining an
inductive type or a recursive constant and a notation for it.

\Rem The notations mentioned in Figure~\ref{init-notations} are
reserved. Hence their precedence and associativity cannot be changed.

\subsection{Simultaneous definition of terms and notations
\comindex{Fixpoint {\ldots} where {\ldots}}
\comindex{CoFixpoint {\ldots} where {\ldots}}
\comindex{Inductive {\ldots} where {\ldots}}}

Thanks to reserved notations, the inductive, coinductive, recursive
and corecursive definitions can benefit from customized notations. To do
this, insert a {\tt where} notation clause after the definition of the
(co)inductive type or (co)recursive term (or after the definition of
each of them in case of mutual definitions). The exact syntax is given
in Figure \ref{notation-syntax}. Here are examples:

\begin{coq_eval}
Set Printing Depth 50.
(********** The following is correct but produces an error **********)
(********** because the symbol /\ is already bound **********)
(**** Error: The conclusion of A -> B -> A /\ B is not valid *****)
\end{coq_eval}

\begin{coq_example*}
Inductive and (A B:Prop) : Prop := conj : A -> B -> A /\ B
where "A /\ B" := (and A B).
\end{coq_example*}

\begin{coq_eval}
Set Printing Depth 50.
(********** The following is correct but produces an error **********)
(********** because the symbol + is already bound **********)
(**** Error: no recursive definition *****)
\end{coq_eval}

\begin{coq_example*}
Fixpoint plus (n m:nat) {struct n} : nat :=
  match n with
  | O => m
  | S p => S (p+m)
  end
where "n + m" := (plus n m).
\end{coq_example*}

\subsection{Displaying information about notations
\comindex{Set Printing Notation}
\comindex{Unset Printing Notation}}

To deactivate the printing of all notations, use the command

\begin{quote}
\tt Unset Printing Notation.
\end{quote}

To reactivate it, use the command

\begin{quote}
\tt Set Printing Notation.
\end{quote}

The default is to use notations for printing terms wherever possible.

\subsection{Locating notations
\comindex{Locate}
\label{LocateSymbol}}

To find out which notations a given symbol belongs to, use the command

\begin{quote}
\tt Locate {\symbolentry}
\end{quote}

where {\symbolentry} is any (composite) symbol surrounded by quotes. To
locate a particular notation, use a string where the variables of the
notation are replaced by ``\_''.

\Example
\begin{coq_example}
Locate "exists".
Locate "'exists' _ , _".
\end{coq_example}

\SeeAlso Section \ref{Locate}.
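
The commands of the last two subsections can be combined to inspect
how a term is displayed and where its notation comes from. The
sequence below is a minimal sketch that only uses the commands
described above, applied to the predefined conjunction; the output is
left to {\Coq} and may differ between versions.

\begin{coq_example}
(* printed with the notation, which is the default *)
Check (True /\ False).
Unset Printing Notation.
(* same term, now printed without any notation *)
Check (True /\ False).
Set Printing Notation.
(* all notations in which the symbol occurs *)
Locate "/\".
(* one particular notation, variables replaced by _ *)
Locate "_ /\ _".
\end{coq_example}

The second {\tt Check} is expected to show the term with the constant
\texttt{and} in prefix position, since the printing of notations is
deactivated at that point.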
\begin{figure}
\begin{centerframe}
\begin{tabular}{lcl}
{\sentence} & ::= &
    \texttt{Notation} \zeroone{\tt Local} {\str} \texttt{:=} {\term}
    \zeroone{\modifiers} \zeroone{:{\scope}} .\\
 & $|$ &
    \texttt{Infix} \zeroone{\tt Local} {\str} \texttt{:=} {\qualid}
    \zeroone{\modifiers} \zeroone{:{\scope}} .\\
 & $|$ &
    \texttt{Reserved Notation} \zeroone{\tt Local} {\str}
    \zeroone{\modifiers} .\\
 & $|$ & {\tt Inductive}
    \nelist{{\inductivebody} \zeroone{\declnotation}}{with}{\tt .}\\
 & $|$ & {\tt CoInductive}
    \nelist{{\inductivebody} \zeroone{\declnotation}}{with}{\tt .}\\
 & $|$ & {\tt Fixpoint}
    \nelist{{\fixpointbody} \zeroone{\declnotation}}{with} {\tt .} \\
 & $|$ & {\tt CoFixpoint}
    \nelist{{\cofixpointbody} \zeroone{\declnotation}}{with} {\tt .} \\
\\
{\declnotation} & ::= &
    \zeroone{{\tt where} {\str} {\tt :=} {\term} \zeroone{:{\scope}}} .
\\
\\
{\modifiers}
 & ::= & \nelist{\ident}{,} {\tt at level} {\naturalnumber} \\
 & $|$ & \nelist{\ident}{,} {\tt at next level} \\
 & $|$ & {\tt at level} {\naturalnumber} \\
 & $|$ & {\tt left associativity} \\
 & $|$ & {\tt right associativity} \\
 & $|$ & {\tt no associativity} \\
 & $|$ & {\ident} {\tt ident} \\
 & $|$ & {\ident} {\tt global} \\
 & $|$ & {\ident} {\tt bigint} \\
 & $|$ & {\tt only parsing} \\
 & $|$ & {\tt format} {\str}
\end{tabular}
\end{centerframe}
\caption{Syntax of the variants of {\tt Notation}}
\label{notation-syntax}
\end{figure}

\subsection{Notations with recursive patterns}

An experimental mechanism is provided for declaring elementary
notations including recursive patterns. The basic syntax is

\begin{coq_eval}
Require Import List.
\end{coq_eval}

\begin{coq_example*}
Notation "[ x ; .. ; y ]" := (cons x .. (cons y nil) ..).
\end{coq_example*}

On the right-hand-side, an extra construction of the form {\tt ..} ($f$
$t_1$ $\ldots$ $t_n$) {\tt ..} can be used. Notice that {\tt ..} is part of
the {\Coq} syntax while $\ldots$ is just a meta-notation of this
manual to denote a sequence of terms of arbitrary size.
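
As an illustration, and assuming the list notation declared above is in
force, a three-element instance unfolds by repeating the pattern once
per element. The names {\tt a}, {\tt b} and {\tt c} are placeholders
for arbitrary terms and are not part of the declaration.

\begin{quote}
{\tt [ a ; b ; c ]} \quad stands for \quad {\tt (cons a (cons b (cons c nil)))}
\end{quote}

Here {\tt cons} plays the role of $f$ in the construction described above.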
This extra construction enclosed within {\tt ..}, let's call it $t$,
must be one of the arguments of an applicative term of the form {\tt
($f$ $u_1$ $\ldots$ $u_n$)}. The sequences $t_1$ $\ldots$ $t_n$ and
$u_1$ $\ldots$ $u_n$ must coincide everywhere but in two places. In
one place, say the terms of index $i$, we must have $u_i = t$. In the
other place, say the terms of index $j$, both $u_j$ and $t_j$ must be
variables, say $x$ and $y$ which are bound by the notation string on
the left-hand-side of the declaration. The variables $x$ and $y$ in
the string must occur in a substring of the form ``$x$ $s$ {\tt ..} $s$
$y$'' where {\tt ..} is part of the syntax and $s$ denotes the same
sequence of terminal symbols in both occurrences (i.e. symbols which
are not variables).

These invariants must be satisfied for the notation to be
correct. The term $t_i$ is the {\em terminating} expression of
the notation and the pattern {\tt ($f$ $u_1$ $\ldots$ $u_{i-1}$ {\rm [I]}
$u_{i+1}$ $\ldots$ $u_{j-1}$ {\rm [E]} $u_{j+1}$ $\ldots$ $u_{n}$)} is the
{\em iterating pattern}. The hole [I] is the {\em iterative} place
and the hole [E] is the {\em enumerating} place. Remark that if $j