\chapter{Syntax extensions}
\label{Addoc-syntax}
In this chapter, we introduce advanced commands to modify the way
{\Coq} parses and prints objects, i.e. the translations between the
concrete and internal representations of terms and commands. The main
commands are {\tt Notation} and {\tt Infix}.
%It also happens that the same symbolic notation is expected in
%different contexts. To achieve this form of overloading, Coq offers a
%notion of interpretation scope described in section \ref{scopes}.
\Rem The commands {\tt Grammar}, {\tt Syntax}, {\tt Syntactic Definition}
and {\tt Distfix}, which were present for a while in Coq, are no longer
available from Coq version 8 on. The underlying AST structure is also no
longer available.
\section{Basic notations}
\label{Notation}
\comindex{Notation}
A {\em notation} is a symbolic abbreviation denoting some term
or term pattern.
A typical notation is the use of the infix symbol \verb=/\= to denote
the logical and connective (\texttt{and}). Such a notation is declared
by
\begin{coq_example*}
Notation "A /\ B" := (and A B).
\end{coq_example*}
The expression \texttt{(and A B)} is the abbreviated term and the
string \verb="A /\ B"= (called a {\em notation}) tells how it is
symbolically written.
A notation is always surrounded by quotes (except when the
abbreviation is a single identifier, see Section~\ref{Abbreviations}). The
notation is composed of {\em tokens} separated by spaces. Identifiers
in the string (such as \texttt{A} and \texttt{B}) are the {\em
parameters} of the notation. They must each occur at least once in the
denoted term. The other elements of the string (such as \verb=/\=) are
the {\em symbols}.
An identifier can be used as a symbol, but it must then be surrounded by
single quotes to avoid confusion with a parameter. Here is an example.
\begin{coq_example*}
Notation "'IF' c1 'then' c2 'else' c3" := (IF c1 c2 c3).
\end{coq_example*}
%TODO quote the identifier when not in front, not a keyword, as in "x 'U' y" ?
\section{Precedences and associativity}
\index{Precedences}
\index{Associativity}
Mixing different symbolic notations in the same text may cause serious
parsing ambiguities. To deal with the ambiguity of notations, Coq uses
precedence levels ranging from 0 to 100 together with associativity rules.
Consider for example the new notation
\begin{coq_example*}
Notation "A \/ B" := (or A B).
\end{coq_example*}
Clearly, an expression such as {\tt (A:Prop)True \verb=/\= A
\verb=\/= A \verb=\/= False} is ambiguous. To tell the Coq parser how
to interpret the expression, a priority between the symbols \verb=/\=
and \verb=\/=, as well as an associativity for each of them, has to be
given. The convention is that conjunction binds tighter than
disjunction. This is expressed in Coq by assigning a precedence level
to each notation, knowing that a lower level binds tighter than a higher
level. Hence the level for disjunction must be higher than the level
for conjunction.

Since connectives are the least tightly binding articulation points of a
text, it is reasonable to choose levels not so far from the highest
level, which is 100, for example 90 for disjunction and 80 for
conjunction\footnote{These are the levels effectively chosen in the
current implementation of Coq.}.
An associativity is also needed to tell whether {\tt True \verb=/\=
False \verb=/\= False} defaults to {\tt True \verb=/\= (False
\verb=/\= False)} (this is right associativity) or to {\tt (True
\verb=/\= False) \verb=/\= False} (this is left associativity). We may
even consider that the expression is not well-formed and that
parentheses are mandatory (this is ``no associativity'')\footnote{
Coq accepts notations declared as non-associative but the parser on
which Coq is built, namely Camlp4, currently does not implement
no-associativity and replaces it by right associativity; hence it is
the same for Coq: no-associativity is in fact right associativity.}.

We do not know of a special convention for the associativity of
disjunction and conjunction, so let us apply, for instance, right
associativity (which is the choice of Coq).
Precedence levels and associativity rules of notations have to be
given between parentheses in a list of modifiers that the
\texttt{Notation} command understands. Here is how the previous
examples are refined.
\begin{coq_example*}
Notation "A /\ B" := (and A B) (at level 6, right associativity).
Notation "A \/ B" := (or A B) (at level 7, right associativity).
\end{coq_example*}
By default, a notation is considered non-associative and its
precedence level is 1. The list of levels already assigned
is given in Figure~\ref{coq-symbols}.
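For instance, a notation declared without any modifier, as in the
following sketch, is parsed at level 1 with no associativity (the
symbol \verb=<&>= is chosen arbitrarily for illustration):
\begin{coq_example*}
(* illustrative sketch only: the default level and associativity apply *)
Notation "A <&> B" := (and A B).
\end{coq_example*}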
\begin{figure}
\begin{center}
\begin{tabular}{|l|l|}
notation & precedence/associativity \\
\hline
\verb$"_ <-> _"$ & 100 \\
\verb$"_ /\ _"$ & 80\, R \\
\verb$"_ + _"$ & 80\, R \\
\verb$"_ \/ _"$ & 70\, R \\
\verb$"_ * _"$ & 70\, R \\
\verb$"~ _"$ & 60\, R \\
\verb$"_ = _"$ & 50 \\
\verb$"_ == _"$ & 50 \\
\verb$"_ === _"$ & 50 \\
\verb$"_ <= _"$ & 50 \\
\verb$"_ < _"$ & 50 \\
\verb$"_ > _"$ & 50 \\
\verb$"_ >= _"$ & 50 \\
\verb$"_ + _"$ & 40\, L \\
\verb$"_ - _"$ & 40\, L \\
\verb$"_ * _"$ & 30\, L \\
\verb$"_ / _"$ & 30\, L \\
\verb$"{ _ } + { _ }"$ & 0 \\
\verb$"_ + { _ }"$ & 0 \\
\verb$"{ _ : _ | _ }"$ & 0 \\
\verb$"{ _ : _ | _ & _ }"$ & 0 \\
\verb$"{ _ : _ & _ }"$ & 0 \\
\verb$"{ _ : _ & _ & _ }"$ & 0 \\
\hline
\end{tabular}
\end{center}
\caption{Notations predefined in Coq, with their precedence level and associativity}
\label{coq-symbols}
\end{figure}
\section{Complex notations}
Notations can be made from arbitrarily complex sequences of symbols. One can for
instance define prefix notations.
\begin{coq_example*}
Notation "~ x" := (not x) (at level 5, right associativity).
\end{coq_example*}
One can also define notations for incomplete terms, with the hole
expected to be inferred at typing time.
\begin{coq_example*}
Notation "x = y" := (eq ? x y) (at level 5, no associativity).
\end{coq_example*}
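In the same spirit, one may sketch a notation for disequality on top of
the denoted term of the previous example (the level and associativity
below are chosen here for illustration only):
\begin{coq_example*}
(* a sketch: reuses the hole ? for the type argument of eq *)
Notation "x <> y" := (not (eq ? x y)) (at level 5, no associativity).
\end{coq_example*}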
One can define notations whose parameters are surrounded by symbols. In
this case, the default precedence level for the inner subexpressions is 10.
\begin{coq_example*}
Notation "{ A } + { B }" := (sumbool A B) (at level 1).
\end{coq_example*}
One can also define notations for binders.
\begin{coq_example*}
Notation "{ x : A | P }" := (sig A [x:A]P) (at level 1).
\end{coq_example*}
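Following the same scheme, here is a sketch of a similar notation for
the {\tt sigS} type of the standard library (the chosen level mirrors
the previous examples):
\begin{coq_example*}
(* sketch only: same binding scheme as above, with sigS instead of sig *)
Notation "{ x : A & P }" := (sigS A [x:A]P) (at level 1).
\end{coq_example*}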
\section{Simple factorisation rules}
Coq extensible parsing is performed by Camlp4, which is essentially an
LL(1) parser. Hence, some care has to be taken not to hide already
existing rules with new rules. Some simple left factorisation work has
to be done. Here is an example.
\begin{coq_example*}
Notation "'EX' x | p" := (ex ? [x]p) (at level 10).
Notation "'EX' x | p & q" := (ex2 ? [x]p [x]q) (at level 10).
\end{coq_example*}
In order to factorise the left part of the rules, the subexpression
referred to by {\tt p} has to be at the same level in both rules. However,
the default behaviour puts {\tt p} at the next existing level below 10
in the first rule (no associativity is the default), and at level
10 in the second rule (level 10 is the default when not on the border
of a notation). To fix this, we need to force the parsing level of {\tt p},
as follows.
\begin{coq_example*}
Notation "'EX' x | p" := (ex ? [x]p) (at level 10, p at level 9).
Notation "'EX' x | p & q" := (ex2 ? [x]p [x]q)
(at level 10, p at level 9).
\end{coq_example*}
Actually, for better homogeneity between {\tt p} and {\tt q}, it is reasonable
to put {\tt q} at the same level as {\tt p}, hence the final version of the rules:
\begin{coq_example*}
Notation "'EX' x | p" := (ex ? [x]p) (at level 10, p at level 9).
Notation "'EX' x | p & q" := (ex2 ? [x]p [x]q)
(at level 10, p, q at level 9).
\end{coq_example*}
For the sake of factorisation with Coq predefined rules, simple conventions
have to be observed for notations starting with a symbol: e.g. rules
starting with ``\{'' or ``<'' should be put at level 1 and rules
starting with ``('' at level 0. The list of Coq predefined notations
is given in Figure~\ref{coq-symbols}.
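For instance, here is a sketch of a notation starting with ``('',
declared at level 0 accordingly (assuming the standard {\tt pair}
constructor, with two holes for its type arguments):
\begin{coq_example*}
(* sketch: a pairing notation starting with "(" hence put at level 0 *)
Notation "( x , y )" := (pair ? ? x y) (at level 0).
\end{coq_example*}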
\section{Abbreviations}
\label{Abbreviations}
An {\em abbreviation} is a name denoting a (presumably) more complex
expression. An abbreviation is a special form of notation with no
parameters and only one symbol, which is an identifier. This identifier
is given without surrounding quotes. Example:
\begin{coq_example*}
Notation All := (all ?).
\end{coq_example*}
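Another abbreviation in the same spirit, sketched here for the standard
{\tt ex} predicate (the name {\tt Exist} is an arbitrary choice for
illustration):
\begin{coq_example*}
(* sketch: abbreviates the partially applied existential quantifier *)
Notation Exist := (ex ?).
\end{coq_example*}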
An abbreviation also expects no modifiers, since such a notation can
always be put at the lowest level, that of atomic expressions, and
associativity is irrelevant.

Abbreviations are bound to an absolute name, as an ordinary
definition is, and can be referred to by partially qualified names too.

Abbreviations do not survive the end of sections. No typing of the denoted
expression is performed at definition time. Type-checking is done only
at the time of use of the abbreviation.
\section{Displaying symbolic notations}
The command \texttt{Notation} has an effect both on the Coq parser and
on the Coq printer. For example:
\begin{coq_example}
Check (and True True).
\end{coq_example}
However, printing, especially when we expect it to be pretty, requires
more care than parsing. We may want well-chosen indentation,
line breaks, alignment if on several lines, etc.

The printing of notations is currently rather rudimentary. When printed, each notation
opens a {\em formatting box} in such a way that if a line break is
needed to print a large expression, all components of the notation appear
at the same indentation on the different lines required for the
printing. Line breaks are inserted by default just before the symbols of
the notation.
The only control a user has over the printing of a notation is the
insertion of spaces at some places of the notation. This is performed
by adding extra spaces between the symbols and parameters: each extra
space (beyond the single space needed to separate the components) is
interpreted as a space to be inserted by the printer. Here is an
example showing how to add spaces around the bar of the notation.
\begin{coq_example}
Notation "{ x : A | P }" := (sig A [x:A]P)
(at level 1, x at level 10).
Check (sig nat [x]x=x).
\end{coq_example}
\Rem
Sometimes, a notation is expected to act only on the parser (e.g. because
the underlying parser of Coq, namely Camlp4, is LL(1) and some extra
rules are needed to circumvent the absence of factorisation).
For this purpose, the modifier {\em only parsing} is allowed in the list of modifiers of
\texttt{Notation}.
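For instance, one may sketch a parse-only variant of an existing
notation as follows (the notation string and the level are arbitrary
choices for illustration):
\begin{coq_example*}
(* sketch: accepted by the parser but never used by the printer *)
Notation "x 'equals' y" := (eq ? x y)
  (at level 5, no associativity, only parsing).
\end{coq_example*}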
\section{Summary of the options of \texttt{Notation}}
The different syntactic variants of the command \texttt{Notation} are
the following:
\begin{itemize}
\item \texttt{Notation} {\str} \texttt{:=} {\term}.
\item \texttt{Notation} {\ident} \texttt{:=} {\term}.
\item \texttt{Notation} {\str} \texttt{:=} {\term} {\tt (} \nelist{\em modifier}{,} {\tt )}.
\end{itemize}
where a {\em modifier} is one of \texttt{at level $n$},
\texttt{no associativity}, \texttt{right
associativity}, \texttt{left associativity}, \texttt{only parsing} or
\texttt{\nelist{\ident}{,} at level $n$}.

Notations do not survive the end of sections. No typing of the denoted
expression is performed at definition time. Type-checking is done only
at the time of use of the notation.
\section{The \texttt{Infix} command}
The \texttt{Infix} command is a shorthand for declaring notations for
infix symbols. Its syntax is
\medskip
\noindent\texttt{Infix} "{\em symbol}" {\qualid} {\tt (} \nelist{\em modifier}{,} {\tt )}.
\medskip
and it is equivalent to
\medskip
\noindent\texttt{Notation "x {\em symbol} y" := ({\qualid} x y) ( \nelist{\em modifier}{,} )}.
\medskip
where {\tt x} and {\tt y} are fresh names distinct from {\qualid}. Here is an example.
\begin{coq_example*}
Infix "/\\" and (at level 6, right associativity).
\end{coq_example*}
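In the same way, a sketch of an infix notation for disjunction,
mirroring the level and associativity chosen above, could be:
\begin{coq_example*}
(* sketch: same scheme as the conjunction example above *)
Infix "\\/" or (at level 7, right associativity).
\end{coq_example*}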
%\section{Symbolic interpretation scopes}
%
%TODO
\iffalse
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Syntax extensions}
\label{Addoc-syntax}
In this chapter, we introduce advanced commands to modify the way
{\Coq} parses and prints objects, i.e. the translations between the
concrete and internal representations of terms and commands. As in
most compilers, there is an intermediate structure called {\em Abstract
Syntax Tree} (AST). Parsing a term is done in two steps\footnote{
We omit the lexing step, which simply translates a character stream
into a token stream. If this translation fails, this is a
\emph{Lexical error}.
}:
\begin{enumerate}
\item An AST is built from the input (a stream of tokens), following
grammar rules. This step consists in deciding whether the input
belongs to the language or not. If it does, the parser transforms the
expression into an AST. If not, this is a syntax error. An expression
belongs to the language if there exists a sequence of grammar rules
that recognizes it. This task is delegated to {\camlpppp}. See the
Reference Manual~\cite{ddr98} for details on the parsing
technology. The transformation to AST is performed by executing
successively the {\sl actions} bound to these rules.
\item The AST is translated into the internal representation of
commands and terms. At this point, we detect unbound variables and
determine the exact section-path of every global value. Then, the term
may be typed, computed,~\ldots
\end{enumerate}
The printing process is the reverse: commands or terms are first
translated into AST's, and then the pretty-printer translates each AST
into a stream of printing orders, according to printing rules.
In {\Coq}, only the translations between AST's and the concrete
representation are extendable. One can modify the set of grammar and
printing rules, but one cannot change the way AST's are interpreted at
the internal level.
In the following section, we describe the syntax of AST expressions,
involved in both parsing and printing. The next two sections deal with
extendable grammars and pretty-printers.
\section{Abstract syntax trees (AST)}
\index{AST}
\index{Abstract syntax tree}
\label{astsyntax}
The AST expressions are conceptually divided into two classes:
\emph{constructive expressions} (those that can be used in parsing
rules) and \emph{destructive expressions} (those that can be used in
pretty printing rules). In the following we give the concrete syntax
of expressions and some examples of their usage.
The BNF grammar {\sl ast} in Fig.~\ref{astexpr} defines the syntax
of both constructive and destructive expressions. The lexical
conventions are the same as in section~\ref{lexical}. Let us first
describe the features common to constructive and destructive
expressions.
\begin{figure}
\begin{center}
\begin{tabular}{|rclr|}
\hline
{\sl ast} & ::= & {\sl meta} & (metavariable) \\
& $|$ & {\ident} & (variable)\\
& $|$ & {\integer} & (positive integer)\\
& $|$ & {\sl string} & (quoted string) \\
& $|$ & {\sl path} & (section-path) \\
& $|$ & \verb+{+ {\ident} \verb+}+ & (identifier) \\
& $|$ & \verb+[+ {\sl name} \verb+]+ {\sl ast} & (abstraction) \\
& $|$ & \verb+(+ {\ident}~~\sequence{{\sl ast}}{} \verb+)+
& (application node)\\
& $|$ & \verb+(+ {\sl special-tok}~~{\sl meta} \verb+)+
& (special-operator) \\
& $|$ & \verb+'+ {\sl ast} & (quoted ast) \\
& $|$ & {\sl quotation} & \\
{\sl meta} & ::= & \verb+$+{\ident} & \\
%$
{\sl path} & ::= & \nelist{{\tt\#}{\ident}}{} \verb+.+ {\sl kind} & \\
{\sl kind} & ::= & \verb+cci+ $~|~$\verb+fw+ $~|~$ \verb+obj+ & \\
{\sl name} & ::= & \verb+<>+ ~$|$~ {\ident} ~$\mid$~ {\sl meta} & \\
{\sl special-tok} & ::= &
\verb+$LIST+~$|$~\verb+$VAR+~$|$~\verb+$NUM+~$|$~%
\verb+$STR+~$|$~\verb+$PATH+~$|$~\verb+$ID+ & \\
{\sl quotation}& ::= &
\verb+<<+ ~{\sl meta-constr}~ \verb+>>+ & \\
& $|$ & \verb+<:constr:<+ ~{\sl meta-constr}~ \verb+>>+ & \\
& $|$ & \verb+<:vernac:<+ ~{\sl meta-vernac}~ \verb+>>+ & \\
& $|$ & \verb+<:tactic:<+ ~{\sl meta-tactic}~ \verb+>>+ & \\
\hline
\end{tabular}
\end{center}
\caption{Syntax of AST expressions}\label{astexpr}
\end{figure}
\subsubsection{Atomic AST}
An atomic AST can be either a variable, a natural number, a quoted
string, a section path or an identifier. They are the basic components
of an AST.
\subsubsection{Metavariable}
\index{Metavariable}
Metavariables are used to perform substitutions in constructive
expressions: they are replaced by their value in a given
environment. They are also involved in the pattern matching operation:
metavariables in destructive patterns create new bindings in the
environment.
As we will see later, metavariables may denote an AST or an AST list
(when used with the \verb+$LIST+ special token).
%$
So, we introduce two types of variables: \verb+ast+ and
\verb+ast list+. The type of variables is checked statically: an
expression referring to undefined metavariables, or using metavariables
with an inappropriate type, will be rejected.
\subsubsection{Application node}
Note that the AST syntax is rather general, since application nodes
may be labelled by an arbitrary identifier (but not a metavariable),
and operators have no fixed arity. This enables the extensibility of
the system.
Nevertheless, there are some application nodes that have a special
meaning for the system. They are built as \verb+(+{\sl
special-tok}~{\sl meta}\verb+)+, and cannot be confused with regular
nodes since {\sl special-tok} begins with a \verb+$+.
%$
There is a description of these nodes below.
\subsubsection{Abstraction}
The equality on AST's is $\alpha$-conversion, i.e. two AST's are
equal if and only if they are the same up to renaming of bound variables
(thus, \verb+[x]x+ is equal to \verb+[y]y+). This makes the difference
between variables and identifiers clear: the former may be bound by
abstractions, whereas identifiers cannot be bound. To illustrate this,
\verb+[x]x+ and \verb+[y]y+ are equal and \verb+[x]{x}+ is equal to
\verb+[y]{x}+, but not to \verb+[y]{y}+.
The binding structure of AST is used to represent the binders in the
terms of {\Coq}: the product \verb+(x:$A)$B+ is mapped to the AST
\verb+(PROD $A [x]$B)+, whereas the non dependent product
\verb+$A->$B+ is mapped to \verb+(PROD $A [<>]$B)+ (\verb+[<>]t+ is an
anonymous abstraction).
Metavariables can appear in abstractions. In that case, the value of
the metavariable must be a variable (or a list of variables). If not,
a run-time error is raised.
\subsubsection{Quoted AST}
\index{Quoted AST}
The \verb+'+$t$ construction means that the AST $t$ should not be
interpreted at all. The main effect is that metavariables occurring in
it cannot be substituted or considered as binding in patterns.
\subsubsection{Quotations}
\index{Quotations}
The non terminal symbols {\sl meta-constr}, {\sl meta-vernac} and
{\sl meta-tactic} stand, respectively, for the syntax of CIC terms,
vernacular phrases and tactics. The prefix {\sl meta-} is just to
emphasize that the expression may refer to metavariables.
Indeed, if the AST to generate corresponds to a term that already has
a syntax, one can call a grammar to parse it and to return the AST
result. For instance, \verb+<<(eq ? $f $g)>>+ denotes the AST which is
the application (in the sense of CIC) of the constant {\tt eq} to
three arguments. It is coded as an AST node labelled {\tt APPLIST}
with four arguments.
This term is parsable by the \verb+constr:constr+ grammar. This grammar
is invoked on the term to generate an AST by putting the term between
``\verb+<<+'' and ``\verb+>>+''.
We can also invoke the initial grammars of several other predefined
entries (see section~\ref{predefined-grammars} for a description of
these grammars).
\begin{itemize}
\item \verb|<:constr:< t >>| parses {\tt t} with {\tt constr:constr}
grammar(terms of CIC).
\item \verb|<:vernac:< t >>| parses {\tt t} with {\tt vernac:vernac}
grammar (vernacular commands).
\item \verb|<:tactic:< t >>| parses {\tt t} with {\tt tactic:tactic}
grammar (tactic expressions).
\item \verb|<< t >>| parses {\tt t} with the default quotation (that
is, {\tt constr:constr}). It is the same as \verb|<:constr:< t >>|.
\end{itemize}
\Warning
One cannot invoke other grammars than those described.
\subsubsection{Special operators in constructive expressions}
The expression \verb+($LIST $x)+ injects the AST list variable
{\tt\$x} in an AST position. For example, an application node is
composed of an identifier followed by a list of AST's that are glued
together. Each of these expressions must denote an AST. If one wants to
insert an AST list, one has to use the \verb+$LIST+ operator. Assuming
the variable \verb+$idl+ is bound to the list \verb+[x y z]+, the
expression \verb+(Intros ($LIST $idl) a b c)+ will build the AST
\verb+(Intros x y z a b c)+. Note that \verb+$LIST+ does not occur in
the result.
%$
% Ameliorer le style!
Since we know the type of variables, the \verb+$LIST+
%$
operator is not really
necessary. We enforce this annotation to stress the fact that the
variable will be substituted by an arbitrary number of AST's.
The other special operators ({\tt\$VAR}, {\tt\$NUM}, {\tt\$STR},
{\tt\$PATH} and {\tt\$ID}) are forbidden.
\subsubsection{Special operators in destructive expressions (AST patterns)}
\label{patternsyntax}
\index{AST patterns}
% pas encore implante (serait utile pour afficher les LAMBDALIST et
% PRODLIST).
%\item \verb+($SLAM $l body)+ is used to denote an abstraction where
% the elements of the list variable \verb+$l+ are the variables
%%$
% simultaneously abstracted in \verb+body+.
% The production to recognize a lambda abstraction of the form
% $[x_1,\ldots,x_n:T]body$ use the operator \verb+$SLAM+:
A pattern is an AST expression, in which some metavariables can
appear. In a given environment a pattern matches any AST which is
equal (w.r.t $\alpha$-conversion) to the value of the pattern in an
extension of the current environment. The result of the matching is
precisely this extended environment. This definition allows
non-linear patterns (i.e. patterns in which a variable occurs several
times).
For instance, the pattern \verb+(PAIR $x $x)+ matches any AST which is
a node labelled {\tt PAIR} applied to two identical arguments, and
binds this argument to {\tt\$x}. If {\tt\$x} was already bound, the
arguments must also be equal to the current value of {\tt\$x}.
The ``wildcard pattern'' \verb+$_+ is not a regular metavariable: it
matches any term, but does not bind any variable. The pattern
\verb+(PAIR $_ $_)+ matches any {\tt PAIR} node applied to two
arguments.
The {\tt\$LIST} operator still introduces list variables. Typically,
when a metavariable appears as argument of an application, one has to
say if it must match \emph{one} argument (binding an AST variable), or
\emph{all} the arguments (binding a list variable). Let us consider
the patterns \verb+(Intros $id)+ and \verb+(Intros ($LIST $idl))+. The
former matches nodes with \emph{exactly} one argument, which is bound
in the AST variable {\tt\$id}. On the other hand, the latter pattern
matches any AST node labelled {\tt Intros}, and it binds the
\emph{list} of its arguments to the list variable {\tt\$idl}. The
{\tt\$LIST} pattern must be the last item of a list pattern, because
allowing it elsewhere would make the pattern matching operation more
complicated and less efficient. The pattern
\verb+(Intros ($LIST $idl) $lastid)+ is thus not accepted.
The other special operators allow checking what kind of leaf we
are destructing:
\begin{itemize}
\item{\tt\$VAR} matches only variables
\item{\tt\$NUM} matches natural numbers
\item{\tt\$STR} matches quoted strings
\item{\tt\$PATH} matches section-paths
\item{\tt\$ID} matches identifiers
\end{itemize}
\noindent For instance, the pattern \verb+(DO ($NUM $n) $tc)+ matches
\verb+(DO 5 (Intro))+, and creates the bindings ({\tt\$n},5) and
({\tt\$tc},\verb+(Intro)+). The pattern matching would fail on
\verb+(DO "5" (Intro))+.
\section{Extendable grammars}
\label{Grammar}
\index{Extendable Grammars}
\comindex{Grammar}
Grammar rules can be added with the {\tt Grammar} command. This
command is just an interface towards {\camlpppp}, providing the
semantic actions so that they build the expected AST. A simple grammar
command has the following syntax: \index{Grammar@{\tt Grammar}}
\begin{center}
\texttt{Grammar}~\textsl{entry}~\textsl{nonterminal}~\texttt{:=}~%
\textsl{rule-name}~\textsl{LMP}~\verb+->+~\textsl{action}~\texttt{.}
\end{center}
The components have the following meaning:
\begin{itemize}
\item a grammar name: defined by a parser entry and a non-terminal.
Non-terminals are packed in an \emph{entry} (also called
universe). One can have two non-terminals of the same name if they
are in different entries. A non-terminal can have the same name as
its entry.
\item a rule (sometimes called production), formed by a name, a left
member of production and an action, which generalizes constructive
expressions.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}
\begin{center}
\begin{tabular}{|rcl|}
\hline
{\sl grammar} & ::= &
\verb+Grammar+~{\sl entry}~\nelist{{\sl gram-entry}}{with} \\
{\sl entry}& ::= & {\ident} \\
{\sl gram-entry} & ::= &
{\sl rule-name}~\zeroone{{\tt :}~{\sl entry-type}}~\verb+:=+~%
% [{\sl assoc}]~
\sequence{{\sl production}}{|} \\
{\sl entry-type} & ::= &
\verb+ast+~$|$~\verb+ast list+~$|$~\verb+constr+
~$|$~\verb+tactic+~$|$~\verb+vernac+ \\
%{\sl assoc} & ::= & \verb+LEFTA+~$|$~\verb+RIGHTA+~$|$~\verb+NONA+ \\
% parler de l'associativite ?
% Cela n'a pas vraiment de sens si toutes les re`gles sont dans un
% meme niveau pour Camlp4.
{\sl production} & ::= &
{\sl rule-name}~\verb+[+~\sequence{{\sl prod-item}}{}~\verb+] ->+
~{\sl action}\\
{\sl rule-name} & ::= & {\sl ident} \\
{\sl prod-item} & ::= & {\sl string} \\
& | & \zeroone{{\sl entry}~{\tt :}}~{\sl entry-name}~%
\zeroone{{\tt (}~{\sl meta}~{\tt )}} \\
{\sl action} & ::= &
\verb+[+~\sequence{{\sl ast-quote}}{}~\verb+]+ \\
& | & \verb+let+~{\sl pattern}~\verb+=+~{\sl action}~%
\verb+in+~{\sl action} \\
& | & {\tt case}~{\sl action}~\zeroone{{\tt :}~{\sl entry-type}}~{\tt of}~%
\sequence{{\sl case}}{|}~{\tt esac} \\
{\sl case} & ::= & \sequence{{\sl pattern}}{}~\verb+->+~{\sl action} \\
{\sl pattern} & ::= & {\sl ast} \\
\hline
\end{tabular}
\end{center}
\caption{Syntax of the grammar extension command}\label{grammarsyntax}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The exact syntax of the {\tt Grammar} command is defined
in Fig.~\ref{grammarsyntax}, where the non-terminal {\sl ast-quote} is one
of {\tt ast}, {\tt constr}, {\tt tactic} or {\tt vernac}, depending on
the entry type.
It is possible to extend a grammar with several rules at once.
$$
\begin{array}{rcc}
\verb+Grammar+~~entry~~nonterminal
& \verb+:=+ & production_1 \\
& \verb+|+ & \vdots \\
& \verb+|+ & production_n \verb+.+ \\
\end{array}
$$
Productions are entered in reverse order (i.e. $production_n$ before
$production_1$), so that the first rules have priority over the last
ones. The set of rules can be read as a usual pattern matching.
\noindent Also, we can extend several grammars of a given universe at
the same time. The order of non-terminals does not matter since they
extend different grammars.
$$
\begin{array}{rcccc}
\verb+Grammar+ & entry & nonterminal_1
& \verb+:=+ & production_1^1 \\
&& & \verb+|+ & \vdots \\
&& & \verb+|+ & production_{n_1}^1 \\
& \verb+with+ & \vdots && \\
& \verb+with+ & nonterminal_p
& \verb+:=+ & production_1^p \\
&& & \verb+|+ & \vdots \\
&& & \verb+|+ & production_{n_p}^p \verb+.+ \\
\end{array}
$$
\subsection{Grammar entries}
\index{Grammar entries}
\label{predefined-grammars}
Let us describe the four predefined entries. Each of them (except {\tt
prim}) possesses an initial grammar for starting the parsing process.
\begin{itemize}
\item \verb+prim+ : it is the entry of the primitive grammars. Most of
them cannot be defined by the extendable grammar mechanism. They are
encoded inside the system. The entry contains the following
non-terminals:
\begin{itemize}
\item \verb+var+ : variable grammar. Parses an identifier and builds
an AST which is a variable.
\item \verb+ident+ : identifier grammar. Parses an identifier and
builds an AST which is an identifier such as \verb+{x}+.
\item \verb+number+ : number grammar. Parses a positive integer.
\item \verb+string+ : string grammar. Parses a quoted string.
\item \verb+path+ : section path grammar.
\item \verb+ast+ : AST grammar.
\item \verb+astpat+ : AST pattern grammar.
\item \verb+astact+ : action grammar.
\end{itemize}
The primitive grammars are used like the other grammars; for instance
the variables of terms are parsed by \verb+prim:var($id)+.
\item \verb+constr+ : it is the term entry. It allows one to have a pretty
syntax for terms. Its initial grammar is {\tt constr:constr}. This
entry contains several non-terminals, among them {\tt constr0} to
{\tt constr10}, which stratify the terms according to priority levels
(\verb+0+ to \verb+10+). These priority levels also allow us to
specify the associativity of operators.
% Ce serait bien si on etait capables de donner une table avec les
% niveaux de priorite de chaque operateur...
\item \verb+vernac+ : it is the vernacular command entry, with {\tt
vernac:vernac} as initial grammar. Thanks to it, developers can
define the syntax of new commands they add to the system. As for
users, they can change the syntax of the predefined vernacular
commands.
\item \verb+tactic+ : it is the tactic entry, with {\tt tactics:tactic}
as initial grammar. This entry allows one to define the syntax of new
tactics or to redefine the syntax of existing tactics.
% See chapter~\ref{WritingTactics} about user-defined tactics
% for more details.
\end{itemize}
The user can define new entries and new non-terminals, using the
grammar extension command. A grammar does not have to be explicitly
defined, but the grammars in the left member of rules must all be
defined, possibly by the current grammar command. It may be convenient
to define an empty grammar, just so that it may be called by other
grammars, and to extend this empty grammar later. Assume that {\tt
constr:constr13} does not exist. The next command defines it with
zero productions.
\begin{coq_example*}
Grammar constr constr13 := .
\end{coq_example*}
The grammars of new entries do not have an initial grammar. To use
them, they must be called (directly or indirectly) by grammars of
predefined entries. We give an example of a (direct) call of the
grammar {\tt newentry:nonterm} by {\tt constr:constr}. The following
rules allow using the syntax \verb+a&b+ for the conjunction
\verb+a/\b+. Note that since we extend a rule of the universe
\verb+constr+, the command quotation is used on the right-hand side of
the second rule.
\begin{coq_example}
Grammar newentry nonterm :=
ampersand [ "&" constr:constr($c) ] -> [$c].
Grammar constr constr :=
new_and [ constr8($a) newentry:nonterm($b) ] -> [$a/\$b].
\end{coq_example}
\subsection{Left member of productions (LMP)}
An LMP is composed of a combination of terminals (enclosed between
double quotes) and grammar calls specifying the entry. It is enclosed
between ``\verb+[+'' and ``\verb+]+''. The empty LMP, represented by
\verb+[ ]+, corresponds to $\epsilon$ in formal language theory.
A grammar call is done by \verb+entry:nonterminal($id)+ where:
\begin{itemize}
\item \verb+entry+ and \verb+nonterminal+
specify the entry of the grammar and the non-terminal.
\item \verb+$id+ is a metavariable that will receive the AST or AST
list resulting from the call to the grammar.
\end{itemize}
The elements \verb+entry+ and \verb+$id+ are optional. The grammar
entry can be omitted if it is the same as the entry of the
non-terminal we are extending. Also, \verb+$id+ is omitted if we do
not want to get back the AST result. Thus a grammar call can be
reduced to a non-terminal.
Each terminal must contain exactly one token. This token does not need
to be already defined; if it is not, it will be automatically
added. Nevertheless, not every string can be a token (e.g. blanks should
not appear in tokens since parsing would then depend on
indentation). We introduce the notion of a \emph{valid token}: a
sequence, without blanks, of characters taken from the following list:
\begin{center}
\verb"< > / \ - + = ; , | ! @ # % ^ & * ( ) ? : ~ $ _ ` ' a..z A..Z 0..9"
\end{center}
that do not start with a character from
\begin{center}
\verb!$ _ a..z A..Z ' 0..9!
\end{center}
When an LMP is used in the parsing process of an expression, it is
analyzed from left to right. Every token met in the LMP should
correspond to the current token of the expression. As for the grammar
calls, they are performed in order to recognize parts of the initial
expression.
\Warning
Unlike destructive expressions, if a variable appears several times in
the LMP, the last binding hides the previous ones. Comparison can be
performed only in the actions.
\firstexample
\example{Defining a syntax for inequality}
The rule below allows us to use the syntax \verb+t1#t2+ for the term
\verb+~t1=t2+.
\begin{coq_example}
Grammar constr constr1 :=
not_eq [ constr0($a) "#" constr0($b) ] -> [ ~$a=$b ].
\end{coq_example}
The level $1$ of the grammar of terms is extended with one rule named
\texttt{not\_eq}. When this rule is selected, its LMP calls the
grammar \verb+constr:constr0+. This grammar recognizes a term that it
binds to the metavariable \verb+$a+. Then it meets the token
``\verb+#+'' and finally it calls the grammar
\verb+constr:constr0+. This grammar returns the recognized term in
\verb+$b+. The action constructs the term \verb+~$a=$b+.
For instance, let us give the statement of the symmetry of \verb+#+:
\begin{coq_example}
Goal (A:Set)(a,b:A) a#b -> b#a.
\end{coq_example}
This shows that the system understood the grammar
extension. Nonetheless, since no special printing command was given,
the goal is displayed using the usual syntax for negation and
equality. One can force \verb+~a=b+ to be printed \verb=a#b= by giving
pretty-printing rules. This is explained in section~\ref{Syntax}.
\Warning
Metavariables are identifiers preceded by the ``\verb+$+'' symbol.
They cannot be replaced by plain identifiers. For instance, if we enter a
rule with identifiers instead of metavariables, the identifiers are
assumed to be global names (which raises a warning if no global name is
denoted by these identifiers).
% Test failure
\begin{coq_example}
Grammar constr constr1 :=
not_eq [ constr0($a) "#" constr0($b) ] -> [~(a=b)].
\end{coq_example}
\begin{coq_eval}
Abort.
\end{coq_eval}
\example{Redefining vernac commands}
Thanks to the following rule, ``{\tt |- term.}'' will have the same
effect as ``{\tt Goal term.}''.
\begin{coq_example}
Grammar vernac vernac :=
thesis [ "|" "-" constr:constr($term) "." ]
-> [Goal $term.].
\end{coq_example}
\noindent This rule allows putting blanks between the bar and the
dash, as in
\begin{coq_example}
| - (A:Prop)A->A.
\end{coq_example}
\begin{coq_eval}
Reset Initial.
\end{coq_eval}
\noindent Assuming the previous rule has not been entered, we can
forbid blanks with a rule that declares ``\verb+|-+'' as a single
token:
\begin{coq_eval}
(********** The following is not correct and should produce **********)
(*************** Syntax error: illegal begin of vernac ***************)
(* Just to adjust the prompt: *) Set Printing Depth 50.
\end{coq_eval}
\begin{coq_example}
Grammar vernac vernac :=
thesis [ "|-" constr:constr($term) "." ]
-> [Goal $term.].
| - (A:Prop)A->A.
\end{coq_example}
\noindent If both rules were entered, we would have three tokens
\verb+|+, \verb+-+ and \verb+|-+. The lexical ambiguity on the string
\verb+|-+ is solved according to the longest match rule (see lexical
conventions page~\pageref{lexical}), i.e. \verb+|-+ would be one
single token. To enforce the use of the first rule, a blank must be
inserted between the bar and the dash\footnote{It turns out that "|-"
is already a token defined for other purposes, then the first rule
cannot parse "|- (A:Prop)A->A" and indeed requires the insertion of a blank}.
\Rem
The vernac commands should always be terminated by a period. When a
syntax error is detected, the top-level discards its input until it
reaches a period token, and then resumes parsing.
\example{Redefining tactics}
We can give names to repetitive tactic sequences. Thus in this example
``{\tt IntSp}'' will correspond to the tactic {\tt Intros} followed by
{\tt Split}.
\begin{coq_example}
Grammar tactic simple_tactic :=
intros_split [ "IntSp" ] -> [Intros; Split].
\end{coq_example}
Let us check that this works.
\begin{coq_example}
Goal (A,B:Prop)A/\B -> B/\A.
IntSp.
\end{coq_example}
\begin{coq_eval}
Abort.
\end{coq_eval}
Note that the same result can be obtained in a simpler way with {\tt
Tactic Definition} (see chapter~\ref{TacticLanguage}).
\example{Priority, left and right associativity of operators}
Disjunction has a higher precedence level than conjunction. Thus
\verb+A/\B\/C+ will be parsed as \verb+(A/\B)\/C+ and not as
\verb+A/\(B\/C)+. This is obtained by putting the rule for
disjunction at a higher level than that of conjunction: conjunction is
defined in the non-terminal {\tt constr6} and disjunction in {\tt
constr7} (see file {\tt Logic.v} in the library). Notice that
the character ``\verb+\+'' must be doubled (see lexical conventions
for quoted strings on page~\pageref{lexical}).
\begin{coq_example*}
Grammar constr constr6 :=
and [ constr5($c1) "/\\" constr6($c2) ] -> [(and $c1 $c2)].
Grammar constr constr7 :=
or [ constr6($c1) "\\/" constr7($c2) ] -> [(or $c1 $c2)].
\end{coq_example*}
Thus conjunction and disjunction associate to the right, since in both
cases the priority of the right term (resp. {\tt constr6} and {\tt
constr7}) is higher than the priority of the left term (resp. {\tt
constr5} and {\tt constr6}). The left member of a conjunction cannot
itself be a conjunction, unless you enclose it in parentheses.
Left associativity is obtained by calling the non-terminal
recursively. {\camlpppp} deals with this recursion by first trying
the non-left-recursive rules. Here is an example taken from the
standard library, defining a syntax for addition on integers:
\begin{coq_example*}
Grammar znatural expr :=
expr_plus [ expr($p) "+" expr($c) ] -> [(Zplus $p $c)].
\end{coq_example*}
\subsection{Actions}
\label{GramAction}
\index{Grammar Actions}
Every rule should generate an AST corresponding to the syntactic
construction that it recognizes. This generation is done by an
action. Thus every rule is associated to an action. The syntax has
been defined in Fig.~\ref{grammarsyntax}. We give some examples.
\subsubsection{Simple actions}
A simple action is an AST enclosed between ``\verb+[+'' and
``\verb+]+''. It simply builds the AST by interpreting it as a
constructive expression in the environment defined by the LMP. This
case has been illustrated in all the previous examples. We will later
see that grammars can also return AST lists.
\subsubsection{Local definitions}
When an action should generate a big term, we can use
\texttt{let}~\textsl{pattern}~\texttt{=}~\textsl{action}$_1$~%
\texttt{in}~\textsl{action}$_2$\index{let@{\tt let}} expressions to
construct it progressively. The action \textsl{action}$_1$ is first
computed, then it is matched against \textsl{pattern} which may bind
metavariables, and the result is the evaluation of \textsl{action}$_2$
in this new context.
\example{}
\noindent From the syntax \verb|t1*+t2|, we generate the term
{\tt (plus (plus t1 t2) (mult t1 t2))}.
\begin{coq_example}
Grammar constr constr1 :=
mult_plus [ constr0($a) "*" "+" constr0($b) ]
-> let $p1=[(plus $a $b)] in
let $p2=[(mult $a $b)] in
[(plus $p1 $p2)].
\end{coq_example}
Let us give an example with this syntax:
\begin{coq_example}
Goal (O*+O)=O.
\end{coq_example}
\begin{coq_eval}
Abort.
\end{coq_eval}
\subsubsection{Conditional actions}
We recall the syntax of conditional actions:
\begin{center}
\texttt{case}~\textsl{action}~\texttt{of}~%
\textsl{pattern}$_1$~\verb+->+~\textsl{action}$_1$~%
\texttt{|}~$\cdots$~\texttt{|}~%
\textsl{pattern}$_n$~\verb+->+~\textsl{action}$_n$~%
\texttt{esac}
\end{center}\index{case@@{\tt case}}
The action to execute is chosen according to the value of
\textsl{action}. The matching is performed from left to right. The
selected action is the one associated to the first pattern that
matches the value of \textsl{action}. This matching operation will
bind the metavariables appearing in the selected pattern. The pattern
matching does not need to be exhaustive, and no warning is emitted. When the
pattern matching fails, a message reports in which grammar rule the
failure happened.
\example{Overloading the ``$+$'' operator}
The internal representation of an expression such as {\tt A+B} depends
on the shape of {\tt A} and {\tt B}:
\begin{itemize}
\item \verb/{P}+{Q}/ uses {\tt sumbool}
\item otherwise, \verb/A+{Q}/ uses {\tt sumor}
\item otherwise, \verb/A+B/ uses {\tt sum}.
\end{itemize}
The trick is to build a temporary AST: \verb/{A}/ generates the node
\verb/(SQUASH A)/. When we parse \verb/A+B/, we remove the {\tt
SQUASH} in {\tt A} and {\tt B}:
\begin{coq_example*}
Grammar constr constr1: ast :=
squash [ "{" lconstr($lc) "}" ] -> [(SQUASH $lc)].
Grammar constr lassoc_constr4 :=
squash_sum
[ lassoc_constr4($c1) "+" lassoc_constr4($c2) ] ->
case [$c2] of
(SQUASH $T2) ->
case [$c1] of
(SQUASH $T1) -> [(sumbool $T1 $T2)]
| $_ -> [(sumor $c1 $T2)]
esac
| $_ -> [(sum $c1 $c2)]
esac.
\end{coq_example*}
The first rule is cast with type {\tt ast}, because the produced term
cannot be reached by the input syntax. On the other hand, the second
command has (implicit) type \verb+constr+, so the right-hand side is
parsed with the term parser.
\noindent The problem is that sometimes the intermediate {\tt SQUASH}
node cannot be re-shaped, and then we get a very specific error:
\begin{coq_eval}
(********** The following is not correct and should produce **********)
(************** Error: Ill-formed specification **********************)
(* Just to adjust the prompt: *) Set Printing Depth 50.
\end{coq_eval}
\begin{coq_example}
Check {True}.
\end{coq_example}
\example{Comparisons and non-linear patterns}
The patterns may be non-linear: when an already bound metavariable
appears in a pattern, the value yielded by the pattern matching must
be equal, up to renaming of bound variables, to the current
value. Note that this does not apply to the wildcard \verb+$_+. For
example, we can compare two arguments:
\begin{coq_example}
Grammar constr constr10 :=
refl_equals [ constr9($c1) "||" constr9($c2) ] ->
case [$c1] of $c2 -> [(refl_equal ? $c2)] esac.
Check ([x:nat]x || [y:nat]y).
\end{coq_example}
\noindent The metavariable \verb+$c1+ is bound to \verb+[x:nat]x+ and
\verb+$c2+ to \verb+[y:nat]y+. Since these two values are equal, the
pattern matching succeeds. It fails when the two terms are not equal:
% Test failure
\begin{coq_eval}
(********** The following is not correct and should produce **********)
(************* Error: ... Grammar case failure ... ******************)
(* Just to adjust the prompt: *) Set Printing Depth 50.
\end{coq_eval}
\begin{coq_example}
Check ([x:nat]x || [z:bool]z).
\end{coq_example}
\subsection{Grammars of type {\tt ast list}}
Assume we want to define a non-terminal {\tt ne\_identarg\_list} that
parses a non-empty list of identifiers. If grammars could only
return AST's, we would have to define it this way:
\begin{coq_example*}
Grammar tactic my_ne_ident_list : ast :=
ident_list_cons [ identarg($id) my_ne_ident_list($l) ] ->
case [$l] of
(IDENTS ($LIST $idl)) -> [(IDENTS $id ($LIST $idl))]
esac
| ident_list_single [ identarg($id) ] -> [(IDENTS $id)].
\end{coq_example*}
But this would be inefficient: every time an identifier is read, we
remove the ``boxing'' operator {\tt IDENTS}, and put it back once the
identifier is inserted in the list.

To avoid this awkward trick, we allow grammars to return AST
lists. Hence grammars have a type ({\tt ast} or {\tt ast list}), just like
AST's do. Type-checking can be done statically.

Simple actions can produce lists by putting a list of constructive
expressions one beside the other. As usual, the {\tt\$LIST} operator
allows injecting AST list variables.
\begin{coq_example*}
Grammar tactic ne_identarg_list : ast list :=
ne_idl_cons [ identarg($id) ne_identarg_list($idl) ]
-> [ $id ($LIST $idl) ]
| ne_idl_single [ identarg($id) ] -> [ $id ].
\end{coq_example*}
%$
Note that the grammar type must be recalled in every extension
command, or else the system could not discriminate between a single
AST and an AST list with only one item. If omitted, the default type
depends on the universe name. The following command fails because the non-terminal {\tt
ne\_identarg\_list} is already defined with type {\tt ast list} but the
{\tt Grammar} command header assumes its type is {\tt ast}.
\begin{coq_eval}
(********** The following is not correct and should produce **********)
(****** Error: ne_identarg_list already exists with another type *****)
(* Just to adjust the prompt: *) Set Printing Depth 50.
\end{coq_eval}
% Test failure
\begin{coq_example}
Grammar tactic ne_identarg_list :=
list_two [ identarg($id1) identarg($id2) ] -> [ $id1 ; $id2 ].
\end{coq_example}
All rules of the same grammar must have the same type. For instance, the
following rule is refused because the \verb+constr:constr1+ grammar
has already been defined with type {\tt ast}, and cannot be extended
with a rule returning AST lists.
% Test failure
\begin{coq_eval}
(********** The following is not correct and should produce **********)
(********* Error: ']' expected after [default_action_parser] *********)
(* Just to adjust the prompt: *) Set Printing Depth 50.
\end{coq_eval}
\begin{coq_example}
Grammar constr constr1 :=
carret_list [ constr0($c1) "^" constr0($c2)] -> [ $c1 $c2 ].
\end{coq_example}
\subsection{Limitations}
The extendable grammar mechanism has four serious limitations. The
first two are inherited from {\camlpppp}.
\begin{itemize}
\item Grammar rules are factorized syntactically: {\camlpppp} does not
try to expand non-terminals to detect further factorizations. The
user must perform the factorization himself.
\item The grammar is not checked to be \emph{LL(1)}\index{LL(1)} when
adding a rule. If it is not LL(1), the parsing may fail on an input
recognized by the grammar, because selecting the appropriate rule
may require looking several tokens ahead. {\camlpppp} always selects
the most recent rule (and all those that factorize with it)
accepting the current token.
\item There is no command to remove a grammar rule. However, there is a
trick to do it: it is sufficient to execute the ``{\tt Reset}''
command on a constant defined before the rule we want to remove.
Thus we retrieve the state before the definition of the constant,
and hence without the grammar rule. This trick does not apply to grammar
extensions done in {\ocaml}.
\item Grammar rules defined inside a section are automatically removed
after the end of this section: they are available only inside it.
\end{itemize}
The command {\tt Print Grammar} prints the rules of a grammar. The
display is performed by {\camlpppp}. So, the actions are not printed, and the
recursive calls are printed as \verb+SELF+\index{SELF@{\tt SELF}}. It is
sometimes useful if the user wants to understand why parsing fails, or
why a factorization was not done as expected.\index{Print Grammar@{\tt
Print Grammar}}
\begin{coq_example}
Print Grammar constr constr8.
\end{coq_example}
\subsubsection{Getting round the lack of factorization}
The first limitation may require non-trivial work and may lead to
ugly grammars that are hard to extend. Sometimes, we can use a trick to
avoid these troubles. The problem arises in the {\gallina} syntax, when
making {\camlpppp} factorize the rules for application and product. The
natural grammar would be:
% Test failure
\begin{coq_eval}
(********** The following is not correct and should produce **********)
(******** Syntax error: ')' expected after [Constr.constr10] *********)
\end{coq_eval}
\begin{coq_example}
Grammar constr constr0 : ast :=
parenthesis [ "(" constr10($c) ")" ] -> [$c]
| product [ "(" prim:var($id) ":" constr($c1) ")" constr0($c2) ] ->
[(PROD $c1 [$id]$c2)]
with constr10 : ast :=
application [ constr9($c1) ne_constr_list($lc) ] ->
[(APPLIST $c1 ($LIST $lc))]
| inject_com91 [ constr9($c) ] -> [$c].
Check (x:nat)nat.
\end{coq_example}
\begin{coq_eval}
Reset Initial.
\end{coq_eval}
But the factorization does not work, so the product rule is never
selected, since identifiers match the {\tt constr10} grammar. The
trick is to parse the identifier as a {\tt constr10} and to check
\emph{a posteriori} that the term is indeed an identifier:
\begin{coq_example}
Grammar constr constr0 : ast :=
product [ "(" constr10($c) ":" constr($c1) ")" constr0($c2) ] ->
[(PROD $c1 [$c]$c2)].
Check (x:nat)nat.
\end{coq_example}
%$
\noindent We could have checked it explicitly with a {\tt case} in
the right-hand side of the rule, but the error message in the
following example would not be as relevant:
% Test failure
\begin{coq_eval}
(********** The following is not correct and should produce **********)
(******* Error: This expression should be a simple identifier ********)
(* Just to adjust the prompt: *) Set Printing Depth 50.
\end{coq_eval}
\begin{coq_example}
Check (S O:nat)nat.
\end{coq_example}
\noindent This trick is not similar to the {\tt SQUASH} node, with
which the error could not be detected at parsing time. Here, the error
pops out when trying to build an abstraction of {\tt\$c2} over the
value of {\tt\$c}: since the latter is not bound to a variable, the
right-hand side of the product grammar rule fails.
\section{Writing your own pretty printing rules}
\label{Syntax}
\comindex{Syntax}
There is a mechanism for extending the vernacular's printer by adding
new printing rules in the interactive toplevel. The printing rules are
stored in a table and looked up by the vernacular's printer at
printing time.
The user can print new constants, tactics and vernacular phrases
with the desired syntax. The printing rules
for new constants should be written {\em after} the definition of the
constants. The rules should be placed
outside a section if the user wants them to be exported.
The printing rules corresponding to the heart of the system (primitive
tactics, terms and the vernacular language) are defined,
respectively, in the files {\tt PPTactic.v} and {\tt PPConstr.v}
(in the directory {\tt syntax}). These files are automatically
loaded in the initial state. The user is not expected to modify these
files unless he dislikes the way primitive things are printed, in
which case he will have to recompile the system after doing the
modifications.
When the system fails to find a suitable printing rule, a tag
\verb+#GENTERM+\index{GENTERM@{\tt\#GENTERM}} appears in the message.
In the following we give some examples showing how to write the
printing rules for the non-terminal and terminal symbols of a
grammar. We will test them frequently by inspecting the error
messages. Then, we give the grammar of printing rules and a
description of its semantics.
\subsection{The Printing Rules}
\subsubsection{The printing of non terminals}
The printing is the inverse process of parsing. While a grammar rule
maps an input stream of characters
into an AST, a printing
rule maps an AST into an output stream of printing orders.
So given a certain grammar rule, the printing rule
is generally obtained by inverting the grammar rule.
As with grammar rules, it is possible to define several printing rules
at the same time. The exact syntax for complex rules is described
in~\ref{syntaxsyntax}. A simple printing rule is of the form:
\begin{center}
\verb+Syntax+ ~{\sl universe}
~\verb+level+ ~{\sl precedence}~\verb+:+~{\sl name}
~\verb+[+~{\sl pattern}~\verb+] -> [+~{\sl printing-orders}~\verb+].+
\end{center}
where :
\begin{itemize}
\item {\it universe} is an identifier denoting the universe of the AST to
  be printed. It has the same meaning as for grammar commands.
  The vernac universe has no equivalent in pretty-printing, since
  vernac phrases are never printed by the system: error messages are
  reported by re-displaying what the user typed in.
\item {\it precedence} is a positive integer indicating the precedence
  of the rule. In general the precedence for tactics is 0. The
  universe of terms is implicitly stratified by the hierarchy of
  the parsing rules. We have non-terminals \textit{constr0},
  \textit{constr1}, \ldots, \textit{constr10}.
  The idea is that objects parsed with the non-terminal
  $constr_i$ have precedence $i$. In most cases we fix the
  precedence of the printing rule of a construction to be the number
  of the non-terminal with which that construction is parsed.
  A precedence may also be a triple of integers. Triples are ordered
  lexicographically, and the level $n$ is equal to {\tt
  [$n~0~0$]}.
\item {\it name} is the name of the
  printing rule. A rule is identified by both its universe and its
  name; if two rules have the same name and universe, the last one
  overrides the former.
\item {\it pattern} is a pattern that matches the AST to be
printed.
The syntax of patterns is dependent on the universe of the AST to be
printed (e.g. patterns are parsed as {\tt constr} if the universe is
{\tt constr}, etc), and a quotation can be used to escape the default
parser associated to this universe.
A description of the syntax of patterns is given in section
\ref{patternsyntax}.
\item {\it printing-orders} is the sequence of orders indicating the
concrete layout of the printer.
\end{itemize}
\firstexample
\example{Syntax for user-defined tactics.}
The first usage of the \texttt{Syntax} command might be the printing
order for a user-defined tactic:
\begin{coq_example*}
Declare ML Module "eqdecide".
Syntax tactic level 0:
Compare_PP [(Compare $com1 $com2)] ->
["Compare" [1 2] $com1 [1 2] $com2].
\end{coq_example*}
% not even true
If such a printing rule is not given, an unsightly \verb+#GENTERM+
will appear when typing \texttt{Show Script} or \texttt{Save}. For
a tactic macro defined by a \texttt{Tactic Definition} command, a
printing rule is automatically generated, so the user does not have to
write one.
\example{Defining the syntax for new constants.}
Let's define the constant \verb+Xor+ in Coq:
\begin{coq_eval}
Reset Initial.
\end{coq_eval}
\begin{coq_example*}
Definition Xor := [A,B:Prop] A/\~B \/ ~A/\B.
\end{coq_example*}
Given this definition, we want to use the syntax of \verb+A X B+
to denote \verb+(Xor A B)+. To do that we give the grammar rule:
\begin{coq_example}
Grammar constr constr7 :=
Xor [ constr6($c1) "X" constr7($c2) ] -> [(Xor $c1 $c2)].
\end{coq_example}
Note that the operator is associative to the right.
Now \verb+True X False+ is well parsed:
\begin{coq_example}
Goal True X False.
\end{coq_example}
To have it well printed we extend the printer:
\begin{coq_example}
Syntax constr level 7:
Pxor [(Xor $t1 $t2)] -> [ $t1:L " X " $t2:E ].
\end{coq_example}
and now we have the desired syntax:
\begin{coq_example}
Show.
\end{coq_example}
Let's comment on the rule:
\begin{itemize}
\item \verb+constr+ is the universe of the printing rule.
\item \verb+7+ is the rule's precedence; it is the same as that of the
  parsing production ({\tt constr7}).
\item \verb+Pxor+ is the name of the printing rule.
\item \verb+(Xor $t1 $t2)+ is the pattern of the term to be
  printed. Between \verb+<< >>+ we are allowed to use arbitrary AST
  syntax instead of the syntax of terms. Metavariables may occur in
  the pattern, preceded by \verb+$+.
\item \verb+$t1:L " X " $t2:E+ are the printing
  orders: they tell the printer to print the value of \verb+$t1+, then
  the symbol \verb+X+, and then the value of \verb+$t2+.
  The \verb+L+ in the little box \verb+$t1:L+ tells the printer not to
  put parentheses around the value of \verb+$t1+ if its precedence is
  {\bf less} than the rule's. An \verb+E+ instead of the \verb+L+
  would mean not to put parentheses around the value of \verb+$t1+ if
  its precedence is {\bf less than or equal to} the rule's.
  The associativity of the operator can be expressed in the following
  way:
  \verb&$t1:L " X " $t2:E& associates the operator to the right,
  \verb&$t1:E " X " $t2:L& associates it to the left, and
  \verb&$t1:L " X " $t2:L& makes it non-associative.
\end{itemize}
Note that while grammar rules are related by the names of non-terminals
(such as {\tt constr6} and {\tt constr7}), printing rules are
isolated. The {\tt Pxor} rule tells how to print an {\tt Xor}
expression but not how to print its subterms. The printer recursively
looks up the rules for the values of \verb+$t1+ and \verb+$t2+. The
selection of the printing rules is strictly determined by the
structure of the AST to be printed.
This could have been defined with the {\tt Infix} command.
\example{Forcing parentheses around a new syntactic construction}
You can force a new syntactic construction to be parenthesized by
fixing the precedence of its printing rule to a number greater than 9.
For example, a possible printing rule for the {\tt Xor} connective in
prefix notation would be:
\begin{coq_example*}
Syntax constr level 10:
ex_imp [(Xor $t1 $t2)] -> [ "X " $t1:L " " $t2:L ].
\end{coq_example*}
The rule contains no explicit parentheses; nevertheless, when the
connective is used, parentheses are automatically written:
\begin{coq_example}
Show.
\end{coq_example}
A precedence higher than 9 ensures that the AST value will be
parenthesized by default, whether it occurs in the empty context or in
a context where the printing orders are of the form
\verb+$t:L+ or \verb+$t:E+.
\example{Dealing with list patterns in the syntax rules}
The following productions extend the parser to recognize a
tactic called \verb+MyIntros+ that receives a list of identifiers as
argument, just as the primitive \verb+Intros+ tactic does:
\begin{coq_example*}
Grammar tactic simple_tactic: ast :=
my_intros [ "MyIntros" ne_identarg_list($idl) ] ->
[(MyIntrosWith ($LIST $idl))].
\end{coq_example*}
To define the printing rule for \verb+MyIntros+ it is necessary to
define the printing rule for the non-terminal \verb+ne_identarg_list+.
In grammar productions the dependency between non-terminals is
explicit. This is not the case for printing rules, where the
dependency between the rules is determined by the structure of the
pattern. So, the way to make the relation between printing rules
explicit is to add structure to the patterns.
\begin{coq_example}
Syntax tactic level 0:
myintroswith [<<(MyIntrosWith ($LIST $L))>>] ->
[ "MyIntros " (NEIDENTARGLIST ($LIST $L)) ].
\end{coq_example}
This rule says to print the string \verb+MyIntros+ and then to print
the value of \\
\verb+(NEIDENTARGLIST ($LIST $L))+.
\begin{coq_example}
Syntax tactic level 0:
ne_identarg_list_cons [<<(NEIDENTARGLIST $id ($LIST $l))>>]
-> [ $id " " (NEIDENTARGLIST ($LIST $l)) ]
| ne_identarg_list_single [<<(NEIDENTARGLIST $id)>>] -> [ $id ].
\end{coq_example}
The first rule says how to print a non-empty list, while the second
one says how to print a list with exactly one element. Note that the
pattern structure of the binding in the first rule ensures that it is
used recursively.
Like the order of grammar productions, the order of printing rules
\emph{does matter}. If the patterns of two rules overlap, the {\bf
last} rule is always chosen. In the example, if the last two rules had
been written in the reverse order, printing would not work, because
only the rule {\sl ne\_identarg\_list\_cons} would be recursively
retrieved, and there is no rule for the empty list.
Another possibility would have been to write a rule for the empty list
instead of the {\sl ne\_identarg\_list\_single} rule:
\begin{coq_example}
Syntax tactic level 0:
ne_identarg_list_nil [<<(NEIDENTARGLIST)>>] -> [ ].
\end{coq_example}
This rule tells the printer to do nothing for the empty list. In this
case there is no overlap between patterns (no critical pairs) and the
order is not relevant. But a spurious space would be printed after the
last identifier.
\example{Defining constants with an arbitrary number of arguments}
Sometimes the constants we define may have an arbitrary number of
arguments; the typical case is that of polymorphic functions. Let's
consider for example the functional composition operator. The
following rule extends the parser:
\begin{coq_example*}
Definition explicit_comp := [A,B,C:Set][f:A->B][g:B->C][a:A](g (f a)).
Grammar constr constr6 :=
expl_comp [constr5($c1) "o" constr6($c2) ] ->
[(explicit_comp ? ? ? $c1 $c2)].
\end{coq_example*}
Our first idea is to write the printing rule just by ``inverting'' the
production:
\begin{coq_example}
Syntax constr level 6:
expl_comp [(explicit_comp ? ? ? $f $g)] -> [ $f:L "o" $g:L ].
\end{coq_example}
This rule is not correct: \verb+?+ is an ordinary AST (indeed, it is
the AST \verb|(XTRA "ISEVAR")|), and does not behave as the
``wildcard'' pattern \verb|$_|. Here is a correct version of this
rule:
\begin{coq_example*}
Syntax constr level 6:
expl_comp [(explicit_comp $_ $_ $_ $f $g)] -> [ $f:L "o" $g:L ].
\end{coq_example*}
Let's test the printing rule:
% 2nd test should fail
\begin{coq_example}
Definition Id := [A:Set][x:A]x.
Check (Id nat) o (Id nat).
Check ((Id nat)o(Id nat) O).
\end{coq_example}
In the first case the rule was used, while in the second one the
system failed to match the pattern of the rule with the AST of
\verb+((Id nat)o(Id nat) O)+.
Internally the AST of this term is the same as the AST of the
term \verb+(explicit_comp nat nat nat (Id nat) (Id nat) O)+.
When the system retrieves our rule, it tries to match an application
with six arguments against an application with five arguments (the AST
of \verb+(explicit_comp $_ $_ $_ $f $g)+).
%$
The matching therefore fails, and
the term is printed using the rule for application.
Note that adding a new rule for \verb+explicit_comp+ for the case of
six arguments does not solve the problem: because of polymorphism, we
can always build a term with one more argument. The rules for
application deal with the problem of an arbitrary number of arguments
by using list patterns. Let's look at these rules:
\begin{coq_eval}
Definition foo := O.
\end{coq_eval}
\begin{coq_example*}
Syntax constr level 10:
app [<<(APPLIST $H ($LIST $T))>>] ->
[ [<hov 0> $H:E (APPTAIL ($LIST $T)):E ] ]
| apptailcons [<<(APPTAIL $H ($LIST $T))>>] ->
[ [1 1] $H:L (APPTAIL ($LIST $T)):E ]
| apptailnil [<<(APPTAIL)>>] -> [ ].
\end{coq_example*}
\begin{coq_eval}
Reset foo.
\end{coq_eval}
The first rule prints the operator of the application, and the second
prints the list of its arguments. Thus, one solution to our problem is
to specialize the first rule of the application to the cases where the
operator is \verb+explicit_comp+ and the list pattern has {\bf at
least} five arguments:
\begin{coq_example*}
Syntax constr level 10:
expl_comp
[<<(APPLIST <<explicit_comp>> $_ $_ $_ $f $g ($LIST $l))>>]
-> [ [<hov 0> $f:L "o" $g:L (APPTAIL ($LIST $l)):E ] ].
\end{coq_example*}
%$
Now we can see that this rule works for any application of the
operator:
\begin{coq_example}
Check ((Id nat) o (Id nat) O).
Check ((Id nat->nat) o (Id nat->nat) [x:nat]x O).
\end{coq_example}
In the examples presented so far, the rules carry no information about
how to deal with indentation, break points and spaces, so the printer
writes everything on the same line without spaces. To describe the
concrete layout of the patterns, there is a simple language of
printing instructions, described in the following section.
\subsubsection{The printing of terminals}
The user is not expected to write the printing rules for terminals;
this is done automatically. Primitive printing is provided for
identifiers, strings, paths and numbers. For example:
\begin{coq_example*}
Grammar vernac vernac: ast :=
mycd [ "MyCd" prim:string($dir) "." ] -> [(MYCD $dir)].
Syntax vernac level 0:
mycd [<<(MYCD $dir)>>] -> [ "MyCd " $dir ].
\end{coq_example*}
There is no longer any need to encapsulate the \verb+$dir+
meta-variable in the \verb+$PRIM+ or the \verb+$STR+ operator as in
version 6.1. However, the pattern \verb+(MYCD ($STR $dir))+ would be
safer, because the rule would not be selected to print an ill-formed
AST. The default primitive printer is the {\ocaml} function {\tt
print\_token}. If the user wants a particular terminal to be printed
by another printer, he may specify it in the right-hand side of the
rule. Example:
% Not yet possible! $num is an id, not an integer.
%Syntax tactic level 0 :
% do_pp [<<Do $num $tactic>>]
% -> [ "Do " $num:"my_printer" [1 1] $tactic ].
\begin{coq_example*}
Syntax tactic level 0 :
do_pp [<<(DO ($NUM $num) $tactic)>>]
-> [ "Do " $num:"my_printer" [1 1] $tactic ].
\end{coq_example*}
The printer \textit{my\_printer} must have been installed as shown
below.
\subsubsection{Primitive printers}
Writing and installing primitive pretty-printers requires having the
sources of the system, as does writing tactics.
A primitive pretty-printer is an \ocaml\ function of type
\begin{verbatim}
Esyntax.std_printer -> CoqAst.t -> Pp.std_ppcmds
\end{verbatim}
The first
argument is the global printer; it can be called, for example, by the
specific printer to print subterms. The second argument is the AST to
print, and the result is a stream of printing orders such as:
\begin{itemize}
\item \verb+'sTR"+\textit{string}\verb+"+ to print the string
\textit{string}
\item \verb+'bRK +\textit{num1 num2}, which has the same semantics as
\verb+[+ \textit{num1 num2} \verb+]+ in the printing rules.
\item \verb+'sPC+ to leave a blank space
\item \verb+'iNT+ $n$ to print the integer $n$
\item \ldots
\end{itemize}
There are also commands to make boxes (\verb+h+ or \verb+hv+,
described in the file {\tt lib/pp.mli}). Once the printer is written,
it must be registered by the command:
\begin{verbatim}
Esyntax.Ppprim.add ("name",my_printer);;
\end{verbatim}
\noindent
Then, in the toplevel, after having loaded the right {\ocaml} module,
it can be used in the right-hand side of printing orders using the
syntax \verb+$truc:"name"+.
The real name and the registered name of a pretty-printer do not need
to be the same. However, it is simpler and clearer to use the same
name.
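As a purely illustrative sketch (assuming the stream notation of the
{\tt Pp} module), the printer \textit{my\_printer} mentioned above
could simply bracket its argument and delegate the actual printing to
the global printer:
\begin{verbatim}
(* print "<", then the AST via the global printer std_pr, then ">" *)
let my_printer (std_pr : Esyntax.std_printer) (ast : CoqAst.t) =
  [< 'sTR"<" ; std_pr ast ; 'sTR">" >];;

(* register it under the name used in printing orders ($t:"my_printer") *)
Esyntax.Ppprim.add ("my_printer", my_printer);;
\end{verbatim}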
\subsection{Syntax for pretty printing rules}
\label{syntaxsyntax}
This section describes the syntax for printing rules. The
metalanguage conventions are the same as those specified for the
definition of the {\sl pattern}'s syntax in section \ref{patternsyntax}.
The grammar of printing rules is the following:
\begin{center}
\begin{tabular}{|rcl|} \hline
{\sl printing-rule} & ::= &
\verb+Syntax+~{\ident}~~\nelist{{\sl level}}{;} \\
{\sl level} & ::= & \verb+level+~{\sl precedence}~\verb+:+
~\nelist{{\sl rule}}{|} \\
{\sl precedence} & ::= &
{\integer} ~$|$~ \verb+[+~\integer~\integer~\integer~\verb+]+ \\
{\sl rule} & ::= &
{\sl ident}~\verb+[+~{\sl pattern}~\verb+] -> [+%
~\sequence{{\sl printing-order}}{}~\verb+]+ \\
{\sl printing-order} & ::= & \verb+FNL+ \\
&$|$& {\sl string} \\
&$|$& \verb+[+~\integer~\integer~\verb+]+ \\
&$|$& \verb+[+~box~\sequence{{\sl printing-order}}{}~\verb+]+ \\
&$|$& {\sl ast}~\zeroone{{\tt :}~{\sl prim-printer}}~%
\zeroone{{\tt :}~{\sl paren-rel}}\\
{\sl box} & ::= & \verb+<+~{\sl box-type}~\integer~\verb+>+\\
{\sl box-type} & ::= & ~\verb+hov+~$|$~\verb+hv+~$|$~\verb+v+~$|$~\verb+h+\\
{\sl paren-rel} & ::= & \verb+L+~$|$~\verb+E+ \\
{\sl prim-printer} & ::= & {\sl string} \\
{\sl pattern} & ::= & {\sl ast-quot} ~$|$~\verb+<<+ {\sl ast} \verb+>>+ \\
\hline
\end{tabular}
\end{center}
The non-terminal {\sl ast-quot} is the default quotation associated to
the extended universe. Patterns not belonging to the input syntax can
be given directly as ASTs using \verb+<< >>+.
As already stated, the order of rules in a given level is relevant
(the last ones override the previous ones).
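For instance, according to this grammar, several levels may be grouped
in a single {\tt Syntax} command. A sketch, merely packaging the two
{\tt Xor} printing rules given earlier into one command, could be:
\begin{verbatim}
Syntax constr
  level 7:
    Pxor   [(Xor $t1 $t2)] -> [ $t1:L " X " $t2:E ]
; level 10:
    ex_imp [(Xor $t1 $t2)] -> [ "X " $t1:L " " $t2:L ].
\end{verbatim}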
\subsubsection{Pretty grammar structures}
The basic structure is the sequence of printing orders. Each order has
a printing effect, and they are executed sequentially. An order can
be:
\begin{itemize}
\item a printing order
\item a printing box
\end{itemize}
\paragraph{Printing orders}
Printing orders can be of the form:
\begin{itemize}
\item \verb+"+{\sl string}\verb+"+ prints the {\sl string}.
\item \verb+FNL+ forces a new line (see the sketch just after this list).
\item \texttt{\$t:}\textsl{paren-rel} or
  \texttt{\$t:}\textsl{prim-printer}\texttt{:}\textsl{paren-rel},
  where \texttt{\$t} is an {\sl ast}, builds an AST in the current
  context. The printer looks up the adequate printing rule and applies
  this method recursively. The optional field {\it prim-printer} is a
  string with the name of the primitive pretty-printer to call (the
  name is not the name of the {\ocaml} function, but the name given to
  {\tt Esyntax.Ppprim.add}). The recursion of the printing is
  determined by the pattern's structure. The meaning of {\it paren-rel}
  is the following:
\begin{tabular}{cl}
\verb+L+ &
if $t$'s precedence is {\bf less} than the rule's, then no
parentheses \\
& are written around $t$. \\
\verb+E+ &
if $t$'s precedence is {\bf less than or equal to} the rule's, then no
parentheses \\
& are written around $t$. \\
{\it none} & parentheses are {\bf never} written around $t$.
\\\\
\end{tabular}
\end{itemize}
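As a sketch (not one of the standard rules, and the name {\tt
Pxor\_fnl} is arbitrary), a variant of the {\tt Pxor} rule defined
earlier could use {\tt FNL} to force a new line before the operator:
\begin{verbatim}
Syntax constr level 7:
  Pxor_fnl [(Xor $t1 $t2)] -> [ $t1:L FNL " X " $t2:E ].
\end{verbatim}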
\paragraph{Printing boxes}
The concept of formatting boxes is used to describe the concrete
layout of patterns: a box may contain many objects, which are orders
or sequences of sub\-boxes separated by break\-points; the box wraps
an imaginary rectangle around them.
\begin{enumerate}
\item {\bf Box types}
The type of boxes specifies the way the components of the box will
be displayed and may be:
\begin{itemize}
\item \verb+h+ : to catenate objects horizontally.
\item \verb+v+ : to catenate objects vertically.
\item \verb+hv+ : to catenate objects as with an ``h box'' but an
automatic vertical folding is applied when the horizontal
composition does not fit into the width of the associated output
device.
\item \verb+hov+ : to catenate objects horizontally but if the
horizontal composition does not fit, a vertical composition will be
applied, trying to catenate horizontally as many objects as possible.
\end{itemize}
The type of the box can be followed by an offset value $n$, which is
added to the current indentation when breaking lines inside the box.
\item {\bf Boxes syntax}
A box is described by a sequence surrounded by \verb+[ ]+. The first
element of the sequence is the box type: this type, surrounded by the
symbols \verb+< >+, is one of the words \verb+hov+, \verb+hv+,
\verb+v+, \verb+h+, followed by an offset. The default offset is 0
and the default box type is \verb+h+.
\item {\bf Break\-points}
In order to specify where the pretty-printer is allowed to break,
one of the following break-points may be used:
\begin{itemize}
\item \verb+[0 0]+ is a simple break-point; if the line is not broken
here, no space is included (``Cut'').
\item \verb+[1 0]+ if the line is not broken then a space is printed
(``Spc'').
\item \verb+[i j]+ if the line is broken, the value $j$ is added to the
current indentation for the following line; otherwise $i$ blank spaces
are inserted (``Brk'').
\end{itemize}
\noindent {\bf Examples:}
It is interesting to test printing rules on ``small'' and ``large''
expressions in order to see how line breaking and indentation are
managed. Let's define two constants and \verb+Print+ them to test the
rules. Here are some examples of rules for our constant
\verb+Xor+:
\begin{coq_example*}
Definition A := True X True.
Definition B := True X True X True X True X True X True X True
X True X True X True X True X True X True.
\end{coq_example*}
\begin{coq_example*}
Syntax constr level 6:
Pxor [(Xor $t1 $t2)] -> [ $t1:L " X " $t2:E ].
\end{coq_example*}
This rule prints everything on the same line, exceeding the line's
width.
% With coq-tex, the edge of the screen is not shown.
%\begin{coq_example}
%Print B.
%\end{coq_example}
\begin{small}
\begin{flushleft}
\verb!Coq < Print B.!\\
\texttt{\textit{B~=~}}\\
\texttt{\textit{True~X~True~X~True~X~True~X~True~X~True~X~True~X~True~X~True~X~True~X~Tru}}\\
\texttt{\textit{e~X~True~X~True}}\\
\texttt{\textit{~~~~~:~Prop}}\\
\end{flushleft}
\end{small}
Let's add some break-points in order to force the printer to break the
line before the operator:
\begin{coq_example*}
Syntax constr level 6:
Pxor [(Xor $t1 $t2)] -> [ $t1:L [0 1] " X " $t2:E ].
\end{coq_example*}
\begin{coq_example}
Print B.
\end{coq_example}
The line was correctly broken but there is no indentation at all. To
deal with indentation we use a printing box:
\begin{coq_example*}
Syntax constr level 6:
Pxor [(Xor $t1 $t2)] ->
[ [<hov 0> $t1:L [0 1] " X " $t2:E ] ].
\end{coq_example*}
With this rule the printing of A is correct, and the printing of B is
indented.
\begin{coq_example}
Print B.
\end{coq_example}
If we had chosen the mode \verb+v+ instead of \verb+hov+:
\begin{coq_example*}
Syntax constr level 6:
Pxor [(Xor $t1 $t2)] -> [ [<v 0> $t1:L [0 1] " X " $t2:E ] ].
\end{coq_example*}
We would have obtained a vertical presentation:
\begin{coq_example}
Print A.
\end{coq_example}
The difference between the presentations obtained with the \verb+hv+
and \verb+hov+ box types is not evident at first glance. Just for
clarification purposes, let's compare the results of this silly rule
using an \verb+hv+ and an \verb+hov+ box type:
\begin{coq_example*}
Syntax constr level 6:
Pxor [(Xor $t1 $t2)] ->
[ [<hv 0> "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
[0 0] "----------------------------------------"
[0 0] "ZZZZZZZZZZZZZZZZ" ] ].
\end{coq_example*}
\begin{coq_example}
Print A.
\end{coq_example}
\begin{coq_example*}
Syntax constr level 6:
Pxor [(Xor $t1 $t2)] ->
[ [<hov 0> "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
[0 0] "----------------------------------------"
[0 0] "ZZZZZZZZZZZZZZZZ" ] ].
\end{coq_example*}
\begin{coq_example}
Print A.
\end{coq_example}
In the first case, as the three strings to be printed do not fit in
the line's width, a vertical presentation is applied. In the second
case, a vertical presentation is also applied, but since the last two
strings fit in the line's width, they are printed on the same line.
\end{enumerate}
\subsection{Debugging the printing rules}
For the time being, there is almost no semantic check of printing
rules in the system. To find out where a problem is, there are two
possibilities: analyze the rules looking for the most common errors,
or work in the toplevel, tracing the {\ocaml} code of the printer.
When the system cannot find the proper rule to print an AST, it prints
\verb+#GENTERM+ \textit{ast}. If you added no printing rule,
this is probably a bug and you can report it to the \Coq\ team.
\subsubsection{Most common errors}
Here are some considerations that may help to get rid of simple
errors:
\begin{itemize}
\item make sure that the rule you want to use is not defined in a
  previously closed section.
\item make sure that {\bf all} non-terminals of your grammar have
their corresponding printing rules.
\item make sure that the set of printing rules for a certain non
terminal covers all the space of AST values for that non terminal.
\item the order of the rules is important. If there are two rules
whose patterns superpose (they have common instances) then it is
always the most recent rule that will be retrieved.
\item if there are two rules with the same name and universe, the last
  one overrides the first one. The system always warns you when a rule
  is redefined.
\end{itemize}
\subsubsection{Tracing the {\ocaml} code of the printer}
Some of the conditions presented above are not easy to verify when
dealing with many rules. In that case tracing the code helps to
understand what is happening. The printers are in the file {\tt
src/typing/printer}. There you will find the functions:
\begin{itemize}
\item {\tt prterm} : the printer of constructions
\item {\tt gentacpr} : the printer of tactics
\end{itemize}
These printers are defined by instantiating a general printer {\tt
genprint} (located in {\tt src/parsing/esyntax.ml}) with the adequate
parameters. {\tt genprint} expects the universe to which the AST
belongs ({\it tactic}, {\it constr}), a default printer, the
precedence of the AST inherited from the calling rule, and the AST to
print. {\tt genprint} looks for a rule whose pattern matches the AST,
and executes in order the printing orders associated to this rule.
Subterms are printed by recursively calling the generic printer. If no
rule matches the AST, the default
printer is used.
An AST of a universe may have subterms that belong to another
universe. For instance, let $v$ be the AST of the tactic
expression \verb+MyTactic O+. The function {\tt gentacpr} is called
to print $v$. This function instantiates the general printer {\tt
genprint} with the universe {\it tactic}. Note that $v$ has a subterm
$c$ corresponding to the AST of \verb+O+ ($c$ belongs to the universe
{\it constr}). {\tt genprint} will recursively try to print all
subterms of $v$ as belonging to the same universe as $v$. If this is
not possible, because the subterm belongs to another universe, then
the default printer that was given as argument to {\tt genprint} is
applied. The default printer is responsible for changing the universe
in a proper way, calling the suitable printer for $c$.
\medskip\noindent {\bf Technical Remark.} In the file
\verb+PPTactic.v+, there are some rules that do not arise from the
inversion of a parsing rule. They are strongly related to the way the
printing is implemented.
\begin{coq_example*}
Syntax tactic level 8:
tactic_to_constr [<<(COMMAND $c)>>] -> [ $c:"constr":9 ].
\end{coq_example*}
As a tactic AST may have subterms that are terms, these rules allow
the printer of tactics to change the universe. The primitive printer
{\tt constr} is a special identifier used for this purpose. Such rules
are used in the code of the default printer that {\tt
gentacpr} gives to {\tt genprint}.
\fi
% $Id$
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "Reference-Manual"
%%% End: