The CompCert verified compiler

Commented Coq development

Version 2.1, 2013-10-28

Introduction

CompCert is a compiler that generates PowerPC, ARM and x86 assembly code from CompCert C, a large subset of the C programming language. What distinguishes this compiler is that it is written mostly within the specification language of the Coq proof assistant, and that its correctness (the fact that the generated assembly code is semantically equivalent to its source program) was proved entirely within the Coq proof assistant.

High-level descriptions of the CompCert compiler and its proof of correctness can be found in the following papers (in increasing order of technical detail):

Xavier Leroy, Formal verification of a realistic compiler. Communications of the ACM 52(7), July 2009.
Sandrine Blazy, Zaynah Dargaye and Xavier Leroy, Formal verification of a C compiler front-end. Proceedings of Formal Methods 2006, LNCS 4085.
Xavier Leroy, A formally verified compiler back-end. Journal of Automated Reasoning 43(4):363-446, 2009.

This Web site gives a commented listing of the underlying Coq specifications and proofs. Proof scripts are folded by default, but can be viewed by clicking on "Proof". Some modules (written in italics below) differ between the three supported target architectures. The PowerPC versions of these modules are shown below; the ARM and x86 versions can be found in the source distribution.

This development is a work in progress; some parts have substantially changed since the overview papers above were written.

The complete sources for CompCert can be downloaded from the CompCert Web site.

This document and the CompCert sources are copyright 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013 Institut National de Recherche en Informatique et en Automatique (INRIA) and distributed under the terms of the following license.

General-purpose libraries, data structures and algorithms

Coqlib: addendum to the Coq standard library.
Maps: finite maps.
Integers: machine integers.
Floats: machine floating-point numbers.
Iteration: various forms of "while" loops.
Ordered: construction of ordered types.
Lattice: construction of semi-lattices.
Kildall: resolution of dataflow inequations by fixpoint iteration.
UnionFind: a persistent union-find data structure.
Postorder: postorder numbering of a directed graph.
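The Kildall entry above refers to solving dataflow inequations by fixpoint iteration. To illustrate the technique only, here is a hypothetical Python worklist solver for a forward analysis whose abstract values are finite sets ordered by inclusion. It is not the Coq module, which is parameterized over abstract semi-lattices; the function name, its parameters and the set-based lattice are assumptions of this sketch.

```python
from collections import deque

# Hypothetical sketch of Kildall-style fixpoint iteration (not CompCert's code).
def kildall_forward(succs, transfer, entry, entry_value):
    """Least solution of the dataflow inequations
         X[s] >= transfer(n, X[n])   for every CFG edge n -> s
         X[entry] >= entry_value
    over finite sets ordered by inclusion (join = union, bottom = empty set).
    `transfer` must be monotone for the result to be the least solution."""
    nodes = set(succs) | {s for ss in succs.values() for s in ss} | {entry}
    x = {n: frozenset() for n in nodes}      # start every node at bottom
    x[entry] = frozenset(entry_value)
    worklist = deque(nodes)                  # initially, every node must be examined
    queued = set(nodes)
    while worklist:
        n = worklist.popleft()
        queued.discard(n)
        out = transfer(n, x[n])
        for succ in succs.get(n, ()):
            new = x[succ] | out              # join n's contribution into succ
            if new != x[succ]:               # succ's value grew: re-examine succ
                x[succ] = new
                if succ not in queued:
                    worklist.append(succ)
                    queued.add(succ)
    return x


# A loop 2 -> 3 -> 2; each node "generates" its own label.
cfg = {1: [2], 2: [3, 4], 3: [2]}
sol = kildall_forward(cfg, lambda n, s: s | {n}, entry=1, entry_value=())
# sol[1] == frozenset(); sol[2] == sol[3] == sol[4] == frozenset({1, 2, 3})
```

Iteration terminates because each update strictly enlarges a set drawn from a finite universe of labels; the order in which the worklist is processed does not change the final (least) solution.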

Definitions and theorems used in many parts of the development

Errors: the Error monad.
AST: identifiers, whole programs and other common elements of abstract syntaxes.
Values: run-time values.
Events: observable events and traces.
Memtype: memory model (interface).
See also: Memory (implementation of the memory model).
See also: Memdata (in-memory representation of data).
Globalenvs: global execution environments.
Smallstep: tools for small-step semantics.
Behaviors: from small-step semantics to observable behaviors of programs.
Determinism: determinism properties of small-step semantics.
Op: operators, addressing modes and their semantics.
Subtyping: a solver for atomic subtyping constraints.

Source, intermediate and target languages: syntax and semantics

The CompCert C source language: syntax, semantics, and determinized semantics.
See also: type expressions and operators (syntax and semantics) and the reference interpreter.
Clight: a simpler version of CompCert C where expressions contain no side-effects.
Csharpminor: low-level structured language.
Cminor: low-level structured language, with explicit stack allocation of certain local variables.
CminorSel: like Cminor, with machine-specific operators and addressing modes.
RTL: register transfer language (3-address code, control-flow graph, infinitely many pseudo-registers).
See also: Registers (representation of pseudo-registers).
LTL: location transfer language (3-address code, control-flow graph of basic blocks, finitely many physical registers, infinitely many stack slots).
See also: Locations (representation of locations) and Machregs (description of processor registers).
Linear: like LTL, but the CFG is replaced by a linear list of instructions with explicit branches and labels.
Mach: like Linear, with a more concrete view of the activation record.
Asm: abstract syntax for PowerPC assembly code.

Compiler passes

Pass	Source & target	Compiler code	Correctness proof
Pulling side-effects out of expressions; fixing an evaluation order	CompCert C to Clight	SimplExpr	SimplExprspec SimplExprproof
Pulling non-addressable scalar local variables out of memory	Clight to Clight	SimplLocals	SimplLocalsproof
Simplification of control structures; making type-dependent computations explicit	Clight to Csharpminor	Cshmgen	Cshmgenproof
Stack allocation of local variables whose address is taken; simplification of switch statements	Csharpminor to Cminor	Cminorgen	Cminorgenproof
Recognition of operators and addressing modes	Cminor to CminorSel	Selection SelectOp SelectLong	Selectionproof SelectOpproof SelectLongproof
Construction of the CFG, 3-address code generation	CminorSel to RTL	RTLgen	RTLgenspec RTLgenproof
Recognition of tail calls	RTL to RTL	Tailcall	Tailcallproof
Function inlining	RTL to RTL	Inlining	Inliningspec Inliningproof
Postorder renumbering of the CFG	RTL to RTL	Renumber	Renumberproof
Constant propagation	RTL to RTL	Constprop ConstpropOp Liveness	Constpropproof ConstpropOpproof
Common subexpression elimination	RTL to RTL	CSE CombineOp	CSEproof CombineOpproof
Register allocation (validation a posteriori)	RTL to LTL	Allocation	Allocproof
Branch tunneling	LTL to LTL	Tunneling	Tunnelingproof
Linearization of the CFG	LTL to Linear	Linearize	Linearizeproof
Removal of unreferenced labels	Linear to Linear	CleanupLabels	CleanupLabelsproof
Laying out the activation records	Linear to Mach	Stacking Bounds Stacklayout	Stackingproof
Emission of assembly code	Mach to Asm	Asmgen	Asmgenproof0 Asmgenproof1 Asmgenproof
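The Renumber pass in the table above relies on postorder numbering of a directed graph. A hypothetical Python sketch of the idea: number the nodes reachable from the entry in depth-first postorder, then rewrite the successor lists with the new numbers. The function names and the choice to drop unreachable nodes are assumptions of this sketch, not a description of what Renumber or Postorder actually do.

```python
# Hypothetical sketch of postorder renumbering of a CFG (not CompCert's code).
def postorder_numbering(succs, entry):
    """Number the nodes reachable from `entry` 1, 2, 3, ... in depth-first
    postorder (a node is numbered after all of its DFS descendants)."""
    number = {}
    visited = {entry}
    stack = [(entry, iter(succs.get(entry, ())))]  # explicit stack: no recursion limit
    counter = 0
    while stack:
        node, pending = stack[-1]
        for succ in pending:
            if succ not in visited:
                visited.add(succ)
                stack.append((succ, iter(succs.get(succ, ()))))
                break                    # descend into succ first
        else:                            # all successors done: number this node
            stack.pop()
            counter += 1
            number[node] = counter
    return number


def renumber(succs, entry):
    """Rename the reachable nodes of a CFG by their postorder numbers.
    Unreachable nodes are simply dropped in this sketch."""
    num = postorder_numbering(succs, entry)
    new_succs = {num[n]: [num[s] for s in succs.get(n, ())] for n in num}
    return new_succs, num[entry]


g = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': [], 'z': ['a']}
# postorder_numbering(g, 'a') == {'d': 1, 'b': 2, 'c': 3, 'a': 4}   ('z' is unreachable)
# renumber(g, 'a') == ({4: [2, 3], 2: [1], 3: [1], 1: []}, 4)
```

In a postorder numbering the entry node always receives the largest number, and in an acyclic graph every edge goes from a higher number to a lower one.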

Type systems

Trivial type systems are used to statically capture well-formedness conditions on some intermediate languages.

RTLtyping: typing for RTL + type reconstruction.
Lineartyping: typing for Linear.

All together

Compiler: composing the passes together; whole-compiler semantic preservation theorems.
Complements: interesting consequences of the semantic preservation theorems.
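Errors provides an error monad, and Compiler chains the passes listed above, some of which always succeed while others may reject their input. The following hypothetical Python sketch shows only the shape of such a composition: the names OK, Error, apply_total and apply_partial, and the toy passes, are illustrative assumptions and not CompCert's Coq definitions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union

# Hypothetical error monad: a pass result is either OK(value) or Error(message).
@dataclass(frozen=True)
class OK:
    value: Any

@dataclass(frozen=True)
class Error:
    message: str

Res = Union[OK, Error]

def apply_total(x: Res, f: Callable[[Any], Any]) -> Res:
    """Apply a pass that cannot fail; propagate an earlier error unchanged."""
    return OK(f(x.value)) if isinstance(x, OK) else x

def apply_partial(x: Res, f: Callable[[Any], Res]) -> Res:
    """Apply a pass that may itself fail; propagate an earlier error unchanged."""
    return f(x.value) if isinstance(x, OK) else x

# Hypothetical toy passes over a list of "instructions".
def drop_nops(prog):                 # total: always succeeds
    return [i for i in prog if i != "nop"]

def check_nonempty(prog) -> Res:     # partial: may reject its input
    return OK(prog) if prog else Error("empty program")

def compile_program(prog) -> Res:
    r = OK(prog)
    r = apply_total(r, drop_nops)
    r = apply_partial(r, check_nonempty)
    r = apply_total(r, lambda p: [i.upper() for i in p])
    return r

# compile_program(["nop", "add"]) == OK(["ADD"])
# compile_program(["nop"]) == Error("empty program")
```

Because an error short-circuits every later pass, a whole-compiler preservation theorem in this style only needs to consider runs in which every pass returned OK.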

Xavier.Leroy@inria.fr