author    Andres Erbsen <andreser@mit.edu>  2016-06-24 13:39:12 -0400
committer Andres Erbsen <andreser@mit.edu>  2016-06-24 13:39:12 -0400
commit    34674c61fab8d1c7d8fd8652baafa8c62a54e61d (patch)
tree      2383984ca9e96103ce44ed4cf4ed1215773e608b /folkwisdom.md
parent    eafd5f50fc25dd93020869449e9a57858b4a3ceb (diff)
first draft of a "folk wisdom" document
Diffstat (limited to 'folkwisdom.md')
-rw-r--r--	folkwisdom.md	263
1 file changed, 263 insertions(+), 0 deletions(-)
diff --git a/folkwisdom.md b/folkwisdom.md
new file mode 100644
index 000000000..e1c52351c
--- /dev/null
+++ b/folkwisdom.md
@@ -0,0 +1,263 @@
+The purpose of this document is to record which design heuristics have
+resulted in tolerable code, and which patterns might be indicative of a dead
+end. The intent is to document experience (empirical results), not
+comprehensive truth. For each guideline here, there probably exists some
+situation where it is best to deviate from it. Yet incomplete wisdom is better
+than no wisdom, so let's note down the trade-offs we have faced and how we have
+resolved them.
+
+# Data representations
+
+## Validity
+
+### Obvious validity
+
+For some data, the obvious simple type can only take meaningful values --
+a natural number is either zero or the successor of another natural number, and
+that is all there is to it. The rest of this section covers the cases where the
+choice of a correct representation is not obvious.
+
+### Validity by a well-engineered simple type structure
+
+If using a representation where the type only admits semantically valid values
+would not make the functions that operate on that type look horrendous, use that
+representation. For example, the Coq standard library represents a rational
+number as a pair of an integer numerator and a `positive` denominator, and it
+works out okay. Fractions with denominator 0 cannot be represented because there
+is no value of type `positive` that would be interpreted as zero.
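+
+For reference, the standard library's definition (from `QArith`) is essentially
+the following:
+
+```coq
+Require Import ZArith.
+
+(* An integer numerator paired with a positive denominator; a denominator
+   of 0 is unrepresentable by construction. *)
+Record Q : Set := Qmake { Qnum : Z; Qden : positive }.
+```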
+
+On the other hand, an elliptic curve point in Edwards coordinates is informally
+defined as a pair `(x, y)` s.t. `x^2 + y^2 = 1 + d*x^2*y^2`. It is no longer
+obvious how one would hack up a simple type that only represents valid points.
+A solution that gets quite close would be to use "point compression": store `y`
+and the sign bit of `x`; solve for `x` each time it is needed. Yet this
+representation is not unique (there are two ways to represent a point with
+`x=0`), and solving for `x` would clutter all functions that operate on `x` and
+`y`. Uniqueness can still be restored at the cost of cluttering the functions
+even more: one could make a type that has three cases: `Xpositive`, which takes
+a `y`; `Xnegative`, which takes a `y`; and `Xzero`, which represents the point
+`(0, 1)` but does not allow a superfluous `y` to be specified.
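+
+A minimal sketch of that three-case representation, with the coordinate field
+left as a parameter (all names here are illustrative):
+
+```coq
+Section CompressedPoint.
+  Context {F : Type}. (* the field the coordinates live in *)
+
+  Inductive point :=
+  | Xpositive (y : F)  (* x is recovered as the "positive" square root *)
+  | Xnegative (y : F)  (* x is recovered as the "negative" square root *)
+  | Xzero.             (* the point (0, 1); no superfluous y is stored *)
+End CompressedPoint.
+```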
+
+The alternative is to first write down a type that can represent all valid
+values and some invalid ones, and then disallow the invalid ones. Sometimes the
+need to disallow invalid values can be circumvented by redefining validity: it
+may be possible to extend an algorithm that only operates on positive numbers to
+work with all natural numbers or even all integers. Sometimes the validity
+condition is crucial: the group law for combining two Edwards curve points is
+only proven to be associative for points that are on the curve, not for all
+pairs of numbers.
+
+### Validity using dependent types
+
+If nontrivial data validity invariants are required, the intuitive "value such
+that" definition can be encoded as a sigma type: `Definition point := { P | let
+'(x,y) := P in a*x^2 + y^2 = 1 + d*x^2*y^2 }`. In this encoding, every time
+a value of type `point` is constructed out of the `x` and `y` coordinates,
+a proof that the point is valid will need to be provided. Code that manipulates
+points using functions that are already known to return points does not need to
+deal with the invariant: a value that comes out of a function that returns
+`point` is valid because wherever that value was constructed from coordinates,
+a validity proof must have been provided. Using sigma types this
+way moves proof obligations from one place in the code to another, hopefully
+more convenient place.
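+
+A self-contained sketch of this encoding, with the field operations taken as
+section parameters:
+
+```coq
+Section SigmaPoint.
+  Context {F : Type} (add mul : F -> F -> F) (one a d : F).
+  Local Infix "+" := add. Local Infix "*" := mul.
+
+  Definition point :=
+    { P : F * F | let '(x, y) := P in a*x*x + y*y = one + d*x*x*y*y }.
+
+  (* Extracting the coordinates never requires a proof... *)
+  Definition coordinates (P : point) : F * F := proj1_sig P.
+
+  (* ...but constructing a point does. *)
+  Definition mk (x y : F) (pf : a*x*x + y*y = one + d*x*x*y*y) : point :=
+    exist _ (x, y) pf.
+End SigmaPoint.
+```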
+
+Dependent types are not a free lunch; we have encountered three kinds of issues
+when dealing with them.
+
+#### Rewriting
+
+The `rewrite` tactic does not work under expressions that construct a point from
+its coordinates and a proof. This is because after rewriting just the
+coordinates, the proof would still refer to the old coordinates, and thus the
+construction would not be valid. It might be possible to make rewriting work by
+showing an equivalence relation on values such that the invariant does not
+become invalidated when replacing one equivalent value with another, and the
+rewrite only changes the value within an equivalence class. It might even be
+possible to hook this argument up to the `setoid_rewrite` tactic to automate it.
+However, a more lightweight rule has been successful enough to keep us from
+pursuing that strategy: when non-trivial equational reasoning on the value is
+required, split it off from the invariant preservation proof. For example, when
+defining an optimized `add : point -> point -> point`, first define `add_coordinates : F*F
+-> F*F -> F*F` that operates on arbitrary pairs of coordinates, do all the
+rewriting you want, and then define `add` in terms of `add_coordinates`. In
+analogy to Java's `public` and `private` annotations, users of the Edwards curve
+`point`s never operate on the coordinates alone, while the implementation
+of point addition operates on coordinates and proves invariant preservation
+separately.
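+
+A sketch of the resulting split; the actual formulas and the preservation
+proof are elided into section parameters here:
+
+```coq
+Section AddSplit.
+  Context {F : Type} (onCurve : F * F -> Prop).
+  Definition point := { P : F * F | onCurve P }.
+
+  (* "private": operates on raw coordinates, amenable to rewriting *)
+  Context (add_coordinates : F * F -> F * F -> F * F).
+  (* invariant preservation, proven separately *)
+  Context (add_coordinates_onCurve : forall P Q,
+              onCurve P -> onCurve Q -> onCurve (add_coordinates P Q)).
+
+  (* "public": users only ever see proof-carrying points *)
+  Definition add (P Q : point) : point :=
+    exist _ (add_coordinates (proj1_sig P) (proj1_sig Q))
+            (add_coordinates_onCurve _ _ (proj2_sig P) (proj2_sig Q)).
+End AddSplit.
+```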
+
+#### Computation inside Coq
+
+Computing values of a proof-carrying type inside Coq often runs unreasonably
+long or uses unreasonably much memory. However, computing just the coordinates
+of a point whose proof-carrying value would take too long to compute seems to
+work fine. In cases where the evaluation heuristics do not manage to make this
+okay, rewriting the expression before evaluating it has not been too difficult.
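+
+A toy illustration of the distinction (at this size both queries are instant;
+the difference only shows up at realistic term sizes):
+
+```coq
+Definition n100 : { n : nat | n = 10 * 10 } := exist _ (10 * 10) eq_refl.
+
+(* normalizes the proof term along with the value: *)
+Eval cbv in n100.
+(* asks only for the underlying value: *)
+Eval cbv in (proj1_sig n100).
+```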
+
+#### Equality and Equivalence
+
+Equality and invariant preservation proofs do not play nice by default. Under
+Leibniz equality, showing that two points with equal coordinates are equal
+requires showing that the invariant preservation proofs are equal. This can be
+done using the `UIP_dec` lemma if the invariant itself is an equality and it is
+decidable, for all values of the coordinates, whether the invariant holds on
+them or not. These conditions tend to be met for the invariants this
+project uses. Otherwise, two points can be defined to be equivalent (not equal)
+if they have the same coordinates (working with custom equivalence relations
+involves nontrivial considerations of its own).
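+
+A sketch on a toy invariant (an equality between `bool`s, which is decidable):
+
+```coq
+Require Import Coq.Bool.Bool Coq.Logic.Eqdep_dec.
+
+Definition evenNat := { n : nat | Nat.even n = true }.
+
+(* Two values with equal first components are equal, because proofs of
+   a decidable equality are unique. *)
+Lemma evenNat_eq (a b : evenNat) : proj1_sig a = proj1_sig b -> a = b.
+Proof.
+  destruct a as [x p], b as [y q]; simpl; intros H; subst y.
+  rewrite (UIP_dec _ bool_dec _ _ p q); reflexivity.
+Qed.
+```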
+
+#### Note on defining values
+
+The most reliable way to make a value of a sigma type is to start the
+`Definition` in proof mode (ending the first line with a dot like `Definition
+zero : point.`) and then do `refine (exist _ value _); abstract
+(tacticThatProvesInvariant)`. Some of the time, `Program Definition` will do the
+same thing without explicit proof mode (as in the Edwards curve spec), but it tends
+to over-eagerly rewrite `if` and `match` statements (as in the Weierstrass curve
+spec). Neatness of the value that is defined takes priority over neatness of the
+code that performs the definition.
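+
+For example, on the toy sigma type from above (the `abstract` keeps the proof
+term opaque and out of the computed value):
+
+```coq
+Definition zero : evenNat.
+Proof.
+  refine (exist _ 0 _); abstract reflexivity.
+Defined.
+```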
+
+## Equivalence
+
+Terms can be equivalent in different ways. In this list, each kind of equality
+is stricter than the one right after it.
+
+- syntactic: they are literally the same expression. Proof automation is
+sensitive to syntactic changes.
+- definitional: one term can be transformed into the other by applying rules of
+computation only. Proof checking considers two syntactically different
+expressions interchangeable as long as they are definitionally equal.
+- propositional: as established by a proof (which can use lemmas in addition to
+rules of computation), the two terms compute to the exact same value. It is
+always possible to rewrite using a propositional equality.
+- custom equivalence relations: for example, two fractions `a/b` and `c/d` can
+be defined equivalent iff `a*d = b*c`. Given `x == y`, it is possible to rewrite
+`f(g(x))` to `f(g(y))` if there is some equivalence relation `===` such that the
+output of `f` does not change when one `===`-equivalent thing is replaced with
+another, and the output of `g` only changes within a `===`-equivalence class
+when the input changes within a `==`-equivalence class. This is what `Proper`
+instances are about; the last statement can be written `Proper ((==) ==> (===)) g`.
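+
+As a sketch, here is the fraction relation and a `Proper` instance for
+a negation function (denominators must be nonzero for `==` to be a genuine
+equivalence, as in the standard library's `Q`; the names are illustrative):
+
+```coq
+Require Import ZArith Coq.Classes.Morphisms Coq.Setoids.Setoid.
+Local Open Scope Z_scope.
+
+Definition frac : Type := (Z * Z)%type.  (* (numerator, denominator) *)
+Definition feq (p q : frac) : Prop := fst p * snd q = fst q * snd p.
+Local Infix "==" := feq (at level 70).
+
+Definition fopp (p : frac) : frac := (- fst p, snd p).
+
+(* the output of fopp stays within a ==-class when its input does *)
+Instance fopp_Proper : Proper (feq ==> feq) fopp.
+Proof.
+  intros [a b] [c d] H; unfold feq in *; simpl in *.
+  rewrite !Z.mul_opp_l, H; reflexivity.
+Qed.
+```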
+
+### Which equivalence to use?
+
+When defining a datatype from scratch, try to define it in a way that makes
+propositional equality coincide with the desired notion of equivalence. However,
+as with sigma types, it is not worth garbling the code itself just to be able
+to use propositional equality.
+
+When defining a datatype that has a field of a type that is used with a custom
+equivalence, it is probably a good idea to define the obvious custom equivalence
+relation for the new type right away (and make an `Equivalence` instance for
+the relation and `Proper` instances for constructors and projections). When
+defining a type parametrized over some other type, if it is likely that some
+use of the parametric type will set the parameter to a type with a custom
+equivalence relation, define an equivalence relation for the new type,
+parametrized over an equivalence relation for the parameter type.
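+
+A sketch for pairs, parametrized over equivalence relations on the components:
+
+```coq
+Require Import Coq.Classes.RelationClasses.
+
+Section PairEquiv.
+  Context {A B : Type} (eqA : A -> A -> Prop) (eqB : B -> B -> Prop).
+  Context {EqA : Equivalence eqA} {EqB : Equivalence eqB}.
+
+  Definition pair_eq (p q : A * B) : Prop :=
+    eqA (fst p) (fst q) /\ eqB (snd p) (snd q).
+
+  Global Instance pair_eq_Equivalence : Equivalence pair_eq.
+  Proof.
+    split.
+    - intros [a b]; split; reflexivity.
+    - intros [a b] [c d] [H1 H2]; split; symmetry; assumption.
+    - intros [a b] [c d] [e f] [H1 H2] [H3 H4]; split; etransitivity;
+        eassumption.
+  Qed.
+End PairEquiv.
+```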
+
+When writing a module that is parametric over a type, require the weakest
+sufficient equivalence. In particular, if there is a chance that the module will
+be instantiated with a type on which propositional equality makes little sense,
+it is well worth requiring additional parameters for `Proper` and `Equivalence`
+(and maybe `Decidable`) instances. See the Algebra library for an example.
+
+When writing a tactic, specify the expected form of the input goal with
+syntactic precision -- trying to be insensitive to syntactic changes and to
+require only definitional equality can lead to arbitrary amounts of computation
+and is usually not worth it. One possible exception is the
+reification-by-typeclasses pattern, which has not yet had issues even though it
+works up to definitional equivalence to the extent typeclass resolution does.
+
+### Custom Equivalence Relations and Tactics
+
+The primary reason for avoiding custom equivalence relations is that the tactic
+support for them is incomplete and sometimes brittle. This does not mean that
+custom equivalence relations are useless: battling with a buggy tactic can be
+bad; reformulating a proof to avoid custom equivalence relations is often
+worse. Here are some suggestions.
+
+1. The number one reason for hitting issues with custom equivalence relations is
+that either the goal or the givens contain a typo that switched one equivalence
+relation for another. In particular, `<>` refers to the negation of propositional
+equality by default. This can be avoided with good conventions: when introducing
+a type that will be used with a custom equivalence relation, introduce the
+equivalence relation (and its `Equivalence` instance) right away. When
+introducing a function involving such a type, introduce a `Proper` instance
+right away. Define local notations for the equivalences used in the file. Note
+that while introducing the tooling may clutter the source code, section
+parameters introduced using `Context {x:type}` are required for the definitions
+in the section only if a definition actually refers to them.
+2. Use normal `rewrite` by default -- it can deal with custom equivalence
+relations and `Proper` instances.
+3. To rewrite under a binder (e.g. `E` in `Let_In x (fun x => E)` or `E` in `map
+(fun x => E) xs`) use `setoid_rewrite`. However, `setoid_rewrite` tries to be
+oblivious to definitional equality -- and in doing so, it sometimes goes on
+a wild goose chase trying to unfold definitions to find something that can be
+rewritten deep down in the structure. This can cause arbitrary slowdowns and
+unexpected behavior. To dodge these issues, mark the definitions `Local Opaque`
+before the `setoid_rewrite` and reset them to `Local Transparent` after.
+4. When a `rewrite` fails (probably with an inexplicable error message),
+double-check that the conventions from (1) are followed correctly. If this does
+not reveal the issue, try `setoid_rewrite` instead for troubleshooting (it
+sometimes even gives sensible error messages in Coq 8.5) but revert to `rewrite`
+once the issue has been found and fixed. If rewriting picks the wrong
+`Equivalence` or `Proper` instance for some type or function (or fails to find
+one altogether), troubleshoot that separately: check that `pose (_ : @Equivalence
+T equiv)` or `pose (_ : Proper (equiv1 ==> equiv2) f)` gives the right answer, and
+if not, `Set Typeclasses Debug` and read the log (which is in the hidden `*coq*`
+buffer in Proof General). A useful heuristic for reading that log is to look
+for the first place where the resolution backtracks and then read backwards from
+there.
+5. To perform multiple rewrites at once, make a rewrite hint database and call
+`idtac; (rewrite_strat topdown hints dbname).` where `topdown` can be replaced
+with `bottomup` for different behavior. This does not seem to unfold
+definitions, so it may be a reasonable alternative to using `Local Opaque` to
+protect definitions from `setoid_rewrite`.
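+
+A sketch of item 5, using two standard-library lemmas and an arbitrary
+database name:
+
+```coq
+Require Import PeanoNat Setoid.
+
+Hint Rewrite Nat.add_0_r Nat.mul_1_r : simplify_nat.
+
+Goal forall n m : nat, (n + 0) + (m * 1) = n + m.
+Proof.
+  intros; idtac; (rewrite_strat topdown (hints simplify_nat)).
+  reflexivity.
+Qed.
+```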
+
+# Generality and Abstraction
+
+Coq has ample facilities for abstraction and generality. The main issue we have
+faced is that undisciplined use of them can clutter definitions even when only
+the properties of the definitions are to be abstracted. We are not sure that the
+strategy we are currently working with is optimal (in particular, it results in
+verbose code), but it definitely dodges some issues other approaches do not. In
+particular, we want to relate a specific set of types and operations to an
+abstract structure (e.g., a group) without changing the definitions of any of
+the types or operations involved. This reduces the coupling between concrete
+definitions and abstractions and also enables relating the same definitions to
+an abstract structure in multiple ways (e.g., integers can be a semiring under
+`+` and `*` or under `max` and `+`). Furthermore, we require that it be possible
+to manipulate abstractions using tactics.
+
+To define an abstraction, first open a module with no parameters to make a new
+namespace for the abstraction (even if the entire file is dedicated to one
+abstraction). Then open a new section and add to the context parameters for all
+definitions. The requirements which need to be satisfied by the definitions to
+fit the structure can be defined using a typeclass record. When one of the
+requirements is a typeclass, mark the field name as a `Global Existing Instance`
+right after declaring the new record. To prove lemmas about everything of that
+structure, introduce an instance of that typeclass to the context (in addition
+to the parameters). To relate a specific set of definitions to an abstract
+structure, create a `Global Instance` of the parametric typeclass for that
+structure, with the specific definitions filled in. To use a lemma proven about
+an abstract structure in a specific context, just apply or rewrite using it --
+typeclass resolution will most likely determine that the specific definitions
+indeed fit the structure. If not, specify the parameters to the lemma manually (a
+definitive description of what parameters are required and in what form can be
+seen using `About lemma_name.`).
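+
+A sketch of the whole pattern on a deliberately small structure (all names
+are hypothetical):
+
+```coq
+Require Import List.
+
+Module Monoid.
+  Section Defs.
+    Context {T : Type} (op : T -> T -> T) (id : T).
+
+    (* the requirements, collected in a typeclass record *)
+    Class monoid : Prop := {
+      assoc : forall a b c, op a (op b c) = op (op a b) c;
+      id_l : forall a, op id a = a;
+      id_r : forall a, op a id = a
+    }.
+
+    (* lemmas about everything of that structure *)
+    Context {M : monoid}.
+    Lemma id_unique (e : T) (He : forall a, op e a = a) : e = id.
+    Proof. rewrite <- (He id). symmetry. apply id_r. Qed.
+  End Defs.
+End Monoid.
+
+(* relate specific definitions to the structure, without changing them *)
+Global Instance list_monoid {A : Type} : Monoid.monoid (@app A) nil.
+Proof.
+  split.
+  - intros; apply app_assoc.
+  - intros; reflexivity.
+  - intros; apply app_nil_r.
+Qed.
+
+(* abstract lemmas now apply to the specific definitions *)
+Example nil_unique {A : Type} (e : list A) (He : forall l, e ++ l = l)
+  : e = nil.
+Proof. apply (Monoid.id_unique (@app A) nil e He). Qed.
+```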
+
+Compared to other means of abstraction, this one does not create common notation
+for all instances of the same structure. We define notations for each specific
+definition, and separate them using notation scopes (which are often named after
+datatypes). It would be possible to add a typeclass field that aliases the
+specific definition and define a global notation for it, but definitions that
+use that notation would then gain an extra layer of definitions: each underlying
+operator would be wrapped in the abstract one for which the notation was
+defined. Furthermore, this interferes with having one definition satisfy the
+same structure in multiple ways.
+
+Another alternative is to use the Coq module system (which can be understood by
+reading the documentation of the OCaml module system), and when
+applicable, this is usually a good idea. When parameters are set once and used
+very many times, the module system can be quite convenient. However, if the
+number of parameters and the number of places where their values are determined
+are comparable to the number of uses of abstractly proven lemmas, the
+module system can be cumbersome. Modules cannot be instantiated automatically.