\documentclass{article}
\usepackage{fullpage,amsmath,amssymb,proof,url}
\usepackage[T1]{fontenc}
\usepackage{ae,aecompl}
\newcommand{\cd}[1]{\texttt{#1}}
\newcommand{\mt}[1]{\mathsf{#1}}
\newcommand{\rc}{+ \hspace{-.075in} + \;}
\newcommand{\rcut}{\; \texttt{-{}-} \;}
\newcommand{\rcutM}{\; \texttt{-{}-{}-} \;}
\usepackage{hyperref}
\begin{document}
\title{The Ur/Web Manual}
\author{Adam Chlipala}
\maketitle
\tableofcontents
\section{Introduction}
\emph{Ur} is a programming language designed to introduce richer type system features into functional programming in the tradition of ML and Haskell. Ur is functional, pure, statically typed, and strict. Ur supports a powerful kind of \emph{metaprogramming} based on \emph{type-level computation with type-level records}.
\emph{Ur/Web} is Ur plus a special standard library and associated rules for parsing and optimization. Ur/Web supports construction of dynamic web applications backed by SQL databases. The signature of the standard library is such that well-typed Ur/Web programs ``don't go wrong'' in a very broad sense. Not only do they not crash during particular page generations, but they also may not:
\begin{itemize}
\item Suffer from any kinds of code-injection attacks
\item Return invalid HTML
\item Contain dead intra-application links
\item Have mismatches between HTML forms and the fields expected by their handlers
\item Include client-side code that makes incorrect assumptions about the ``AJAX''-style services that the remote web server provides
\item Attempt invalid SQL queries
\item Use improper marshaling or unmarshaling in communication with SQL databases or between browsers and web servers
\end{itemize}
This type safety is just the foundation of the Ur/Web methodology. It is also possible to use metaprogramming to build significant application pieces by analysis of type structure. For instance, the demo includes an ML-style functor for building an admin interface for an arbitrary SQL table. The type system guarantees that the admin interface sub-application that comes out will always be free of the above-listed bugs, no matter which well-typed table description is given as input.
The Ur/Web compiler also produces very efficient object code that does not use garbage collection. These compiled programs will often be even more efficient than what most programmers would bother to write in C. The compiler also generates JavaScript versions of client-side code, with no need to write those parts of applications in a different language.
\medskip
The official web site for Ur is:
\begin{center}
\url{http://www.impredicative.com/ur/}
\end{center}
\section{Installation}
If you are lucky, then the following standard command sequence will suffice for installation, in a directory to which you have unpacked the latest distribution tarball.
\begin{verbatim}
./configure
make
sudo make install
\end{verbatim}
Some other packages must be installed for the above to work. At a minimum, you need a standard UNIX shell, with standard UNIX tools like sed and GCC (or an alternate C compiler) in your execution path; MLton, the whole-program optimizing compiler for Standard ML; and the development files for the OpenSSL C library. As of this writing, in the ``testing'' version of Debian Linux, this command will install the more uncommon of these dependencies:
\begin{verbatim}
apt-get install mlton libssl-dev
\end{verbatim}
Note that, like the Ur/Web compiler, MLton is a whole-program optimizing compiler, so it frequently requires much more memory than old-fashioned compilers do. Expect building Ur/Web with MLton to require not much less than a gigabyte of RAM. If a \texttt{mlton} invocation ends suspiciously, the most likely explanation is that it has exhausted available memory.
To build programs that access SQL databases, you also need one of these client libraries for supported backends.
\begin{verbatim}
apt-get install libpq-dev libmysqlclient-dev libsqlite3-dev
\end{verbatim}
It is also possible to access the modules of the Ur/Web compiler interactively, within Standard ML of New Jersey. To install the prerequisites in Debian testing:
\begin{verbatim}
apt-get install smlnj libsmlnj-smlnj ml-yacc ml-lpt
\end{verbatim}
To begin an interactive session with the Ur compiler modules, run \texttt{make smlnj}, and then, from within an \texttt{sml} session, run \texttt{CM.make "src/urweb.cm";}. The \texttt{Compiler} module is the main entry point, and you can find its signature in \texttt{src/compiler.sig}.
To run an SQL-backed application with a backend besides SQLite, you will probably want to install one of these servers.
\begin{verbatim}
apt-get install postgresql mysql-server
\end{verbatim}
To use the Emacs mode, you must have a modern Emacs installed. We assume that you already know how to do this, if you're in the business of looking for an Emacs mode. The demo generation facility of the compiler will also call out to Emacs to syntax-highlight code, and that process depends on the \texttt{htmlize} module, which can be installed in Debian testing via:
\begin{verbatim}
apt-get install emacs-goodies-el
\end{verbatim}
If you don't want to install the Emacs mode, run \texttt{./configure} with the argument \texttt{--without-emacs}.
Even with the right packages installed, configuration and building might fail to work. After you run \texttt{./configure}, you will see the values of some named environment variables printed. You may need to adjust these values to get proper installation for your system. To change a value, store your preferred alternative in the corresponding UNIX environment variable, before running \texttt{./configure}. For instance, here is how to change the list of extra arguments that the Ur/Web compiler will pass to the C compiler and linker on every invocation. Some older GCC versions need this setting to mask a bug in function inlining.
\begin{verbatim}
CCARGS=-fno-inline ./configure
\end{verbatim}
Since the author is still getting a handle on the GNU Autotools that provide the build system, you may need to do some further work to get started, especially in environments with significant differences from Linux (where most testing is done). The variables \texttt{PGHEADER}, \texttt{MSHEADER}, and \texttt{SQHEADER} may be used to set the proper C header files to include for the development libraries of PostgreSQL, MySQL, and SQLite, respectively. To get libpq to link, one OS X user reported setting \texttt{CCARGS="-I/opt/local/include -L/opt/local/lib/postgresql84"}, after creating a symbolic link with \texttt{ln -s /opt/local/include/postgresql84 /opt/local/include/postgresql}.
The Emacs mode can be set to autoload by adding the following to your \texttt{.emacs} file.
\begin{verbatim}
(add-to-list 'load-path "/usr/local/share/emacs/site-lisp/urweb-mode")
(load "urweb-mode-startup")
\end{verbatim}
Change the path in the first line if you chose a different Emacs installation path during configuration.
\section{Command-Line Compiler}
\subsection{\label{cl}Project Files}
The basic inputs to the \texttt{urweb} compiler are project files, which have the extension \texttt{.urp}. Here is a sample \texttt{.urp} file.
\begin{verbatim}
database dbname=test
sql crud1.sql
crud
crud1
\end{verbatim}
The \texttt{database} line gives the database information string to pass to libpq. In this case, the string only says to connect to a local database named \texttt{test}.
The \texttt{sql} line asks for an SQL source file to be generated, giving the commands to run to create the tables and sequences that this application expects to find. After building this \texttt{.urp} file, the following commands could be used to initialize the database, assuming that the current UNIX user exists as a Postgres user with database creation privileges:
\begin{verbatim}
createdb test
psql -f crud1.sql test
\end{verbatim}
A blank line separates the named directives from a list of modules to include in the project. Any line may contain a shell-script-style comment, where any suffix of a line starting at a hash character \texttt{\#} is ignored.
For each entry \texttt{M} in the module list, the file \texttt{M.urs} is included in the project if it exists, and the file \texttt{M.ur} must exist and is always included.
Here is the complete list of directive forms. ``FFI'' stands for ``foreign function interface,'' Ur's facility for interaction between Ur programs and C and JavaScript libraries.
\begin{itemize}
\item \texttt{[allow|deny] [url|mime|requestHeader|responseHeader|env|meta] PATTERN} registers a rule governing which URLs, MIME types, HTTP request headers, HTTP response headers, environment variable names, or HTML \texttt{ } names are allowed to appear explicitly in this application. The first such rule to match a name determines the verdict. If \texttt{PATTERN} ends in \texttt{*}, it is interpreted as a prefix rule. Otherwise, a string must match it exactly.
\item \texttt{alwaysInline PATH} requests that every call to the referenced function be inlined. Section \ref{structure} explains how functions are assigned path strings.
\item \texttt{benignEffectful Module.ident} registers an FFI function or transaction as having side effects. The optimizer avoids removing, moving, or duplicating calls to such functions. Every effectful FFI function must be registered, or the optimizer may make invalid transformations. This version of the \texttt{effectful} directive registers that this function only has side effects that remain local to a single page generation.
\item \texttt{clientOnly Module.ident} registers an FFI function or transaction that may only be run in client browsers.
\item \texttt{clientToServer Module.ident} adds FFI type \texttt{Module.ident} to the list of types that are OK to marshal from clients to servers. Values like XML trees and SQL queries are hard to marshal without introducing expensive validity checks, so it's easier to ensure that the server never trusts clients to send such values. The file \texttt{include/urweb/urweb\_cpp.h} shows examples of the C support functions that are required of any type that may be marshalled. These include \texttt{attrify}, \texttt{urlify}, and \texttt{unurlify} functions.
\item \texttt{coreInline TREESIZE} sets how many nodes the AST of a function definition may have before the optimizer stops trying hard to inline calls to that function. (This is one of two options for one of two intermediate languages within the compiler.)
\item \texttt{database DBSTRING} sets the string to pass to libpq to open a database connection.
\item \texttt{debug} saves some intermediate C files, which is mostly useful to help in debugging the compiler itself.
\item \texttt{effectful Module.ident} registers an FFI function or transaction as having side effects. The optimizer avoids removing, moving, or duplicating calls to such functions. This is the default behavior for \texttt{transaction}-based types.
\item \texttt{exe FILENAME} sets the filename to which to write the output executable. The default for file \texttt{P.urp} is \texttt{P.exe}.
\item \texttt{file URI FILENAME} asks for the application executable to respond to requests for \texttt{URI} by serving a snapshot of the contents of \texttt{FILENAME} as of compile time. That is, the file contents are baked into the executable. System file \texttt{/etc/mime.types} is consulted (again, at compile time) to figure out the right MIME type to suggest in the HTTP response.
\item \texttt{ffi FILENAME} reads the file \texttt{FILENAME.urs} to determine the interface to a new FFI module. The name of the module is calculated from \texttt{FILENAME} in the same way as for normal source files. See the files \texttt{include/urweb/urweb\_cpp.h} and \texttt{src/c/urweb.c} for examples of C headers and implementations for FFI modules. In general, every type or value \texttt{Module.ident} becomes \texttt{uw\_Module\_ident} in C.
\item \texttt{html5} asks to generate HTML5 code, which primarily affects the first few lines of the output documents, like the \texttt{DOCTYPE}. This option is on by default.
\item \texttt{include FILENAME} adds \texttt{FILENAME} to the list of files to be \texttt{\#include}d in C sources. This is most useful for interfacing with new FFI modules.
\item \texttt{jsFile FILENAME} asks to serve the contents of a file as JavaScript. All such content is concatenated into a single file, included via a \texttt{}.
\subsubsection{Node IDs}
There is an abstract type of node IDs that may be assigned to \cd{id} attributes of most HTML tags.
$$\begin{array}{l}
\mt{type} \; \mt{id} \\
\mt{val} \; \mt{fresh} : \mt{transaction} \; \mt{id}
\end{array}$$
The \cd{fresh} function is allowed on both server and client, but there is no other way to create IDs, which includes lack of a way to force an ID to match a particular string. The main semantic importance of IDs within Ur/Web is in uses of the HTML \cd{} tag. IDs play a much more central role in mainstream JavaScript programming, but Ur/Web uses a very different model to enable changes to particular nodes of a page tree, as the next manual subsection explains. IDs may still be useful in interfacing with JavaScript code (for instance, through Ur/Web's FFI).
One further use of IDs is as handles for requesting that \emph{focus} be given to specific tags.
$$\begin{array}{l}
\mt{val} \; \mt{giveFocus} : \mt{id} \to \mt{transaction} \; \mt{unit}
\end{array}$$
\subsubsection{\label{signals}Functional-Reactive Page Generation}
Most approaches to ``AJAX''-style coding involve imperative manipulation of the DOM tree representing an HTML document's structure. Ur/Web follows the \emph{functional-reactive} approach instead. Programs may allocate mutable \emph{sources} of arbitrary types, and an HTML page is effectively a pure function over the latest values of the sources. The page is not mutated directly, but rather it changes automatically as the sources are mutated.
More operationally, you can think of a source as a mutable cell with facilities for subscription to change notifications. That level of detail is hidden behind a monadic facility to be described below. First, there are three primitive operations for working with sources just as if they were ML \cd{ref} cells, corresponding to ML's \cd{ref}, \cd{:=}, and \cd{!} operations.
$$\begin{array}{l}
\mt{con} \; \mt{source} :: \mt{Type} \to \mt{Type} \\
\mt{val} \; \mt{source} : \mt{t} ::: \mt{Type} \to \mt{t} \to \mt{transaction} \; (\mt{source} \; \mt{t}) \\
\mt{val} \; \mt{set} : \mt{t} ::: \mt{Type} \to \mt{source} \; \mt{t} \to \mt{t} \to \mt{transaction} \; \mt{unit} \\
\mt{val} \; \mt{get} : \mt{t} ::: \mt{Type} \to \mt{source} \; \mt{t} \to \mt{transaction} \; \mt{t}
\end{array}$$
Only source creation and setting are supported server-side, as a convenience to help in setting up a page, where you may wish to allocate many sources that will be referenced through the page. All server-side storage of values inside sources uses string serializations of values, while client-side storage uses normal JavaScript values.
Pure functions over arbitrary numbers of sources are represented in a monad of \emph{signals}, which may only be used in client-side code. This is presented to the programmer in the form of a monad $\mt{signal}$, each of whose values represents (conceptually) some pure function over all sources that may be allocated in the course of program execution. A monad operation $\mt{signal}$ denotes the identity function over a particular source. By using $\mt{signal}$ on a source, you implicitly subscribe to change notifications for that source. That is, your signal will automatically be recomputed as that source changes. The usual monad operators make it possible to build up complex signals that depend on multiple sources; automatic updating upon source-value changes still happens automatically. There is also an operator for computing a signal's current value within a transaction.
$$\begin{array}{l}
\mt{con} \; \mt{signal} :: \mt{Type} \to \mt{Type} \\
\mt{val} \; \mt{signal\_monad} : \mt{monad} \; \mt{signal} \\
\mt{val} \; \mt{signal} : \mt{t} ::: \mt{Type} \to \mt{source} \; \mt{t} \to \mt{signal} \; \mt{t} \\
\mt{val} \; \mt{current} : \mt{t} ::: \mt{Type} \to \mt{signal} \; \mt{t} \to \mt{transaction} \; \mt{t}
\end{array}$$
A reactive portion of an HTML page is injected with a $\mt{dyn}$ tag, which has a signal-valued attribute $\mt{Signal}$.
$$\begin{array}{l}
\mt{val} \; \mt{dyn} : \mt{ctx} ::: \{\mt{Unit}\} \to \mt{use} ::: \{\mt{Type}\} \to \mt{bind} ::: \{\mt{Type}\} \to [\mt{ctx} \sim [\mt{Dyn}]] \Rightarrow \mt{unit} \\
\hspace{.1in} \to \mt{tag} \; [\mt{Signal} = \mt{signal} \; (\mt{xml} \; ([\mt{Dyn}] \rc \mt{ctx}) \; \mt{use} \; \mt{bind})] \; ([\mt{Dyn}] \rc \mt{ctx}) \; [] \; \mt{use} \; \mt{bind}
\end{array}$$
The semantics of \cd{} tags is somewhat subtle. When the signal associated with such a tag changes value, the associated subtree of the HTML page is recreated. Some properties of the subtree, such as attributes and client-side widget values, are specified explicitly in the signal value, so these may be counted on to remain the same after recreation. Other properties, like focus and cursor position within textboxes, are \emph{not} specified by signal values, and these properties will be \emph{reset} upon subtree regeneration. Furthermore, user interaction with widgets may not work properly during regeneration. For instance, clicking a button while it is being regenerated may not trigger its \cd{onclick} event code.
Currently, the only way to avoid undesired resets is to avoid regeneration of containing subtrees. There are two main strategies for achieving that goal. First, when changes to a subtree can be confined to CSS classes of tags, the \texttt{dynClass} pseudo-attribute may be used instead (see Section \ref{xml}), as it does not regenerate subtrees. Second, a single \cd{} tag may be broken into multiple tags, in a way that makes finer-grained dependency structure explicit. This latter strategy can avoid ``spurious'' regenerations that are not actually required to achieve the intended semantics.
Transactions can be run on the client by including them in attributes like the $\mt{Onclick}$ attribute of $\mt{button}$, and GUI widgets like $\mt{ctextbox}$ have $\mt{Source}$ attributes that can be used to connect them to sources, so that their values can be read by code running because of, e.g., an $\mt{Onclick}$ event. It is also possible to create an ``active'' HTML fragment that runs a $\mt{transaction}$ to determine its content, possibly allocating some sources in the process:
$$\begin{array}{l}
\mt{val} \; \mt{active} : \mt{unit} \to \mt{tag} \; [\mt{Code} = \mt{transaction} \; \mt{xbody}] \; \mt{body} \; [] \; [] \; []
\end{array}$$
\subsubsection{Remote Procedure Calls}
Any function call may be made a client-to-server ``remote procedure call'' if the function being called needs no features that are only available to client code. To make a function call an RPC, pass that function call as the argument to $\mt{Basis.rpc}$:
$$\begin{array}{l}
\mt{val} \; \mt{rpc} : \mt{t} ::: \mt{Type} \to \mt{transaction} \; \mt{t} \to \mt{transaction} \; \mt{t}
\end{array}$$
There is an alternate form that uses $\mt{None}$ to indicate that an error occurred during RPC processing, rather than raising an exception to abort this branch of control flow.
$$\begin{array}{l}
\mt{val} \; \mt{tryRpc} : \mt{t} ::: \mt{Type} \to \mt{transaction} \; \mt{t} \to \mt{transaction} \; (\mt{option} \; \mt{t})
\end{array}$$
\subsubsection{Asynchronous Message-Passing}
To support asynchronous, ``server push'' delivery of messages to clients, any client that might need to receive an asynchronous message is assigned a unique ID. These IDs may be retrieved both on the client and on the server, during execution of code related to a client.
$$\begin{array}{l}
\mt{type} \; \mt{client} \\
\mt{val} \; \mt{self} : \mt{transaction} \; \mt{client}
\end{array}$$
\emph{Channels} are the means of message-passing. Each channel is created in the context of a client and belongs to that client; no other client may receive the channel's messages. Note that here \emph{client} has a technical Ur/Web meaning so that it describes only \emph{single page views}, so a user following a traditional link within an application will remove the ability for \emph{any} code to receive messages on the channels associated with the previous client. Each channel type includes the type of values that may be sent over the channel. Sending and receiving are asynchronous, in the sense that a client need not be ready to receive a message right away. Rather, sent messages may queue up, waiting to be processed.
$$\begin{array}{l}
\mt{con} \; \mt{channel} :: \mt{Type} \to \mt{Type} \\
\mt{val} \; \mt{channel} : \mt{t} ::: \mt{Type} \to \mt{transaction} \; (\mt{channel} \; \mt{t}) \\
\mt{val} \; \mt{send} : \mt{t} ::: \mt{Type} \to \mt{channel} \; \mt{t} \to \mt{t} \to \mt{transaction} \; \mt{unit} \\
\mt{val} \; \mt{recv} : \mt{t} ::: \mt{Type} \to \mt{channel} \; \mt{t} \to \mt{transaction} \; \mt{t}
\end{array}$$
The $\mt{channel}$ and $\mt{send}$ operations may only be executed on the server, and $\mt{recv}$ may only be executed on a client. Neither clients nor channels may be passed as arguments from clients to server-side functions, so persistent channels can only be maintained by storing them in the database and looking them up using the current client ID or some application-specific value as a key.
Clients and channels live only as long as the web browser page views that they are associated with. When a user surfs away, his client and its channels will be garbage-collected, after that user is not heard from for the timeout period. Garbage collection deletes any database row that contains a client or channel directly. Any reference to one of these types inside an $\mt{option}$ is set to $\mt{None}$ instead. Both kinds of handling have the flavor of weak pointers, and that is a useful way to think about clients and channels in the database.
\emph{Note}: Currently, there are known concurrency issues with multi-threaded applications that employ message-passing on top of database engines that don't support true serializable transactions. Postgres (versions 9.1 and up) is the only supported engine that does this properly.
\section{Ur/Web Syntax Extensions}
Ur/Web features some syntactic shorthands for building values using the functions from the last section. This section sketches the grammar of those extensions. We write spans of syntax inside brackets to indicate that they are optional.
\subsection{SQL}
\subsubsection{\label{tables}Table Declarations}
$\mt{table}$ declarations may include constraints, via these grammar rules.
$$\begin{array}{rrcll}
\textrm{Declarations} & d &::=& \mt{table} \; x : c \; [pk[,]] \; cts \mid \mt{view} \; x = V \\
\textrm{Primary key constraints} & pk &::=& \mt{PRIMARY} \; \mt{KEY} \; K \\
\textrm{Keys} & K &::=& f \mid (f, (f,)^+) \mid \{\{e\}\} \\
\textrm{Constraint sets} & cts &::=& \mt{CONSTRAINT} f \; ct \mid cts, cts \mid \{\{e\}\} \\
\textrm{Constraints} & ct &::=& \mt{UNIQUE} \; K \mid \mt{CHECK} \; E \\
&&& \mid \mt{FOREIGN} \; \mt{KEY} \; K \; \mt{REFERENCES} \; F \; (K) \; [\mt{ON} \; \mt{DELETE} \; pr] \; [\mt{ON} \; \mt{UPDATE} \; pr] \\
\textrm{Foreign tables} & F &::=& x \mid \{\{e\}\} \\
\textrm{Propagation modes} & pr &::=& \mt{NO} \; \mt{ACTION} \mid \mt{RESTRICT} \mid \mt{CASCADE} \mid \mt{SET} \; \mt{NULL} \\
\textrm{View expressions} & V &::=& Q \mid \{e\}
\end{array}$$
A signature item $\mt{table} \; \mt{x} : \mt{c}$ is actually elaborated into two signature items: $\mt{con} \; \mt{x\_hidden\_constraints} :: \{\{\mt{Unit}\}\}$ and $\mt{val} \; \mt{x} : \mt{sql\_table} \; \mt{c} \; \mt{x\_hidden\_constraints}$. This is appropriate for common cases where client code doesn't care which keys a table has. It's also possible to include constraints after a $\mt{table}$ signature item, with the same syntax as for $\mt{table}$ declarations. This may look like dependent typing, but it's just a convenience. The constraints are type-checked to determine a constructor $u$ to include in $\mt{val} \; \mt{x} : \mt{sql\_table} \; \mt{c} \; (u \rc \mt{x\_hidden\_constraints})$, and then the expressions are thrown away. Nonetheless, it can be useful for documentation purposes to include table constraint details in signatures. Note that the automatic generation of $\mt{x\_hidden\_constraints}$ leads to a kind of free subtyping with respect to which constraints are defined.
\subsubsection{Queries}
Queries $Q$ are added to the rules for expressions $e$.
$$\begin{array}{rrcll}
\textrm{Queries} & Q &::=& (q \; [\mt{ORDER} \; \mt{BY} \; O] \; [\mt{LIMIT} \; N] \; [\mt{OFFSET} \; N]) \\
\textrm{Pre-queries} & q &::=& \mt{SELECT} \; [\mt{DISTINCT}] \; P \; \mt{FROM} \; F,^+ \; [\mt{WHERE} \; E] \; [\mt{GROUP} \; \mt{BY} \; p,^+] \; [\mt{HAVING} \; E] \\
&&& \mid q \; R \; q \mid \{\{\{e\}\}\} \\
\textrm{Relational operators} & R &::=& \mt{UNION} \mid \mt{INTERSECT} \mid \mt{EXCEPT} \\
\textrm{$\mt{ORDER \; BY}$ items} & O &::=& \mt{RANDOM} [()] \mid \hat{E} \; [o] \mid \hat{E} \; [o], O \mid \{\{\{e\}\}\}
\end{array}$$
$$\begin{array}{rrcll}
\textrm{Projections} & P &::=& \ast & \textrm{all columns} \\
&&& p,^+ & \textrm{particular columns} \\
\textrm{Pre-projections} & p &::=& t.f & \textrm{one column from a table} \\
&&& t.\{\{c\}\} & \textrm{a record of columns from a table (of kind $\{\mt{Type}\}$)} \\
&&& t.* & \textrm{all columns from a table} \\
&&& \hat{E} \; [\mt{AS} \; f] & \textrm{expression column} \\
\textrm{Table names} & t &::=& x & \textrm{constant table name (automatically capitalized)} \\
&&& X & \textrm{constant table name} \\
&&& \{\{c\}\} & \textrm{computed table name (of kind $\mt{Name}$)} \\
\textrm{Column names} & f &::=& X & \textrm{constant column name} \\
&&& \{c\} & \textrm{computed column name (of kind $\mt{Name}$)} \\
\textrm{Tables} & T &::=& x & \textrm{table variable, named locally by its own capitalization} \\
&&& x \; \mt{AS} \; X & \textrm{table variable, with local name} \\
&&& x \; \mt{AS} \; \{c\} & \textrm{table variable, with computed local name} \\
&&& \{\{e\}\} \; \mt{AS} \; X & \textrm{computed table expression, with local name} \\
&&& \{\{e\}\} \; \mt{AS} \; \{c\} & \textrm{computed table expression, with computed local name} \\
\textrm{$\mt{FROM}$ items} & F &::=& T \mid \{\{e\}\} \mid F \; J \; \mt{JOIN} \; F \; \mt{ON} \; E \\
&&& \mid F \; \mt{CROSS} \; \mt{JOIN} \ F \\
&&& \mid (Q) \; \mt{AS} \; X \mid (Q) \; \mt{AS} \; \{c\} \\
&&& \mid (\{\{e\}\}) \; \mt{AS} \; t \\
\textrm{Joins} & J &::=& [\mt{INNER}] \\
&&& \mid [\mt{LEFT} \mid \mt{RIGHT} \mid \mt{FULL}] \; [\mt{OUTER}] \\
\textrm{SQL expressions} & E &::=& t.f & \textrm{column references} \\
&&& X & \textrm{named expression references} \\
&&& \{[e]\} & \textrm{injected native Ur expressions} \\
&&& \{e\} & \textrm{computed expressions, probably using $\mt{sql\_exp}$ directly} \\
&&& \mt{TRUE} \mid \mt{FALSE} & \textrm{boolean constants} \\
&&& \ell & \textrm{primitive type literals} \\
&&& \mt{NULL} & \textrm{null value (injection of $\mt{None}$)} \\
&&& E \; \mt{IS} \; \mt{NULL} & \textrm{nullness test} \\
&&& \mt{COALESCE}(E, E) & \textrm{take first non-null value} \\
&&& n & \textrm{nullary operators} \\
&&& u \; E & \textrm{unary operators} \\
&&& E \; b \; E & \textrm{binary operators} \\
&&& \mt{COUNT}(\ast) & \textrm{count number of rows} \\
&&& a(E) & \textrm{other aggregate function} \\
&&& \mt{IF} \; E \; \mt{THEN} \; E \; \mt{ELSE} \; E & \textrm{conditional} \\
&&& (Q) & \textrm{subquery (must return a single expression column)} \\
&&& (E) & \textrm{explicit precedence} \\
\textrm{Nullary operators} & n &::=& \mt{CURRENT\_TIMESTAMP} \\
\textrm{Unary operators} & u &::=& \mt{NOT} \\
\textrm{Binary operators} & b &::=& \mt{AND} \mid \mt{OR} \mid = \mid \neq \mid < \mid \leq \mid > \mid \geq \mid \mt{LIKE} \\
\textrm{Aggregate functions} & a &::=& \mt{COUNT} \mid \mt{AVG} \mid \mt{SUM} \mid \mt{MIN} \mid \mt{MAX} \\
\textrm{Directions} & o &::=& \mt{ASC} \mid \mt{DESC} \mid \{e\} \\
\textrm{SQL integer} & N &::=& n \mid \{e\} \\
\textrm{Windowable expressions} & \hat{E} &::=& E \\
&&& w \; [\mt{OVER} \; ( & \textrm{(Postgres only)} \\
&&& \hspace{.1in} [\mt{PARTITION} \; \mt{BY} \; E] \\
&&& \hspace{.1in} [\mt{ORDER} \; \mt{BY} \; O])] \\
\textrm{Window function} & w &::=& \mt{RANK}() \\
&&& \mt{COUNT}(*) \\
&&& a(E)
\end{array}$$
Additionally, an SQL expression may be inserted into normal Ur code with the syntax $(\mt{SQL} \; E)$ or $(\mt{WHERE} \; E)$. Similar shorthands exist for other nonterminals, with the prefix $\mt{FROM}$ for $\mt{FROM}$ items and $\mt{SELECT1}$ for pre-queries.
Unnamed expression columns in $\mt{SELECT}$ clauses are assigned consecutive natural numbers, starting with 1. Any expression in a $p$ position that is enclosed in parentheses is treated as an expression column, rather than a column pulled directly out of a table, even if it is only a field projection. (This distinction affects the record type used to describe query results.)
\subsubsection{DML}
DML commands $D$ are added to the rules for expressions $e$.
$$\begin{array}{rrcll}
\textrm{Commands} & D &::=& (\mt{INSERT} \; \mt{INTO} \; T^E \; (f,^+) \; \mt{VALUES} \; (E,^+)) \\
&&& (\mt{UPDATE} \; T^E \; \mt{SET} \; (f = E,)^+ \; \mt{WHERE} \; E) \\
&&& (\mt{DELETE} \; \mt{FROM} \; T^E \; \mt{WHERE} \; E) \\
\textrm{Table expressions} & T^E &::=& x \mid \{\{e\}\}
\end{array}$$
Inside $\mt{UPDATE}$ and $\mt{DELETE}$ commands, lone variables $X$ are interpreted as references to columns of the implicit table $\mt{T}$, rather than to named expressions.
\subsection{XML}
XML fragments $L$ are added to the rules for expressions $e$.
$$\begin{array}{rrcll}
\textrm{XML fragments} & L &::=& \texttt{ } \mid \texttt{}l^*\texttt{ } \\
\textrm{XML pieces} & l &::=& \textrm{text} & \textrm{cdata} \\
&&& \texttt{<}g\texttt{/>} & \textrm{tag with no children} \\
&&& \texttt{<}g\texttt{>}l^*\texttt{}x\texttt{>} & \textrm{tag with children} \\
&&& \{e\} & \textrm{computed XML fragment} \\
&&& \{[e]\} & \textrm{injection of an Ur expression, via the $\mt{Top}.\mt{txt}$ function} \\
\textrm{Tag} & g &::=& h \; (x [= v])^* \\
\textrm{Tag head} & h &::=& x & \textrm{tag name} \\
&&& h\{c\} & \textrm{constructor parameter} \\
\textrm{Attribute value} & v &::=& \ell & \textrm{literal value} \\
&&& \{e\} & \textrm{computed value} \\
\end{array}$$
When the optional $= v$ is omitted in an XML attribute, the attribute is assigned value $\mt{True}$ in Ur/Web, and it is rendered to HTML merely as including the attribute name without a value. If such a Boolean attribute is manually set to value $\mt{False}$, then it is omitted altogether in generating HTML.
Further, there is a special convenience and compatibility form for setting CSS classes of tags. If a \cd{class} attribute has a value that is a string literal, the literal is parsed in the usual HTML way and replaced with calls to appropriate Ur/Web combinators. Any dashes in the text are replaced with underscores to determine Ur identifiers. The same desugaring can be accessed in a normal expression context by calling the pseudo-function \cd{CLASS} on a string literal.
Similar support is provided for \cd{style} attributes. Normal CSS syntax may be used in string literals that are \cd{style} attribute values, and the desugaring may be accessed elsewhere with the pseudo-function \cd{STYLE}.
\section{\label{structure}The Structure of Web Applications}
A web application is built from a series of modules, with one module, the last one appearing in the \texttt{.urp} file, designated as the main module. The signature of the main module determines the URL entry points to the application. Such an entry point should have type $\mt{t1} \to \ldots \to \mt{tn} \to \mt{transaction} \; \mt{page}$, for any integer $n \geq 0$, where $\mt{page}$ is a type synonym for top-level HTML pages, defined in $\mt{Basis}$. If such a function is at the top level of main module $M$, with $n = 0$, it will be accessible at URI \texttt{/M/f}, and so on for more deeply nested functions, as described in Section \ref{tag} below. See Section \ref{cl} for information on the \texttt{prefix} and \texttt{rewrite url} directives, which can be used to rewrite the default URIs of different entry point functions. The final URL of a function is its default module-based URI, with \texttt{rewrite url} rules applied, and with the \texttt{prefix} prepended. Arguments to an entry-point function are deserialized from the part of the URI following \texttt{f}.
Elements of modules beside the main module, including page handlers, will only be included in the final application if they are transitive dependencies of the handlers in the main module.
Normal links are accessible via HTTP \texttt{GET}, which the relevant standard says should never cause side effects. To export a page which may cause side effects, accessible only via HTTP \texttt{POST}, include one argument of the page handler of type $\mt{Basis.postBody}$. When the handler is called, this argument will receive a value that can be deconstructed into a MIME type (with $\mt{Basis.postType}$) and payload (with $\mt{Basis.postData}$). This kind of handler should not be used with forms that exist solely within Ur/Web apps; for these, use Ur/Web's built-in support, as described below. It may still be useful to use $\mt{Basis.postBody}$ with form requests submitted by code outside an Ur/Web app. For such cases, the function $\mt{Top.postFields} : \mt{postBody} \to \mt{list} \; (\mt{string} \times \mt{string})$ may be useful, breaking a \texttt{POST} body of type \texttt{application/x-www-form-urlencoded} into its name-value pairs.
Any normal page handler may also include arguments of type $\mt{option \; Basis.queryString}$, which will be handled specially. Rather than being deserialized from the current URI, such an argument is passed the whole query string that the handler received. The string may be analyzed by calling $\mt{Basis.show}$ on it. A handler of this kind may be passed as an argument to $\mt{Basis.effectfulUrl}$ to generate a URL to a page that may be used as a ``callback'' by an external service, such that the handler is allowed to cause side effects.
When the standalone web server receives a request for a known page, it calls the function for that page, ``running'' the resulting transaction to produce the page to return to the client. Pages link to other pages with the \texttt{link} attribute of the \texttt{a} HTML tag. A link has type $\mt{transaction} \; \mt{page}$, and the semantics of a link are that this transaction should be run to compute the result page, when the link is followed. Link targets are assigned URL names in the same way as top-level entry points.
HTML forms are handled in a similar way. The $\mt{action}$ attribute of a $\mt{submit}$ form tag takes a value of type $\$\mt{use} \to \mt{transaction} \; \mt{page}$, where $\mt{use}$ is a kind-$\{\mt{Type}\}$ record of the form fields used by this action handler. Action handlers are assigned URL patterns in the same way as above.
For both links and actions, direct arguments and local variables mentioned implicitly via closures are automatically included in serialized form in URLs, in the order in which they appear in the source code. Such serialized values may only be drawn from a limited set of types, and programs will fail to compile when the (implicit or explicit) arguments of page handler functions involve disallowed types. (Keep in mind that every free variable of a function is an implicit argument if it was not defined at the top level of a module.) For instance:
\begin{itemize}
\item Functions are disallowed, since there is no obvious way to serialize them safely.
\item XML fragments are disallowed, since it is unclear how to check client-provided XML to be sure it doesn't break the HTML invariants of the application (for instance, by mutating the DOM in the conventional way, interfering with Ur/Web's functional-reactive regime).
\item Blobs (``files'') are disallowed, since they can easily have very large serializations that could not fit within most web servers' URL size limits. (And you probably don't want to be serializing, e.g., image files in URLs, anyway.)
\end{itemize}
Ur/Web programs generally mix server- and client-side code in a fairly transparent way. The one important restriction is that mixed client-server code must encapsulate all server-side pieces within named functions. This is because execution of such pieces will be implemented by explicit calls to the remote web server, and it is useful to get the programmer's help in designing the interface to be used. For example, this makes it easier to allow a client running an old version of an application to continue interacting with a server that has been upgraded to a new version, if the programmer took care to keep the interfaces of all of the old remote calls the same. The functions implementing these services are assigned names in the same way as normal web entry points, by using module structure.
\medskip
The HTTP standard suggests that GET requests only be used in ways that generate no side effects. Side effecting operations should use POST requests instead. The Ur/Web compiler enforces this rule strictly, via a simple conservative program analysis. Any page that may have a side effect must be accessed through a form, all of which use POST requests, or via a direct call to a page handler with some argument of type $\mt{Basis.postBody}$. A page is judged to have a side effect if its code depends syntactically on any of the side-effecting, server-side FFI functions. Links, forms, and most client-side event handlers are not followed during this syntactic traversal, but \texttt{} handlers \emph{are} examined, since they run right away and could just as well be considered parts of main page handlers.
Ur/Web includes a kind of automatic protection against cross site request forgery attacks. Whenever any page execution can have side effects and can also read at least one cookie value, all cookie values must be signed cryptographically, to ensure that the user has come to the current page by submitting a form on a real page generated by the proper server. Signing and signature checking are inserted automatically by the compiler. This prevents attacks like phishing schemes where users are directed to counterfeit pages with forms that submit to your application, where a user's cookies might be submitted without his knowledge, causing some undesired side effect.
\subsection{Tasks}
In many web applications, it's useful to run code at points other than requests from browsers. Ur/Web's \emph{task} mechanism facilitates this. A type family of \emph{task kinds} is in the standard library:
$$\begin{array}{l}
\mt{con} \; \mt{task\_kind} :: \mt{Type} \to \mt{Type} \\
\mt{val} \; \mt{initialize} : \mt{task\_kind} \; \mt{unit} \\
\mt{val} \; \mt{clientLeaves} : \mt{task\_kind} \; \mt{client} \\
\mt{val} \; \mt{periodic} : \mt{int} \to \mt{task\_kind} \; \mt{unit}
\end{array}$$
A task kind names a particular extension point of generated applications, where the type parameter of a task kind describes which extra input data is available at that extension point. Add task code with the special declaration form $\mt{task} \; e_1 = e_2$, where $e_1$ is a task kind with data $\tau$, and $e_2$ is a function from $\tau$ to $\mt{transaction} \; \mt{unit}$.
The currently supported task kinds are:
\begin{itemize}
\item $\mt{initialize}$: Code that is run when the application starts up.
\item $\mt{clientLeaves}$: Code that is run for each client that the runtime system decides has surfed away. When a request that generates a new client handle is aborted, that handle will still eventually be passed to $\mt{clientLeaves}$ task code, even though the corresponding browser was never informed of the client handle's existence. In other words, in general, $\mt{clientLeaves}$ handlers will be called more times than there are actual clients.
\item $\mt{periodic} \; n$: Code that is run when the application starts up and then every $n$ seconds thereafter.
\end{itemize}
\section{\label{ffi}The Foreign Function Interface}
It is possible to call your own C and JavaScript code from Ur/Web applications, via the foreign function interface (FFI). The starting point for a new binding is a \texttt{.urs} signature file that presents your external library as a single Ur/Web module (with no nested modules). Compilation conventions map the types and values that you use into C and/or JavaScript types and values.
It is most convenient to encapsulate an FFI binding with a new \texttt{.urp} file, which applications can include with the \texttt{library} directive in their own \texttt{.urp} files. A number of directives are likely to show up in the library's project file.
\begin{itemize}
\item \texttt{clientOnly Module.ident} registers a value as being allowed only in client-side code.
\item \texttt{clientToServer Module.ident} declares a type as OK to marshal between clients and servers. By default, abstract FFI types are not allowed to be marshalled, since your library might be maintaining invariants that the simple serialization code doesn't check.
\item \texttt{effectful Module.ident} registers a function that can have side effects. This is the default for \texttt{transaction}-based types, and, actually, this directive is mostly present for legacy compatibility reasons, since it used to be required explicitly for each \texttt{transaction}al function.
\item \texttt{ffi FILE.urs} names the file giving your library's signature. You can include multiple such files in a single \texttt{.urp} file, and each file \texttt{mod.urp} defines an FFI module \texttt{Mod}.
\item \texttt{include FILE} requests inclusion of a C header file.
\item \texttt{jsFile FILE} requests inclusion of a JavaScript source file.
\item \texttt{jsFunc Module.ident=name} gives a mapping from an Ur name for a value to a JavaScript name.
\item \texttt{link FILE} requests that \texttt{FILE} be linked into applications. It should be a C object or library archive file, and you are responsible for generating it with your own build process.
\item \texttt{script URL} requests inclusion of a JavaScript source file within application HTML.
\item \texttt{serverOnly Module.ident} registers a value as being allowed only in server-side code.
\end{itemize}
\subsection{Writing C FFI Code}
C source files connecting to the Ur/Web FFI should include \texttt{urweb.h}, and C++ source files should include \texttt{urweb\_cpp.h}.
A server-side FFI type or value \texttt{Module.ident} must have a corresponding type or value definition \texttt{uw\_Module\_ident} in C code. With the current Ur/Web version, it's not generally possible to work with Ur records or complex datatypes in C code, but most other kinds of types are fair game.
\begin{itemize}
\item Primitive types defined in \texttt{Basis} are themselves using the standard FFI interface, so you may refer to them like \texttt{uw\_Basis\_t}. See \texttt{include/urweb/types.h} for their definitions.
\item Enumeration datatypes, which have only constructors that take no arguments, should be defined using C \texttt{enum}s. The type is named as for any other type identifier, and each constructor \texttt{c} gets an enumeration constant named \texttt{uw\_Module\_c}.
\item A datatype \texttt{dt} (such as \texttt{Basis.option}) that has one non-value-carrying constructor \texttt{NC} and one value-carrying constructor \texttt{C} gets special treatment. Where \texttt{T} is the type of \texttt{C}'s argument, and where we represent \texttt{T} as \texttt{t} in C, we represent \texttt{NC} with \texttt{NULL}. The representation of \texttt{C} depends on whether we're sure that we don't need to use \texttt{NULL} to represent \texttt{t} values; this condition holds only for strings and complex datatypes. For such types, \texttt{C v} is represented with the C encoding of \texttt{v}, such that the translation of \texttt{dt} is \texttt{t}. For other types, \texttt{C v} is represented with a pointer to the C encoding of v, such that the translation of \texttt{dt} is \texttt{t*}.
\item Ur/Web involves many types of program syntax, such as for HTML and SQL code. All of these types are implemented with normal C strings, and you may take advantage of that encoding to manipulate code as strings in C FFI code. Be mindful that, in writing such code, it is your responsibility to maintain the appropriate code invariants, or you may reintroduce the code injection vulnerabilities that Ur/Web rules out. The most convenient way to extend Ur/Web with functions that, e.g., use natively unsupported HTML tags is to generate the HTML code with the FFI.
\end{itemize}
The C FFI version of a Ur function with type \texttt{T1 -> ... -> TN -> R} or \texttt{T1 -> ... -> TN -> transaction R} has a C prototype like \texttt{R uw\_Module\_ident(uw\_context, T1, ..., TN)}. Only functions with types of the second form may have side effects. \texttt{uw\_context} is the type of state that persists across handling a client request. Many functions that operate on contexts are prototyped in \texttt{include/urweb/urweb\_cpp.h}. Most should only be used internally by the compiler. A few are useful in general FFI implementation:
\begin{itemize}
\item \begin{verbatim}
void uw_error(uw_context, failure_kind, const char *fmt, ...);
\end{verbatim}
Abort the current request processing, giving a \texttt{printf}-style format string and arguments for generating an error message. The \texttt{failure\_kind} argument can be \texttt{FATAL}, to abort the whole execution; \texttt{BOUNDED\_RETRY}, to try processing the request again from the beginning, but failing if this happens too many times; or \texttt{UNLIMITED\_RETRY}, to repeat processing, with no cap on how many times this can recur.
All pointers to the context-local heap (see description below of \texttt{uw\_malloc()}) become invalid at the start and end of any execution of a main entry point function of an application. For example, if the request handler is restarted because of a \texttt{uw\_error()} call with \texttt{BOUNDED\_RETRY} or for any other reason, it is unsafe to access any local heap pointers that may have been stashed somewhere beforehand.
\item \begin{verbatim}
void uw_set_error_message(uw_context, const char *fmt, ...);
\end{verbatim}
This simpler form of \texttt{uw\_error()} saves an error message without immediately aborting execution.
\item \begin{verbatim}
void uw_push_cleanup(uw_context, void (*func)(void *), void *arg);
void uw_pop_cleanup(uw_context);
\end{verbatim}
Manipulate a stack of actions that should be taken if any kind of error condition arises. Calling the ``pop'' function both removes an action from the stack and executes it. It is a bug to let a page request handler finish successfully with unpopped cleanup actions.
Pending cleanup actions aren't intended to have any complex relationship amongst themselves, so, upon request handler abort, pending actions are executed in first-in-first-out order.
\item \begin{verbatim}
void *uw_malloc(uw_context, size_t);
\end{verbatim}
A version of \texttt{malloc()} that allocates memory inside a context's heap, which is managed with region allocation. Thus, there is no \texttt{uw\_free()}, but you need to be careful not to keep ad-hoc C pointers to this area of memory. In general, \texttt{uw\_malloc()}ed memory should only be used in ways compatible with the computation model of pure Ur. This means it is fine to allocate and return a value that could just as well have been built with core Ur code. In contrast, it is almost never safe to store \texttt{uw\_malloc()}ed pointers in global variables, including when the storage happens implicitly by registering a callback that would take the pointer as an argument.
For performance and correctness reasons, it is usually preferable to use \texttt{uw\_malloc()} instead of \texttt{malloc()}. The former manipulates a local heap that can be kept allocated across page requests, while the latter uses global data structures that may face contention during concurrent execution. However, we emphasize again that \texttt{uw\_malloc()} should never be used to implement some logic that couldn't be implemented trivially by a constant-valued expression in Ur.
\item \begin{verbatim}
typedef void (*uw_callback)(void *);
typedef void (*uw_callback_with_retry)(void *, int will_retry);
int uw_register_transactional(uw_context, void *data, uw_callback commit,
uw_callback rollback, uw_callback_with_retry free);
\end{verbatim}
All side effects in Ur/Web programs need to be compatible with transactions, such that any set of actions can be undone at any time. Thus, you should not perform actions with non-local side effects directly; instead, register handlers to be called when the current transaction is committed or rolled back. The arguments here give an arbitary piece of data to be passed to callbacks, a function to call on commit, a function to call on rollback, and a function to call afterward in either case to clean up any allocated resources. A rollback handler may be called after the associated commit handler has already been called, if some later part of the commit process fails. A free handler is told whether the runtime system expects to retry the current page request after rollback finishes. The return value of \texttt{uw\_register\_transactional()} is 0 on success and nonzero on failure (where failure currently only happens when exceeding configured limits on number of transactionals).
Any of the callbacks may be \texttt{NULL}. To accommodate some stubbornly non-transactional real-world actions like sending an e-mail message, Ur/Web treats \texttt{NULL} \texttt{rollback} callbacks specially. When a transaction commits, all \texttt{commit} actions that have non-\texttt{NULL} rollback actions are tried before any \texttt{commit} actions that have \texttt{NULL} rollback actions. Furthermore, an SQL \texttt{COMMIT} is also attempted in between the two phases, so the nicely transactional actions have a chance to influence whether data are committed to the database, while \texttt{NULL}-rollback actions only get run in the first place after committing data. The reason for all this is that it is \emph{expected} that concurrency interactions will cause database commits to fail in benign ways that call for transaction restart. A truly non-undoable action should only be run after we are sure the database transaction will commit.
When a request handler ends with multiple pending transactional actions, their handlers are run in a first-in-last-out stack-like order, wherever the order would otherwise be ambiguous.
It is not safe for any of these handlers to access a context-local heap through a pointer returned previously by \texttt{uw\_malloc()}, nor should any new calls to that function be made. Think of the context-local heap as meant for use by the Ur/Web code itself, while transactional handlers execute after the Ur/Web code has finished.
A handler may signal an error by calling \texttt{uw\_set\_error\_message()}, but it is not safe to call \texttt{uw\_error()} from a handler. Signaling an error in a commit handler will cause the runtime system to switch to aborting the transaction, immediately after the current commit handler returns.
\item \begin{verbatim}
void *uw_get_global(uw_context, char *name);
void uw_set_global(uw_context, char *name, void *data, uw_callback free);
\end{verbatim}
Different FFI-based extensions may want to associate their own pieces of data with contexts. The global interface provides a way of doing that, where each extension must come up with its own unique key. The \texttt{free} argument to \texttt{uw\_set\_global()} explains how to deallocate the saved data. It is never safe to store \texttt{uw\_malloc()}ed pointers in global variable slots.
\end{itemize}
\subsection{Writing JavaScript FFI Code}
JavaScript is dynamically typed, so Ur/Web type definitions imply no JavaScript code. The JavaScript identifier for each FFI function is set with the \texttt{jsFunc} directive. Each identifier can be defined in any JavaScript file that you ask to include with the \texttt{script} directive, and one easy way to get code included is with the \texttt{jsFile} directive.
In contrast to C FFI code, JavaScript FFI functions take no extra context argument. Their argument lists are as you would expect from their Ur types. Only functions whose ranges take the form \texttt{transaction T} should have side effects; the JavaScript ``return type'' of such a function is \texttt{T}. Here are the conventions for representing Ur values in JavaScript.
\begin{itemize}
\item Integers, floats, strings, characters, and booleans are represented in the usual JavaScript way.
\item Ur functions are represented in an unspecified way. This means that you should not rely on any details of function representation. Named FFI functions are represented as JavaScript functions with as many arguments as their Ur types specify. To call a non-FFI function \texttt{f} on argument \texttt{x}, run \texttt{execF(f, x)}. A normal JavaScript function may also be used in a position where the Ur/Web runtime system expects an Ur/Web function.
\item An Ur record is represented with a JavaScript record, where Ur field name \texttt{N} translates to JavaScript field name \texttt{\_N}. An exception to this rule is that the empty record is encoded as \texttt{null}.
\item \texttt{option}-like types receive special handling similar to their handling in C. The ``\texttt{None}'' constructor is \texttt{null}, and a use of the ``\texttt{Some}'' constructor on a value \texttt{v} is either \texttt{v}, if the underlying type doesn't need to use \texttt{null}; or \texttt{\{v:v\}} otherwise.
\item Any other datatypes represent a non-value-carrying constructor \texttt{C} as \texttt{"C"} and an application of a constructor \texttt{C} to value \texttt{v} as \texttt{\{n:"C", v:v\}}. This rule only applies to datatypes defined in FFI module signatures; the compiler is free to optimize the representations of other, non-\texttt{option}-like datatypes in arbitrary ways.
\item As in the C FFI, all abstract types of program syntax are implemented with strings in JavaScript.
\item A value of Ur type \texttt{transaction t} is represented in the same way as for \texttt{unit -> t}. (Note that FFI functions skip this extra level of function encoding, which only applies to functions defined in Ur/Web.)
\end{itemize}
It is possible to write JavaScript FFI code that interacts with the functional-reactive structure of a document. Here is a quick summary of some of the simpler functions to use; descriptions of fancier stuff may be added later on request (and such stuff should be considered ``undocumented features'' until then).
\begin{itemize}
\item Sources should be treated as an abstract type, manipulated via:
\begin{itemize}
\item \cd{sc(v)}, to create a source initialized to \cd{v}
\item \cd{sg(s)}, to retrieve the current value of source \cd{s}
\item \cd{sv(s, v)}, to set source \cd{s} to value \cd{v}
\end{itemize}
\item Signals should be treated as an abstract type, manipulated via:
\begin{itemize}
\item \cd{sr(v)} and \cd{sb(s, f)}, the ``return'' and ``bind'' monad operators, respectively
\item \cd{ss(s)}, to produce the signal corresponding to source \cd{s}
\item \cd{scur(s)}, to get the current value of signal \cd{s}
\end{itemize}
\item The behavior of the \cd{} pseudo-tag may be mimicked by following the right convention in a piece of HTML source code with a type like $\mt{xbody}$. Such a piece of source code may be encoded with a JavaScript string. To insert a dynamic section, include a \cd{