\documentclass{article} \usepackage{fullpage,amsmath,amssymb,proof,url} \usepackage[T1]{fontenc} \usepackage{ae,aecompl} \newcommand{\cd}[1]{\texttt{#1}} \newcommand{\mt}[1]{\mathsf{#1}} \newcommand{\rc}{+ \hspace{-.075in} + \;} \newcommand{\rcut}{\; \texttt{-{}-} \;} \newcommand{\rcutM}{\; \texttt{-{}-{}-} \;} \usepackage{hyperref} \begin{document} \title{The Ur/Web Manual} \author{Adam Chlipala} \maketitle \tableofcontents \section{Introduction} \emph{Ur} is a programming language designed to introduce richer type system features into functional programming in the tradition of ML and Haskell. Ur is functional, pure, statically typed, and strict. Ur supports a powerful kind of \emph{metaprogramming} based on \emph{type-level computation with type-level records}. \emph{Ur/Web} is Ur plus a special standard library and associated rules for parsing and optimization. Ur/Web supports construction of dynamic web applications backed by SQL databases. The signature of the standard library is such that well-typed Ur/Web programs ``don't go wrong'' in a very broad sense. Not only do they not crash during particular page generations, but they also may not: \begin{itemize} \item Suffer from any kinds of code-injection attacks \item Return invalid HTML \item Contain dead intra-application links \item Have mismatches between HTML forms and the fields expected by their handlers \item Include client-side code that makes incorrect assumptions about the ``AJAX''-style services that the remote web server provides \item Attempt invalid SQL queries \item Use improper marshaling or unmarshaling in communication with SQL databases or between browsers and web servers \end{itemize} This type safety is just the foundation of the Ur/Web methodology. It is also possible to use metaprogramming to build significant application pieces by analysis of type structure. For instance, the demo includes an ML-style functor for building an admin interface for an arbitrary SQL table. The type system guarantees that the admin interface sub-application that comes out will always be free of the above-listed bugs, no matter which well-typed table description is given as input. The Ur/Web compiler also produces very efficient object code that does not use garbage collection. These compiled programs will often be even more efficient than what most programmers would bother to write in C. The compiler also generates JavaScript versions of client-side code, with no need to write those parts of applications in a different language. \medskip The official web site for Ur is: \begin{center} \url{http://www.impredicative.com/ur/} \end{center} \section{Installation} If you are lucky, then the following standard command sequence will suffice for installation, in a directory to which you have unpacked the latest distribution tarball. \begin{verbatim} ./configure make sudo make install \end{verbatim} Some other packages must be installed for the above to work. At a minimum, you need a standard UNIX shell, with standard UNIX tools like sed and GCC (or an alternate C compiler) in your execution path; MLton, the whole-program optimizing compiler for Standard ML; and the development files for the OpenSSL C library and the ICU C library. As of this writing, in the ``testing'' version of Debian Linux, this command will install the more uncommon of these dependencies: \begin{verbatim} apt-get install mlton libssl-dev libicu-dev \end{verbatim} Note that, like the Ur/Web compiler, MLton is a whole-program optimizing compiler, so it frequently requires much more memory than old-fashioned compilers do. Expect building Ur/Web with MLton to require not much less than a gigabyte of RAM. If a \texttt{mlton} invocation ends suspiciously, the most likely explanation is that it has exhausted available memory. To build programs that access SQL databases, you also need one of these client libraries for supported backends. \begin{verbatim} apt-get install libpq-dev libmysqlclient-dev libsqlite3-dev \end{verbatim} It is also possible to access the modules of the Ur/Web compiler interactively, within Standard ML of New Jersey. To install the prerequisites in Debian testing: \begin{verbatim} apt-get install smlnj libsmlnj-smlnj ml-yacc ml-lpt \end{verbatim} To begin an interactive session with the Ur compiler modules, run \texttt{make smlnj}, and then, from within an \texttt{sml} session, run \texttt{CM.make "src/urweb.cm";}. The \texttt{Compiler} module is the main entry point, and you can find its signature in \texttt{src/compiler.sig}. To run an SQL-backed application with a backend besides SQLite, you will probably want to install one of these servers. \begin{verbatim} apt-get install postgresql mysql-server \end{verbatim} To use the Emacs mode, you must have a modern Emacs installed. We assume that you already know how to do this, if you're in the business of looking for an Emacs mode. The demo generation facility of the compiler will also call out to Emacs to syntax-highlight code, and that process depends on the \texttt{htmlize} module, which can be installed in Debian testing via: \begin{verbatim} apt-get install emacs-goodies-el \end{verbatim} If you don't want to install the Emacs mode, run \texttt{./configure} with the argument \texttt{--without-emacs}. Even with the right packages installed, configuration and building might fail to work. After you run \texttt{./configure}, you will see the values of some named environment variables printed. You may need to adjust these values to get proper installation for your system. To change a value, store your preferred alternative in the corresponding UNIX environment variable, before running \texttt{./configure}. For instance, here is how to change the list of extra arguments that the Ur/Web compiler will pass to the C compiler and linker on every invocation. Some older GCC versions need this setting to mask a bug in function inlining. \begin{verbatim} CCARGS=-fno-inline ./configure \end{verbatim} Since the author is still getting a handle on the GNU Autotools that provide the build system, you may need to do some further work to get started, especially in environments with significant differences from Linux (where most testing is done). The variables \texttt{PGHEADER}, \texttt{MSHEADER}, and \texttt{SQHEADER} may be used to set the proper C header files to include for the development libraries of PostgreSQL, MySQL, and SQLite, respectively. To get libpq to link, one OS X user reported setting \texttt{CCARGS="-I/opt/local/include -L/opt/local/lib/postgresql84"}, after creating a symbolic link with \texttt{ln -s /opt/local/include/postgresql84 /opt/local/include/postgresql}. The Emacs mode can be set to autoload by adding the following to your \texttt{.emacs} file. \begin{verbatim} (add-to-list 'load-path "/usr/local/share/emacs/site-lisp/urweb-mode") (load "urweb-mode-startup") \end{verbatim} Change the path in the first line if you chose a different Emacs installation path during configuration. \section{Command-Line Compiler} \subsection{\label{cl}Project Files} The basic inputs to the \texttt{urweb} compiler are project files, which have the extension \texttt{.urp}. Here is a sample \texttt{.urp} file. \begin{verbatim} database dbname=test sql crud1.sql crud crud1 \end{verbatim} The \texttt{database} line gives the database information string to pass to libpq. In this case, the string only says to connect to a local database named \texttt{test}. The \texttt{sql} line asks for an SQL source file to be generated, giving the commands to run to create the tables and sequences that this application expects to find. After building this \texttt{.urp} file, the following commands could be used to initialize the database, assuming that the current UNIX user exists as a Postgres user with database creation privileges: \begin{verbatim} createdb test psql -f crud1.sql test \end{verbatim} A blank line separates the named directives from a list of modules to include in the project. Any line may contain a shell-script-style comment, where any suffix of a line starting at a hash character \texttt{\#} is ignored. For each entry \texttt{M} in the module list, the file \texttt{M.urs} is included in the project if it exists, and the file \texttt{M.ur} must exist and is always included. Here is the complete list of directive forms. ``FFI'' stands for ``foreign function interface,'' Ur's facility for interaction between Ur programs and C and JavaScript libraries. \begin{itemize} \item \texttt{[allow|deny] [url|mime|requestHeader|responseHeader|env|meta] PATTERN} registers a rule governing which URLs, MIME types, HTTP request headers, HTTP response headers, environment variable names, or HTML \texttt{} names are allowed to appear explicitly in this application. The first such rule to match a name determines the verdict. If \texttt{PATTERN} ends in \texttt{*}, it is interpreted as a prefix rule. Otherwise, a string must match it exactly. \item \texttt{alwaysInline PATH} requests that every call to the referenced function be inlined. Section \ref{structure} explains how functions are assigned path strings. \item \texttt{benignEffectful Module.ident} registers an FFI function or transaction as having side effects. The optimizer avoids removing, moving, or duplicating calls to such functions. Every effectful FFI function must be registered, or the optimizer may make invalid transformations. This version of the \texttt{effectful} directive registers that this function only has side effects that remain local to a single page generation. \item \texttt{clientOnly Module.ident} registers an FFI function or transaction that may only be run in client browsers. \item \texttt{clientToServer Module.ident} adds FFI type \texttt{Module.ident} to the list of types that are OK to marshal from clients to servers. Values like XML trees and SQL queries are hard to marshal without introducing expensive validity checks, so it's easier to ensure that the server never trusts clients to send such values. The file \texttt{include/urweb/urweb\_cpp.h} shows examples of the C support functions that are required of any type that may be marshalled. These include \texttt{attrify}, \texttt{urlify}, and \texttt{unurlify} functions. \item \texttt{coreInline TREESIZE} sets how many nodes the AST of a function definition may have before the optimizer stops trying hard to inline calls to that function. (This is one of two options for one of two intermediate languages within the compiler.) \item \texttt{database DBSTRING} sets the string to pass to libpq to open a database connection. \item \texttt{debug} saves some intermediate C files, which is mostly useful to help in debugging the compiler itself. \item \texttt{effectful Module.ident} registers an FFI function or transaction as having side effects. The optimizer avoids removing, moving, or duplicating calls to such functions. This is the default behavior for \texttt{transaction}-based types. \item \texttt{exe FILENAME} sets the filename to which to write the output executable. The default for file \texttt{P.urp} is \texttt{P.exe}. \item \texttt{file URI FILENAME} asks for the application executable to respond to requests for \texttt{URI} by serving a snapshot of the contents of \texttt{FILENAME} as of compile time. That is, the file contents are baked into the executable. System file \texttt{/etc/mime.types} is consulted (again, at compile time) to figure out the right MIME type to suggest in the HTTP response. \item \texttt{file URI FILENAME MIME-TYPE} works like the simpler form of \texttt{file}, but the proper MIME type for the file is given directly. \item \texttt{filecache PATH} sets a path to a directory to use for caching files stored in the SQL database. It can be expensive to schlep files back and forth between the database and an Ur/Web application, since database engines don't tend to be optimized for transferring large files. Ur/Web will still store the files in the database, as the ``version of record'' for your whole, consistent data set, but the application will try to query the database only in terms of cryptographic hashes, from which files can be retrieved from the cache. (This feature is currently only available for PostgreSQL, with the module \texttt{pgcrypto} installed, to drive SHA512 hashing. It would defeat the purpose to run the hashing operation in the application rather than the database engine!) \item \texttt{ffi FILENAME} reads the file \texttt{FILENAME.urs} to determine the interface to a new FFI module. The name of the module is calculated from \texttt{FILENAME} in the same way as for normal source files. See the files \texttt{include/urweb/urweb\_cpp.h} and \texttt{src/c/urweb.c} for examples of C headers and implementations for FFI modules. In general, every type or value \texttt{Module.ident} becomes \texttt{uw\_Module\_ident} in C. \item \texttt{html5} asks to generate HTML5 code, which primarily affects the first few lines of the output documents, like the \texttt{DOCTYPE}. This option is on by default. \item \texttt{include FILENAME} adds \texttt{FILENAME} to the list of files to be \texttt{\#include}d in C sources. This is most useful for interfacing with new FFI modules. \item \texttt{jsFile FILENAME} asks to serve the contents of a file as JavaScript. All such content is concatenated into a single file, included via a \texttt{}. \subsubsection{Node IDs} There is an abstract type of node IDs that may be assigned to \cd{id} attributes of most HTML tags. $$\begin{array}{l} \mt{type} \; \mt{id} \\ \mt{val} \; \mt{fresh} : \mt{transaction} \; \mt{id} \end{array}$$ The \cd{fresh} function is allowed on both server and client, but there is no other way to create IDs, which includes lack of a way to force an ID to match a particular string. The main semantic importance of IDs within Ur/Web is in uses of the HTML \cd{