diff options
author | Laurent Le Brun <laurentlb@google.com> | 2017-03-21 11:44:04 +0000 |
---|---|---|
committer | Yue Gan <yueg@google.com> | 2017-03-21 12:55:18 +0000 |
commit | e29f6144f0b04ef233461911dbdfd9d75c0f83bb (patch) | |
tree | ee5bdcd65367758d5902c619de4527fb41bc7ddb /site/blog | |
parent | a09d876fa166bbc856716eaf9a2ef06d09a9c12c (diff) |
Blog post about the design of Skylark.
--
PiperOrigin-RevId: 150738983
MOS_MIGRATED_REVID=150738983
Diffstat (limited to 'site/blog')
-rw-r--r-- | site/blog/_posts/2017-03-21-design-of-skylark.md | 96 |
1 files changed, 96 insertions, 0 deletions
diff --git a/site/blog/_posts/2017-03-21-design-of-skylark.md b/site/blog/_posts/2017-03-21-design-of-skylark.md new file mode 100644 index 0000000000..849da98cf9 --- /dev/null +++ b/site/blog/_posts/2017-03-21-design-of-skylark.md @@ -0,0 +1,96 @@ +--- +layout: posts +title: A glimpse of the design of Skylark +--- + +This blog post describes the design of Skylark, the language used to specify +builds in Bazel. + +## A brief history + +Many years ago, code at Google was built using Makefiles. As [other people +noticed](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/03/hadrian.pdf), +Makefiles don't scale well with a large code base. A temporary solution was to +generate Makefiles using Python scripts, where the description of the build was +stored in `BUILD` files containing calls to the Python functions. But this +solution was way too slow, and the bottleneck was Make. + +The project Blaze (later open-sourced as Bazel) was started in 2006. It used a +simple parser to read the `BUILD` files (supporting only function calls, list +comprehensions and variable assignments). When Blaze could not directly parse a +`BUILD` file, it used a preprocessing step that ran the Python interpreter on +the user `BUILD` file to generate a simplified `BUILD` file. The output was used +by Blaze. + +This approach was simple and allowed developers to create their own macros. But +again, this led to lots of problems in terms of maintenance, performance, and +safety. It also made any kind of tooling more complicated, as Blaze was not able +to parse the `BUILD` files itself. + +In the current iteration of Bazel, we've made the system saner by removing the +Python preprocessing step. We kept the Python syntax, though, in order to +migrate our codebase. This seems to be a good idea anyway: Many people like the +syntax of our `BUILD` files and other build tools (e.g. +[Buck](https://buckbuild.com/concept/build_file.html), +[Pants](http://www.pantsbuild.org/build_files.html), and +[Please](https://please.build/language.html)) have adopted it. + +## Design requirements + +We decided to separate description of the build from the extensions (macros and +rules). The description of the build resides in `BUILD` files and the extensions +reside in `.bzl` files, although they are all evaluated with the same +interpreter. We want the code to be easy to read and maintain. We designed Bazel +to be used by thousands of engineers. Most of them are not familiar with build +systems internals and most of them don't want to spend time learning a new +language. `BUILD` files need to be simple and declarative, so that we can build +tools to manipulate them. + +The language also needed to: + +* Run on the JVM. Bazel is written in Java. The data structures should be + shared between Bazel and the language (due to memory requirements in large + builds). + +* Use a Python syntax, to preserve our codebase. + +* Be deterministic and hermetic. We have to guarantee that the execution of + the code will always yield the same results. For example, we forbid access + to I/O and date and time, and ensure deterministic iteration order of + dictionaries. + +* Be thread-safe. We need to evaluate a lot of `BUILD` files in parallel. + Execution of the code needs to be thread-safe in order to guarantee + determinism. + +Finally, we have performance concerns. A typical `BUILD` file is simple and can +be executed quickly. In most cases, evaluating the code directly is faster than +compiling it first. + +## Parallelism and imports + +One special feature of Skylark is how it handles parallelism. In Bazel, a large +build require the evaluation of hundreds of `BUILD` files, so we have to load +them in parallel. Each `BUILD` file may use any number of extensions, and those +extensions might need other files as well. This means that we end up with a +graph of dependencies. + +Bazel first evaluates the leaves of this graph (i.e. the files that have no +dependencies) in parallel. It will load the other files as soon as their +dependencies have been loaded, which means the evaluation of `BUILD` and `.bzl` +files is interleaved. This also means that the order of the `load` statements +doesn't matter at all. + +Each file is loaded at most once. Once it has been evaluated, its definitions +(the global variables and functions) are cached. Any other file can access the +symbols through the cache. + +Since multiple threads can access a variable at the same time, we need a +restriction on side-effects to guarantee thread-safety. The solution is simple: +when we cache the definitions of a file, we "freeze" them. We make them +read-only, i.e. you can iterate on an array, but not modify its elements. You +may create a copy and modify it, though. + +In a future blog post, we'll take a look at the other features of the language. + +_By [Laurent Le Brun](https://github.com/laurentlb)_ |