---
layout: contribute
title: Distribution Artifact for Bazel
---

# Design Document: Distribution Artifact for Bazel

**Design documents are not descriptions of the current functionality of Bazel.
Always go to the documentation for current information.**


**Status**: Implemented

**Author**: [Klaus Aehlig](mailto:aehlig@google.com)

**Design document published**: 11 October 2016

## Current State and Shortcomings


### Dependency on `protoc`

Bazel depends on a protocol buffer compiler to generate code, especially
Java code, from an abstract description of the protocol buffer;
the files generated by `protoc` are, in particular, machine-independent.
In fact, Bazel uses the latest version of `protoc` most of the time.
New versions of `protoc` that contain incompatible changes to the
programming interface are released frequently.

### Current approach to this dependency

The current approach to the `protoc` dependency is to have checked-in
statically-linked executables for all the supported platforms (where
some platforms, like FreeBSD, have to use Linux-compatibility features).
The full source tree of the protobuf compiler is also part of the repository.
However, for generating files, the committed binaries are always used.

### Shortcomings

The current approach has certain shortcomings.

- Having up-to-date binaries for all the supported platforms does not scale well
  as the number of platforms Bazel should run on is increasing.

- The requirement of having a suitable executable in the code base adds
  additional complexity to the process of bootstrapping a new architecture.

- Binaries in the code base do not follow standard open-source principles; in
  fact, meaningful reviews of changes that update them are hard and in practice
  often boil down to a question of trust in the person making the change.

- Committed binaries make the "source" repository unnecessarily big. Currently,
  a checkout at head contains over 250MB in committed `.exe` and `.dll` files.

## Proposed solution

### Change `BUILD` to compile `protoc` from source

The `BUILD` file for `third_party/protobuf` is changed so that `protoc` is
compiled from source instead of being selected from the committed pre-built
binaries; the pre-built binaries are removed from the source tree. As the
`protoc` sources are already part of the repository, this is not a huge
change; also, as `protoc` is written in `C++`, no additional dependencies are
introduced that way.

Note that, with this change, every user who already has a working (bootstrap)
`bazel` can build `bazel` from source without depending on committed binaries
or having a `protoc` already on the machine. The problem of building your
first `bazel` will be addressed in the next sections.

This change also removes an internal consistency requirement from the code
base: so far, it was always assumed that the committed binaries actually match
the accompanying sources.

### Distribution artifact

A new target `//:bazel-distfile` will be added. This will be an archive
containing

- all source files in their respective places, including the files
  under `third_party`, `site`, `scripts`, etc., as well as

- under a subdirectory `derived` all the files generated by `protoc` that
  are needed to compile a bootstrap version of `bazel`.

For convenience, the `derived` subdirectory may also contain other
generated architecture-independent files, like an HTML version of the
documentation for local browsing. A corollary of the archive layout is that
removing the `derived` directory yields a checkout of the upstream sources.
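The layout described above can be illustrated with a minimal sketch. It is not the implementation of `//:bazel-distfile`; the tar format, the helper names `build_distfile` and `upstream_paths`, and the example paths in the usage note are assumptions made purely for illustration.

```python
import io
import tarfile


def build_distfile(sources, derived):
    """Return the bytes of a tar archive holding sources plus derived files.

    `sources` and `derived` map relative paths to file contents (bytes).
    Both names are hypothetical and not part of the Bazel code base.
    """
    entries = dict(sources)
    # Generated files live under derived/ so they never shadow a source file.
    entries.update({f"derived/{path}": data for path, data in derived.items()})

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path in sorted(entries):
            info = tarfile.TarInfo(name=path)
            info.size = len(entries[path])
            archive.addfile(info, io.BytesIO(entries[path]))
    return buffer.getvalue()


def upstream_paths(distfile_bytes):
    """List the archive members that remain once derived/ is removed."""
    with tarfile.open(fileobj=io.BytesIO(distfile_bytes), mode="r") as archive:
        return sorted(
            name for name in archive.getnames()
            if not name.startswith("derived/")
        )
```

Calling `upstream_paths(build_distfile(sources, derived))` returns exactly the sorted keys of `sources`, which mirrors the corollary that dropping `derived` leaves the upstream checkout.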

This new artifact will be built for every release and made available together
with the other release artifacts (like packages, installers, executables).
The same means of certifying integrity (like hashes, SSL certificates) will be
used.
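As an illustration of the hash-based integrity check, a user could compare the downloaded archive against a published digest. This is only a sketch: the document does not say which hash algorithm is used, so SHA-256 is an assumption, and `sha256_of` and `verify_distfile` are hypothetical helper names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_distfile(path, expected_hex):
    """Check a downloaded archive against a published digest (assumed SHA-256)."""
    return sha256_of(path) == expected_hex.strip().lower()
```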

### Bootstrapping Bazel

The `compile.sh` script will be modified to first check whether a `derived`
directory exists. If it does, the script will assume that all the files
generated by `protoc` are already present there; only if it does not will the
script try to generate the needed output of `protoc` for bootstrapping,
assuming that the `PROTOC` environment variable points to a good `protoc`
binary.
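The decision described above can be sketched as follows. The real logic lives in the `compile.sh` shell script; this Python version only models the order of the checks, and the function name `protoc_strategy` and the error raised when neither option is available are assumptions.

```python
import os
from pathlib import Path


def protoc_strategy(source_root, environ=None):
    """Decide how the bootstrap obtains the protoc-generated files.

    Returns ("derived", path) if a derived directory exists, otherwise
    ("protoc", binary) if PROTOC is set. Hypothetical helper, not Bazel code.
    """
    environ = os.environ if environ is None else environ
    derived = Path(source_root) / "derived"
    if derived.is_dir():
        # Distribution artifact: generated files are assumed to be complete.
        return ("derived", str(derived))
    protoc = environ.get("PROTOC")
    if protoc:
        # Repository checkout: generate the files with the user's protoc.
        return ("protoc", protoc)
    # Assumed behaviour: fail early when neither source is available.
    raise RuntimeError("no derived directory found and PROTOC is not set")
```

Checking for `derived` first means that a distribution artifact never needs a `protoc` binary, even if `PROTOC` happens to be set.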

So, there will be three ways to build `bazel`.

- If one has an old `bazel` binary already, a new one can be built from a
  checkout of the source repository. This approach is useful for developers.
  It might also be used by users who want to upgrade their old `bazel` binary
  to the next release.

- After downloading the distribution artifact, one can use the `compile.sh`
  script to build `bazel`. Again, no `protoc` has to be installed ahead of time.
  This approach is useful for source distributions, as well as for bringing
  Bazel to a new platform.

- If one already has the correct version of `protoc` on the machine, the
  `compile.sh` script can be used by setting the `PROTOC` environment variable.
  This approach is useful for distributions that want to provide snapshots
  of `bazel` in between official releases and maintain a `protoc` package anyway.

## Other approaches considered

### Requiring users to have the correct version of the `protoc` binary installed

This would be the standard open-source approach of requiring the user to have
the required dependencies installed ahead of time. Unfortunately, new `protoc`
releases contain incompatible changes too frequently, so this would be an
unreasonable burden. Note that bootstrapping from your own `protoc` and a
repository checkout is still possible with the suggested approach.

### Committing the `protoc` output

Another approach would be to make the output of `protoc` part of the versioned
sources instead of generating them for the distribution file. As with all
approaches based on committing generated files, this would
introduce another consistency requirement to the repository. In this case, the
requirement would be that the generated files be up-to-date with respect to the
respective `.proto` files. Of course, such consistency could be verified by
an appropriate test. Nevertheless, it seems cleaner and probably more
manageable to version only true source files and to generate derived files from
the respective sources.
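For comparison, the consistency test that this rejected approach would need could look roughly like the sketch below. It is purely illustrative: `stale_generated_files` is a hypothetical name, and `generate` stands in for an invocation of `protoc`, which is not modelled here.

```python
def stale_generated_files(proto_sources, committed, generate):
    """Return committed output paths that differ from fresh generation.

    `proto_sources` maps .proto paths to their text, `committed` maps
    generated paths to their checked-in contents, and `generate` is a
    hypothetical callable returning {output_path: contents} for one
    .proto file.
    """
    stale = []
    for proto_path in sorted(proto_sources):
        fresh_outputs = generate(proto_path, proto_sources[proto_path])
        for out_path in sorted(fresh_outputs):
            # A missing or differing committed file breaks the invariant.
            if committed.get(out_path) != fresh_outputs[out_path]:
                stale.append(out_path)
    return stale
```

Such a test would have to pass on every commit that touches a `.proto` file, which is the extra requirement the proposed solution avoids.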