author Klaus Aehlig <aehlig@google.com> 2016-10-12 08:11:45 +0000
committer Yue Gan <yueg@google.com> 2016-10-12 08:57:14 +0000
commit 1b4b3417bd05b6163824b7147e333f46686ca2be
tree 607d95cc222c08384cadc0c5cc8777a670308618
parent 629084f5637d0b0a4ea10772e05c681881f511fa
Add a design document on a distribution artifact for bazel
With bazel sources depending on checked-in binaries for the supported platforms, porting bazel to a new platform is hard; that approach also doesn't scale well if we want to support users on more platforms. This design document suggests an alternative approach based on creating a zip-archive that, besides all sources, also contains the generated output of the protoc actions. From this archive, bazel can be bootstrapped without depending on the right protoc on the target platform.

--
Change-Id: I401452435ed4189ea95260961d981ccc99a98560
Reviewed-on: https://bazel-review.googlesource.com/#/c/6530
MOS_MIGRATED_REVID=135891242

-rw-r--r-- site/designs/_posts/2016-10-11-distribution-artifact.md 137
1 files changed, 137 insertions, 0 deletions
diff --git a/site/designs/_posts/2016-10-11-distribution-artifact.md b/site/designs/_posts/2016-10-11-distribution-artifact.md
new file mode 100644
index 0000000000..2774dcf02a
--- /dev/null
+++ b/site/designs/_posts/2016-10-11-distribution-artifact.md
@@ -0,0 +1,137 @@
+---
+layout: contribute
+title: Distribution Artifact for Bazel
+---
+
+# Distribution Artifact for Bazel
+
+**Status**: Not implemented
+
+**Author**: [Klaus Aehlig](mailto:aehlig@google.com)
+
+## Current State and Shortcomings
+
+
+### Dependency on `protoc`
+
+Bazel depends on the protocol buffer compiler, `protoc`, to generate code,
+especially Java code, from an abstract description of a protocol buffer.
+The files generated by `protoc` are machine-independent.
+Bazel uses the latest version of `protoc` most of the time, and
+new versions of `protoc` that contain incompatible changes to the
+programming interface are released frequently.
+
+### Current approach to this dependency
+
+The current approach to the `protoc` dependency is to have checked-in
+statically-linked executables for all the supported platforms (where
+some platforms, like FreeBSD, have to use Linux-compatibility features).
+The full source tree of the protobuf compiler is also part of the repository.
+However, for generating files, the committed binaries are always used.
+
+### Shortcomings
+
+The current approach has certain shortcomings.
+
+- Having up-to-date binaries for all the supported platforms does not scale well
+ as the number of platforms Bazel should run on is increasing.
+
+- The requirement of having a suitable executable in the code base adds
+ complexity to the process of bootstrapping a new architecture.
+
+- Binaries in the code base do not follow standard open-source principles; in
+ fact, meaningful reviews for changes updating them are hard and in practice
+ often boil down to a question of trust in the person making the change.
+
+- Committed binaries make the "source" repository unnecessarily big. Currently,
+ a checkout at head contains over 250MB in committed `.exe` and `.dll` files.
+
+## Proposed solution
+
+### Change `BUILD` to compile `protoc` from source
+
+The `BUILD` file for `third_party/protobuf` will be changed so that
+`protoc` is compiled from source instead of being selected from
+the committed pre-built binaries; the pre-built binaries will be removed from
+the source tree. As the `protoc` sources are already part of the repository,
+this is not a huge change; also, as `protoc` is written in C++, no additional
+dependencies are introduced that way.
+
+Note that every user who already has a working (bootstrap) `bazel` can then
+build `bazel` from source, without depending on committed binaries or having
+a `protoc` already on the machine. The problem of building your first `bazel`
+will be addressed in the next sections.
+
+This change also removes an internal consistency requirement from the code
+base. So far, it has simply been assumed that the committed binaries actually
+match the accompanying sources.
+
+### Distribution artifact
+
+A new target `//:bazel-distfile` will be added. This will be an archive
+containing
+
+- all source files in their respective places, including the files
+ under `third_party`, `site`, `scripts`, etc., as well as
+
+- under a subdirectory `derived` all the files generated by `protoc` that
+ are needed to compile a bootstrap version of `bazel`.
+
+For convenience, the `derived` subdirectory may also contain other
+generated architecture-independent files, like an HTML-version of the
+documentation for local browsing. A corollary of the archive layout is that
+by removing the `derived` directory a checkout of the upstream sources is
+obtained.
+
+This new artifact will be built for every release and made available
+along with the other release artifacts (like packages, installers, executables).
+The same means of certifying integrity (like hashes, SSL certificates) will be
+used.
+
+### Bootstrapping Bazel
+
+The `compile.sh` script will be modified to first check whether a `derived`
+directory exists and, if so, to assume that all the files generated by `protoc`
+are already present there; only if it does not exist will the script try to
+generate the `protoc` output needed for bootstrapping, assuming that the
+`PROTOC` environment variable points to a good `protoc` binary.
+
+So, there will be three ways to build `bazel`.
+
+- If one has an old `bazel` binary already, a new one can be built from a
+ checkout of the source repository. This approach is useful for developers.
+ It might also be used by users who want to upgrade their old `bazel` binary
+ to the next release.
+
+- By downloading the distribution artifact, the `compile.sh` script can be
+ used to build bazel. Again, no `protoc` has to be installed ahead of time.
+ This approach is useful for source distributions, as well as for bringing
+ Bazel to a new platform.
+
+- If one already has the correct version of `protoc` on the machine, the
+ `compile.sh` script can be used by setting the `PROTOC` environment variable.
+ This approach is useful for distributions that want to provide snapshots
+ of `bazel` in between official releases and maintain a `protoc` package anyway.
+
+## Other approaches considered
+
+### Requiring users to have the correct version of the `protoc` binary installed
+
+This would be the standard open-source approach of requiring the user to have
+the required dependencies installed ahead of time. Unfortunately, new
+versions of `protoc` introduce incompatible changes so frequently that this
+would be an unreasonable burden.
+Note that bootstrapping from your own `protoc` and a repository
+checkout is still possible with the suggested approach.
+
+### Committing the `protoc` output
+
+Another approach would be to make the output of `protoc` part of the versioned
+sources instead of generating them for the distribution file. As with all
+approaches based on committing generated files, this would
+introduce another consistency requirement to the repository. In this case, the
+requirement would be that the generated files be up-to-date with respect to the
+respective `.proto` files. Of course, such consistency could be verified by
+an appropriate test. Nevertheless, it seems cleaner and probably more
+manageable to version only true source files and generate derived files from
+the respective sources.
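
The check described under "Bootstrapping Bazel" could be sketched in shell roughly as follows. This is a minimal sketch under stated assumptions, not the actual `compile.sh`: the function name `choose_proto_source` and the strings it prints are hypothetical, and only the `derived` directory and the `PROTOC` environment variable are taken from the design document.

```shell
#!/bin/sh
# Hypothetical helper (not part of the real compile.sh): decide where the
# protoc-generated files for bootstrapping should come from.
#   $1  root of the unpacked distribution archive or repository checkout
# Prints "derived" if a derived/ directory exists, "protoc" if PROTOC names
# an executable file, and returns 1 with a message on stderr otherwise.
choose_proto_source() {
  root="$1"
  if [ -d "$root/derived" ]; then
    # Distribution artifact: the generated files are assumed to be present.
    echo "derived"
  elif [ -n "${PROTOC:-}" ] && [ -x "$PROTOC" ]; then
    # Repository checkout plus a protoc binary supplied by the user.
    echo "protoc"
  else
    echo "error: no derived/ directory and PROTOC is not an executable" >&2
    return 1
  fi
}
```

With a result of `derived`, no `protoc` is needed at all; with `protoc`, a script along these lines would go on to run `$PROTOC` on the `.proto` files before compiling the bootstrap `bazel`.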