From 1a1b6c960dc07a23a3b60236842a6ba0b4c528f9 Mon Sep 17 00:00:00 2001 From: Kristina Chodorow Date: Fri, 14 Oct 2016 14:29:21 +0000 Subject: Add design doc for recursive WORKSPACE file parsing #1943 -- Change-Id: Ic7b4378acb4d3661a1b61ff887c9cc8592df9f7e Reviewed-on: https://bazel-review.googlesource.com/c/6131/ MOS_MIGRATED_REVID=136154990 --- .../_posts/2016-09-19-recursive-ws-parsing.md | 199 +++++++++++++++++++++ 1 file changed, 199 insertions(+) create mode 100644 site/designs/_posts/2016-09-19-recursive-ws-parsing.md (limited to 'site/designs') diff --git a/site/designs/_posts/2016-09-19-recursive-ws-parsing.md b/site/designs/_posts/2016-09-19-recursive-ws-parsing.md new file mode 100644 index 0000000000..b161531fe3 --- /dev/null +++ b/site/designs/_posts/2016-09-19-recursive-ws-parsing.md @@ -0,0 +1,199 @@ +--- +layout: contribute +title: Recursive WORKSPACE file parsing +--- +# Recursive WORKSPACE file parsing + +**Status**: Unimplemented + +**Author**: [kchodorow@](mailto:kchodorow@google.com) + +## Objective + +Users are annoyed by having to specify all deps in their WORKSPACE file. To +avoid this inconvenience, Bazel could load subprojects' WORKSPACE files +automatically. + +## Non-Goals + +* Solve the problem of people specifying two different repositories with the + same name (e.g., @util). +* Solve the problem of people specifying two different names for the same + repository (`@guava` and `@com_google_guava`). + +## Resolution + +When a repository is defined in multiple files, which definition "wins"? What +causes a conflict/error? + +### Defined in the main repository's WORKSPACE file + +This definition wins, regardless of other definitions. + +### In a line + +Note: as an intermediate step, this can be disabled, but the end goal is to +allow this so that intermediate dependencies that the top level doesn't care +about don't need to be resolved. + +Suppose we have a main repository that depends on repo x, and x depends on repo +y: + + + +In this case, version 1 of "foo" wins. This way, if a library has already +figured out which version works for them, its reverse dependencies do not have +to think about it. + +This will also work if a parent overrides its children's versions, even if it +has multiple children. + +### Different lines + +If there is no obvious hierarchy and multiple versions are specified, error out. +Report what each chain of dependencies was that wanted the dep and at which +versions: + + + +In this case, Bazel would error out with: + +``` +ERROR: Conflicting definitions of 'foo': bazel-external/y/WORKSPACE:2 repository(name = 'foo' version = '1') + requested by bazel-external/x/WORKSPACE:2 repository(name = 'y') + requested by WORKSPACE:3 repository(name = 'x') +vs. bazel-external/a/WORKSPACE:2 repository(name = 'foo' version = '2') + requested by WORKSPACE:2 repository(name = 'a') +``` + +This is also the case with diamond dependencies: + + + +This would print: + +``` +ERROR: Conflicting definitions of 'foo': bazel-external/x/WORKSPACE:2 repository(name = 'foo' version = '2') + requested by WORKSPACE:2 repository(name = 'x') +vs. bazel-external/z/WORKSPACE:2 repository(name = 'foo' version = '1') + requested by bazel-external/y/WORKSPACE:2 repository(name = 'z') + requested by WORKSPACE:3 repository(name = 'y') +``` + +## Upgrade path + +I think that this should be fairly straightforward, as any repository used by +the main repository or any subrepository had to be declared in the WORKSPACE +already, so it will take precedence. + +To be extra safe, we can start with adding a `recursive = False` attribute to +the `workspace` rule, which we can then flip the default of. + +## Implementation + +There are two options for implementing this: + +* We always download/link every external dependency before a build can happen. + E.g., if @x isn't defined in the WORKSPACE file, we have to recursively + traverse all of the repositories to know which repository does define it and + if there are any conflicting definitions. This is correct, but will be + frustrating to users and may not even work in some cases (e.g., if an + OS-X-only skylark repository rule is fetched on Linux). +* Every time a new WORKSPACE file is fetched, we check its repository rules + against the ones already defined and look for version conflicts. This would + entirely miss certain version conflicts until certain dependencies are built, + but will have better performance. + +I think users will rebel unless we go with Option 2. However, this can have +some weird effects: suppose we have the diamond dependency above, and the user's +BUILD file contains: + +``` +cc_library( + name = "bar", + deps = ["@x//:dep"], # using @foo version 2 +) + +cc_library( + name = "baz", + deps = ["@y//:dep"], # using @foo version 1 +) +``` + +If they build :bar and their coworker builds :baz, the two builds will work and +get different versions of @foo. However, as soon as one of them tries to build +both, they'll get the version mismatch error. + +This is suboptimal, but I can't think of a way that all three of these can be +satisfied: + +* The user doesn't have to declare everything at the top level. +* Bazel doesn't have to load everything. +* Bazel can immediately detect any conflicts. + +This could be enforced by a CI on presubmit, which I think is good enough. + +Whenever Bazel creates a new repository, it will attempt to parse the WORKSPACE +file and do Skyframe lookups against each repository name. If the repository +name is not defined, it will be initialized to the current WORKSPACE's +definition. If it already exists, the existing value will be compared. + +For now, we'll be very picky about equality: `maven_jar` and `new_http_archive` +of the same Maven artifact will count as different repositories. For both +native and skylark repository rules, they will have to be equal to not conflict. + +One issue is that is a little tricky but I think will work out: the WORKSPACE +file is parsed incrementally. Suppose the main WORKSPACE loads x.bzl, which +declares @y and @z. If @y depends on @foo version 1 and @z depends on @foo +version 2, this will throw a Skyframe error, even if @foo is later declared in +the WORKSPACE file. However, this should be okay, because if these dependencies +actually need @foo, it would need to be declared before them in the WS file +already. + +## Supplementary changes + +Not strictly required, but as part of this I'm planning to implement: + +* A `bazel-external` convenience symlink (to the `[output_base]/external` + directory) so users can easily inspect their external repositories. +* Add an option to generate all WORKSPACE definitions (so generate a flat + WORKSPACE file from the hierarchy). + +## Concerns + +Questions users might have. + +*Where did @x come from?* + +Bazel will create a `bazel-external/@x.version` should contain the WORKSPACE (or +.bzl file) where we got @x's def and other WORKSPACE files that contain it. + +*Which version of @x is going to be chosen?* + +See resolution section above. Perhaps people could query for //external:x? + +*I want to use a different version of @x.* + +Declare @x in your WORKSPACE file, it'll override "lower" rules. + +*When I update @x, what else will change?* + +Because @x might declare repo @y and @y's version might change as well, we'd +need a different way to query for this. We could implement deps() for repo +rules or have some other mechanism for this. + +## Thoughts on future development + +Moving towards the user-as-conflict-resolver model (vs. the +user-as-transcriber-of-deps model) means that repositories that the user may not +even be aware of might be available in their workspace. I think this kind of +paves the way towards a nice auto-fetch system where a user could just depend on +`@com_google_guava//whatever` in their BUILD file, and Bazel could figure out +how to make `@com_google_guava` available. + +## References + +[So you want to write a package manager](https://medium.com/@sdboyer/so-you-want-to-write-a-package-manager-4ae9c17d9527#.d90oxolzk) +does a good job outlining many of these challenges, but their suggested approach +(use semantic versioning for dependency resolution) cannot be used by Bazel for +the general case. -- cgit v1.2.3