---
layout: contribute
title: Specifying environment variables
---

# Design Document: Specifying environment variables for actions

**Design documents are not descriptions of the current functionality of Bazel.
Always go to the documentation for current information.**


**Status**: Implemented. See [documentation](/docs/skylark/lib/ctx.html#action)

**Author**: [Klaus Aehlig](mailto:aehlig@google.com)

**Design document published**: 21 June 2016

## Current shortcomings

Currently, Bazel provides a cleaned set of environment variables to the
actions in order to obtain hermetic builds. This, however, is not sufficient
for all use cases.

* Projects often want to use tools which are not part of the repository; however,
  their location varies from installation to installation. So, some sensible
  value for the `PATH` environment variable has to be set.

* Some set-ups depend on every program having access to specific variables,
  e.g., indicating the homebrew paths, or library paths.

* Commercial compilers sometimes need to be passed the location of a license
  server through the environment.

## Proposed solution

### New flag `--action_env`

We propose to add a new Bazel flag, `--action_env`, which has two
valid forms of usage:

* specifying a variable with unspecified value, `--action_env=VARIABLE`,
  and

* specifying a variable with a value, `--action_env=VARIABLE=VALUE`;
  in the latter case, the value can well be the empty string, but it is still
  considered a specified value.

This flag has "latest wins" semantics: if the option is given twice for the
same variable, only the latest option is used, regardless of whether it
specifies a value or leaves the value unspecified. Options given for different
variables accumulate.

In every action executed
with [`use_default_shell_env`](/docs/skylark/lib/ctx.html#action) being true,
precisely the environment variables specified by
`--action_env` options are set as the default environment.
(Note that, therefore, by default, the environment for actions is empty.)

* If the effective option for a variable has an unspecified value,
  the value from the invocation environment of Bazel is taken.

* If the effective option for a variable specifies a value, this value is
  taken, regardless of the environment in which Bazel is invoked.
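The rules above can be illustrated with a minimal Python sketch. It is not Bazel's implementation; the function name `resolve_action_env` is hypothetical. The sketch also assumes that a variable with an unspecified value that is absent from the invocation environment is simply left out, a case this document does not spell out.

```python
def resolve_action_env(options, client_env):
    """Compute the default action environment (illustrative sketch only).

    options:    values passed to --action_env, in command-line order,
                e.g. ["PATH", "CC=gcc"].
    client_env: the environment Bazel was invoked in, as a dict.
    """
    effective = {}  # variable name -> specified value, or None if unspecified
    for option in options:
        name, sep, value = option.partition("=")
        # "Latest wins": a later option for the same variable replaces
        # the earlier one, whether or not it specifies a value.
        effective[name] = value if sep else None

    resolved = {}
    for name, value in effective.items():
        if value is not None:
            # A specified value (even the empty string) is used as is.
            resolved[name] = value
        elif name in client_env:
            # An unspecified value is taken from the invocation environment.
            # Assumption: variables missing there are left unset.
            resolved[name] = client_env[name]
    return resolved


client = {"PATH": "/usr/bin:/bin", "HOME": "/home/alice"}
env = resolve_action_env(
    ["PATH=/opt/bin", "CC=", "PATH", "LICENSE=host:27000"], client)
```

Here the second `PATH` option overrides the first, so `PATH` is inherited from the invocation environment. `CC` is set to the empty string, and `HOME` is not passed on because no option mentions it.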

Environment variables are considered an essential part of an action. In other
words, an action is expected to produce a different output if the environment
it is invoked in differs; in particular, a previously cached value cannot be
taken if the effective environment changes.
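One way to picture this is a cache key that covers the environment as well as the command line. The following Python sketch is only an illustration under that assumption; `action_key` is a hypothetical helper and does not reflect how Bazel actually computes action keys.

```python
import hashlib
import json


def action_key(argv, env):
    """Derive a cache key from the command line and the environment.

    Sketch only: the environment is sorted so that the key does not
    depend on the order in which the variables were specified.
    """
    payload = json.dumps({"argv": argv, "env": sorted(env.items())})
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = action_key(["cc", "-c", "a.c"], {"PATH": "/usr/bin"})
key_b = action_key(["cc", "-c", "a.c"], {"PATH": "/opt/bin"})
# key_a and key_b differ, so a result cached under key_a is not
# reused once the effective environment has changed.
```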

Given that normally a rule writer cannot know which tools might need fancy
environment variables (think of the commercial compiler use case), the default
for the [`use_default_shell_env`](/docs/skylark/lib/ctx.html#action)
parameter will become true.

### List of rc-files read by Bazel

The list of rc-files that Bazel takes options from will include, at
least, the following files, where files later in the list take precedence over
the ones earlier in the list for conflicting options; for the
`--action_env` option the already described "latest wins" semantics is
applied.

* A global rc-file. This file typically contains defaults for a whole group of
  machines, like all machines of a company. On UNIX-like systems, it will be
  located at `/etc/bazel.bazelrc`.

* A machine-wide rc-file. This file is typically set by the administrator of
  the machine or a group of machines with the same architecture. It typically
  contains settings that are specific to that architecture and hardware.
  On UNIX-like systems it will be located next to the binary and named like
  the binary with `.bazelrc` appended to the file name.

* A user-specific file, located in `~/.bazelrc`. This file will be set by
  each user for options desired for all Bazel invocations.

* A project-specific file. This is the file `tools/bazel.rc` next to
  the `WORKSPACE` file. This file is considered project-specific and
  typically versioned in the same repository as the project.

* A file specific to user, project, and checkout. This is the file
  `.bazelrc` next to the `WORKSPACE` file. As it is specific to
  the user and the machine he or she is working on, projects are advised
  to ignore that file in the repository of the project (e.g., by adding
  it to their `.gitignore` file, if they version the project with git).

When looking for those rc-files, symbolic links are followed; files that do not
exist are silently assumed to be empty. Note that all of these are regular
rc-files for Bazel and hence are not limited to the newly introduced
`--action_env` option. Also, the rule that options for more specific
invocations win over common options still applies; but, within each level of
specificity, precedence is given according to the mentioned order of rc-files.
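The interplay of file order and "latest wins" can be sketched in Python as follows. This is a deliberately simplified illustration, not Bazel's rc-file parser. It only looks at `build` lines, ignores quoting and the distinction between common and command-specific options, and the name `collect_action_env` is hypothetical.

```python
import tempfile
from pathlib import Path


def collect_action_env(rc_files):
    """Merge --action_env options from rc-files (simplified sketch).

    rc_files is ordered from lowest to highest precedence; the result
    maps each variable to its specified value, or to None if the
    effective option leaves the value unspecified.
    """
    prefix = "--action_env="
    effective = {}
    for rc_file in rc_files:
        path = Path(rc_file)
        # is_file() follows symbolic links; missing files count as empty.
        lines = path.read_text().splitlines() if path.is_file() else []
        for line in lines:
            words = line.split()
            if len(words) < 2 or words[0] != "build":
                continue
            for word in words[1:]:
                if word.startswith(prefix):
                    name, sep, value = word[len(prefix):].partition("=")
                    effective[name] = value if sep else None
    return effective


with tempfile.TemporaryDirectory() as workspace:
    project_rc = Path(workspace, "tools", "bazel.rc")
    project_rc.parent.mkdir()
    project_rc.write_text("build --action_env=PATH\nbuild --action_env=CC=gcc\n")
    user_project_rc = Path(workspace, ".bazelrc")
    user_project_rc.write_text("build --action_env=PATH=/opt/tools/bin\n")
    missing_rc = Path(workspace, "does-not-exist.bazelrc")
    effective = collect_action_env([missing_rc, project_rc, user_project_rc])
```

In this example the project file only declares that `PATH` matters, while the user-project file, which is read later, pins it to a concrete value. `CC` keeps the value from the project file.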

## Example usages of environment specifications

The proposed solution allows for a variety of use cases, including the
following.

* Systems using commercial compilers can set the environment variables with
  information about the license server in the global rc file.

* Users requiring special variables, like the ones used by homebrew, can set
  them in their machine specific rc-file. In fact, once this proposal is
  implemented, the homebrew port for Bazel could itself install that
  machine-wide rc-file.

* Projects depending on the environment, e.g., because they use tools assumed to
  be already installed on the user's system, have several options.

  * If they are optimistic about the environment, e.g., because they do not
    depend much on the versions of the tools used, they can just specify which
    environment variables they depend on by adding declarations with
    unspecified values in the `tools/bazel.rc` file.

  * If dependencies are more delicate, projects can provide a configure script
    that does whatever analysis of the environment is necessary and then writes
    `--action_env` options with specified values to the user-project
    local `.bazelrc` file. The configure script only runs when manually invoked
    by the user, and the syntax of the user-project local `.bazelrc` file is
    simple enough to be edited easily by a human. It is therefore OK if that
    script only works in the majority of the cases, as a user requiring an
    unusual setup for that project can easily modify the user-project local
    `.bazelrc` by hand afterwards.

* Irrespective of the approach chosen by the project, a user whose
  environment changes frequently (e.g., on clusters or other machines using a
  traditional layout) can fix the environment by adding `--action_env`
  options with specific values to the user-project local `.bazelrc`.

  To simplify this use case, and other "freeze on first use" approaches,
  Bazel's `info` command will provide a new key `client-env` that will show
  the environment variables, together with their values. More precisely,
  each variable-value pair will be prefixed with `build --action_env=`, so
  that `bazel info client-env >> .bazelrc` can be used to freeze the
  environment.
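The output format described for `client-env` can be pictured with a short Python sketch. It is hypothetical throughout: the helper `freeze_lines` is not part of Bazel, the sorted output order is an assumption, and values are emitted without quoting, which this document leaves open.

```python
def freeze_lines(action_env):
    """Format variable-value pairs as rc-file lines (illustrative sketch).

    Each pair is prefixed with "build --action_env=" so that the lines
    can be appended to a .bazelrc file as they are.
    """
    return [
        "build --action_env={}={}".format(name, value)
        for name, value in sorted(action_env.items())
    ]


lines = freeze_lines({"TMPDIR": "/tmp", "PATH": "/usr/bin:/bin"})
# lines[0] is "build --action_env=PATH=/usr/bin:/bin"
```

Because every line carries a specified value, appending such output to the user-project local `.bazelrc` overrides earlier unspecified-value declarations for the same variables.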

## Transition plan

Currently, some users of Bazel already make use of the fact that `PATH`,
`LD_LIBRARY_PATH`, and `TMPDIR` are being passed to actions. To allow those
projects a smooth transition to the new setup, the global Bazel rc-file
provided by upstream will have the following content.

```
build --action_env=PATH
build --action_env=LD_LIBRARY_PATH
build --action_env=TMPDIR
build --test_env=PATH
build --test_env=LD_LIBRARY_PATH
```


## Bazel's own dependency on `PATH`

Bazel itself also uses external tools, like `cat`, `echo`, and `sh`, but also
tools like `bash` whose location differs between installations. In
particular, a value for `PATH` needs to be provided. This will be covered
by the settings in the global Bazel configuration file. Should the need arise, a
configure-like script can be added; at the moment it seems that this will not
be necessary.

## Reasons for the Design Choices, Risks, and Alternatives Considered

### Conflicting Interests on the environment influencing actions

There are conflicting requirements for the environment variables of an action.

* Users expect Bazel to "just work", i.e., the expectation is that if a tool
  works on the command line, it should also work when called from an action in
  a Bazel invocation from the same environment. A lot of compilers, however,
  depend, at least on some systems, on certain environment variables.
  An approach used by quite a few other build systems is to pass through the
  whole invocation environment.

* Bazel wants to provide correct and reproducible builds. Therefore, everything
  that potentially influences the outcome of an action needs to be controlled
  and tracked; a cached result cannot be used if anything potentially changing
  the outcome has changed.

* Users expect Bazel to not do rebuilds they (i.e., the users) know are
  unnecessary. And, while for a lot of users the environment variables that
  actually influence the build stay stable, the full environment constantly
  changes; take the `OLDPWD` environment variable as an example.

This design tries to reconcile these needs by allowing arbitrary environment
variables to be set for actions, but only in an opt-in way. Variables need to
be explicitly mentioned, either in a configuration file or on the command line,
to be provided to an action.

### Generic Solutions versus Special Casing

As Bazel already has quite a number of concepts, there is the valid concern
that the complexity might increase too much and newly added concepts might
become a maintenance burden. Another concern is that more configuration
mechanisms make it harder for the user to know which one is the correct one
to use for his or her problem. The general desire is to have few, but powerful
enough mechanisms to control the build behaviour and avoid special casing.

* Putting the environment variables visible in actions in the hands of the
  user avoids the need to special-case more and more "important" environment
  variables.

* Building on the already existing mechanism to specify, inherit, and override
  command-line options reduces the number of newly introduced concepts. The main
  addition is a command-line option.

### Source of Knowledge for Needed Environment Variables

Another aspect that went into the design is that different entities know
about environment variables that are essential for the build to work.

* Some variables are "obviously" relevant, like `PATH` or `TMPDIR`.
  However, there is no "obvious" value for them.

  * Both depend on the layout of the system in question. A special fast
    file system for temporary files might be provided at a designated
    location. Binaries might be installed under `/bin`, `/usr/bin`,
    `/usr/local/bin`, or even versioned paths to allow parallel installations
    of different versions of the same tool. For example, on Debian GNU/Linux
    `bash` is installed in `/bin`, whereas on FreeBSD it is usually
    installed in `/usr/local/bin` (but the prefix `/usr/local` is at the
    discretion of the system administrator).

  * The user might have custom-built versions of tools somewhere in the
    home directory, thus making the user the only one who knows an appropriate
    value for the `PATH` variable. Moreover, a user who works on several
    projects requiring different versions of the same tool may even require
    different values of the `PATH` variable for each project.

* The authors and users of a tool know about special variables the tools
  need to work. While the tool itself might serve a standard purpose, like
  compiling C code, the variables the tool depends on might be specific to
  that tool (like passing information about a license server).

* The maintainers of a porting or packaging system know about environment
  variables a tool might additionally need (e.g., in the homebrew case).
  These might not be needed if the same tool is packaged differently.

* The project authors know about environment variables special to their
  project that some of their actions need.

These different sources of information make it hard to designate a
single maintainer for the action environment. This makes approaches
that are based on a single source specifying the action environment,
like the `WORKSPACE` file or the rule definitions, undesirable. While
those approaches make it easy to predict the environment an action will
have, they all require the user to merge in the specifics of the system
and his or her personal settings for each checkout (including rebasing
these changes for each upstream change of that file). Collecting environment
variables via the rc-file mechanism allows setting each variable within
the appropriate scope (global, machine-dependent, user-specific,
project-specific, specific to the user-project pair) in a conflict-free way
by the entity in charge of that scope.