---
layout: documentation
title: Optimizing Performance
---

# Optimizing Performance

<!-- [TOC] -->

Writing efficient Skylark code mostly means avoiding O(N^2) behavior in time
and/or space. Crucially, this involves understanding depsets and avoiding their
expansion.

This can be hard to get right, so Bazel also provides a memory profiler that
assists you in finding spots where you might have made a mistake.

## Use depsets

Whenever you are rolling up information from rule dependencies you should use
[depsets](lib/depset.html). Only use plain lists or dicts to publish information
local to the current rule.

A depset represents information as a nested graph which enables sharing.

Consider the following graph:

```
C -> B -> A
D ---^
```

Each node publishes a single string. With depsets the data looks like this:

```
a = depset(direct=['a'])
b = depset(direct=['b'], transitive=[a])
c = depset(direct=['c'], transitive=[b])
d = depset(direct=['d'], transitive=[b])
```

Note that each item is only mentioned once. With lists you would get this:

```
a = ['a']
b = ['b', 'a']
c = ['c', 'b', 'a']
d = ['d', 'b', 'a']
```

Note that in this case `'a'` is mentioned four times! With larger graphs this
problem will only get worse.
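
The difference can be made concrete with a small model. The following is a
plain-Python sketch, not Bazel's implementation: `Node` is a hypothetical
stand-in for a depset that keeps references to its children instead of copying
them, and the two counting helpers are likewise illustrative.

```python
# Plain-Python model of the example graph above. `Node` is a hypothetical
# stand-in for a depset; it is not how Bazel implements depsets.

class Node:
    def __init__(self, direct, transitive=()):
        self.direct = list(direct)          # items owned by this node
        self.transitive = list(transitive)  # references to children, not copies


def stored_in_nodes(nodes):
    """Counts the items physically stored across all nodes."""
    return sum(len(node.direct) for node in nodes)


def stored_in_lists(lists):
    """Counts the items physically stored across all flat lists."""
    return sum(len(items) for items in lists)


# Nested representation: every string is stored exactly once.
a = Node(["a"])
b = Node(["b"], [a])
c = Node(["c"], [b])
d = Node(["d"], [b])
nested_total = stored_in_nodes([a, b, c, d])

# Flat representation: every list repeats the contents of its dependencies.
la = ["a"]
lb = ["b"] + la
lc = ["c"] + lb
ld = ["d"] + lb
flat_total = stored_in_lists([la, lb, lc, ld])

print(nested_total, flat_total)  # 4 9
```

In this model the nested form stores one entry per graph node, while the flat
form stores one entry per (node, transitive dependency) pair.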

Here is an example of a rule implementation that uses depsets correctly to
publish transitive information. Note that it is OK to publish rule-local
information using lists if you want since this is not O(N^2).

```
MyProvider = provider()

def _impl(ctx):
  my_things = ctx.attr.things
  all_things = depset(
      direct=my_things,
      transitive=[dep[MyProvider].all_things for dep in ctx.attr.deps]
  )
  ...
  return [MyProvider(
    my_things=my_things,  # OK, a flat list of rule-local things only
    all_things=all_things,  # OK, a depset containing dependencies
  )]
```

See the [depset overview](depsets.md) page for more information.

### Never call `depset#to_list`

You can coerce a depset to a flat list using [to_list](lib/depset.html#to_list).
This should be considered debugging functionality. Any flattening of a depset in
a rule implementation is almost always O(N^2).

A common misconception is that you can freely flatten at the very top level,
e.g. at the `xx_binary` level. This is *still* O(N^2) when you build a set
of overlapping targets. This happens when building your tests
`//foo/tests/...`, or when importing an IDE project.
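
To see where the quadratic cost comes from, consider a chain of N targets in
which each target flattens its own transitive set. The sketch below is a
plain-Python model, not Bazel code: nodes are simple `(direct, transitive)`
tuples and `flatten` is a hypothetical stand-in for `to_list`.

```python
# Plain-Python model (not Bazel code): a chain of n targets where target i
# depends on target i - 1. `flatten` is a hypothetical stand-in for to_list.

def flatten(node):
    """Collects every item reachable from `node`, visiting each node once."""
    out, seen, stack = [], set(), [node]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        direct, transitive = current
        out.extend(direct)
        stack.extend(transitive)
    return out


def build_chain(n):
    """Builds n nested (direct, transitive) nodes; node i wraps node i - 1."""
    nodes = []
    for i in range(n):
        transitive = [nodes[-1]] if nodes else []
        nodes.append(([i], transitive))
    return nodes


def items_touched_when_every_target_flattens(n):
    """Total items produced if each of the n targets flattens its own set."""
    return sum(len(flatten(node)) for node in build_chain(n))


for n in (10, 100, 1000):
    print(n, items_touched_when_every_target_flattens(n))
```

Each single flatten is linear, but the total over the chain is 1 + 2 + ... + N =
N(N+1)/2 items, so growing the chain by a factor of 10 grows the work by
roughly a factor of 100.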

**Note**: Today it is possible to flatten depsets implicitly. Anywhere you
iterate a depset (explicitly or implicitly), or take its size, you are
effectively calling `to_list`. This functionality will soon be removed.

### Never call `len(depset)`

Getting the number of items in a depset is O(N). Checking whether a depset is
empty, however, is O(1). This includes checking the truthiness of a depset:

```
def _impl(ctx):
  args = ctx.actions.args()
  files = depset(...)

  # Bad, has to iterate over entire depset to get length
  if len(files) > 0:
    args.add("--files")
    args.add(files)

  # Good, O(1)
  if files:
    args.add("--files")
    args.add(files)
```
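
One way to picture why the emptiness check is cheap while the size is not is
the following plain-Python sketch. It assumes that empty children are dropped
when a set is constructed, so that emptiness can be decided from the top node
alone; `ModelDepset` and its `visited` counter are hypothetical and do not
describe Bazel's actual implementation.

```python
class ModelDepset:
    """Hypothetical model of a nested set; not Bazel's depset."""

    visited = 0  # class-wide counter of nodes visited by size()

    def __init__(self, direct=(), transitive=()):
        self.direct = list(direct)
        # Assumption: empty children are dropped at construction time, so
        # emptiness can be decided from this node alone.
        self.transitive = [child for child in transitive if child]

    def __bool__(self):
        # O(1): looks only at this node, never at its descendants.
        return bool(self.direct) or bool(self.transitive)

    def size(self):
        # O(N): has to walk the whole graph and deduplicate items.
        seen_nodes, items, stack = set(), set(), [self]
        while stack:
            node = stack.pop()
            if id(node) in seen_nodes:
                continue
            seen_nodes.add(id(node))
            ModelDepset.visited += 1
            items.update(node.direct)
            stack.extend(node.transitive)
        return len(items)


# Build a chain of 1000 nested sets, one item each.
files = ModelDepset(["f0"])
for i in range(1, 1000):
    files = ModelDepset(["f%d" % i], [files])

ModelDepset.visited = 0
is_non_empty = bool(files)             # inspects the top node only
visited_by_bool = ModelDepset.visited

count = files.size()                   # walks every node
visited_by_size = ModelDepset.visited

print(is_non_empty, visited_by_bool, count, visited_by_size)  # True 0 1000 1000
```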

## Use `ctx.actions.args()` for command lines

When building command lines you should use [ctx.actions.args()](lib/Args.html).
This defers expansion of any depsets to the execution phase.

Apart from being strictly faster, this will reduce the memory consumption of
your rules -- sometimes by 90% or more.

Here are some tricks:

* Pass depsets and lists directly as arguments, instead of flattening them
yourself. They will get expanded by `ctx.actions.args()` for you.
If you need any transformations on the depset contents, look at
[ctx.actions.args#add](lib/Args.html#add) to see if anything fits the bill.

* Are you passing `File#path` as arguments? No need. Any
[File](lib/File.html) is automatically turned into its
[path](lib/File.html#path), deferred to expansion time.

* Avoid constructing strings by concatenating them together.
The best string argument is a constant as its memory will be shared between
all instances of your rule.

* If the args are too long for the command line, a `ctx.actions.args()` object
can be conditionally or unconditionally written to a param file using
[`ctx.actions.args#use_param_file`](lib/Args.html#use_param_file). This is
done behind the scenes when the action is executed. If you need to explicitly
control the params file you can write it manually using
[`ctx.actions.write`](lib/actions.html#write).

Example:

```
def _impl(ctx):
  ...
  args = ctx.actions.args()
  file = ctx.actions.declare_file(...)
  files = depset(...)

  # Bad, constructs a full string "--foo=<file path>" for each rule instance
  args.add("--foo=" + file.path)

  # Good, shares "--foo" among all rule instances, and defers file.path to later
  args.add("--foo")
  args.add(file)

  # Bad, makes a giant string of a whole depset
  args.add(" ".join(["-I%s" % file.short_path for file in files]))

  # Good, only stores a reference to the depset
  args.add(files, format="-I%s", map_fn=_to_short_path)

# Function passed to map_fn above
def _to_short_path(files):
  return [file.short_path for file in files]
```
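
The memory argument behind these tips can be illustrated outside Bazel. The
following plain-Python sketch uses a hypothetical `LazyArgs` class (it is not
the real `ctx.actions.args()` implementation) that stores references plus a
format string, and only builds the final strings when the command line is
requested.

```python
class LazyArgs:
    """Hypothetical model of a deferred command line; not Bazel's Args."""

    def __init__(self):
        self._entries = []  # (values, fmt) pairs, held by reference

    def add(self, values, fmt="%s"):
        # Stores a reference to `values`; nothing is copied or formatted yet.
        self._entries.append((values, fmt))

    def expand(self):
        # Formatting happens here, only when the command line is needed.
        out = []
        for values, fmt in self._entries:
            if isinstance(values, str):
                out.append(fmt % values)
            else:
                out.extend(fmt % value for value in values)
        return out


include_dirs = ["lib/a", "lib/b", "lib/c"]  # shared by all rule instances

# Eager: every instance builds and keeps its own joined string.
eager = [" ".join("-I%s" % d for d in include_dirs) for _ in range(3)]
eager_chars = sum(len(s) for s in eager)

# Deferred: every instance keeps only a reference to the shared list.
deferred = []
for _ in range(3):
    args = LazyArgs()
    args.add("--foo")
    args.add(include_dirs, fmt="-I%s")
    deferred.append(args)

shares_list = all(a._entries[1][0] is include_dirs for a in deferred)

print(eager_chars)           # 69
print(shares_list)           # True
print(deferred[0].expand())  # ['--foo', '-Ilib/a', '-Ilib/b', '-Ilib/c']
```

In the eager variant the stored characters grow with the number of instances
times the size of the set; in the deferred variant each instance holds a
constant-size reference until expansion.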

## Transitive action inputs should be depsets

When building an action using [ctx.actions.run](lib/actions.html#run), do not
forget that the `inputs` field accepts a depset. Use this whenever inputs are
collected from dependencies transitively.

```
inputs = depset(...)
ctx.actions.run(
  inputs = inputs,  # Do *not* turn inputs into a list
  ...
)
```

## Performance profiling

To profile your code and analyze the performance, use the `--profile` flag:

```
$ bazel build --nobuild --profile=/tmp/prof //path/to:target
$ bazel analyze-profile /tmp/prof --html --html_details
```

Then, open the generated HTML file (`/tmp/prof.html` in the example).

## Memory Profiling

Bazel comes with a built-in memory profiler that can help you check your rule's
memory use. If there is a problem you can dump the Skylark heap to find the
exact line of code that is causing the problem.

### Enabling Memory Tracking

You must pass these two startup flags to *every* Bazel invocation:

  ```
  STARTUP_FLAGS=\
  --host_jvm_args=-javaagent:$(BAZEL)/third_party/allocation_instrumenter/java-allocation-instrumenter-3.0.1.jar \
  --host_jvm_args=-DRULE_MEMORY_TRACKER=1
  ```
  **NOTE**: The Bazel repository comes with an allocation instrumenter.
  Make sure to adjust `$(BAZEL)` for your repository location.

These start the server in memory tracking mode. If you forget these for even
one Bazel invocation the server will restart and you will have to start over.

### Using the Memory Tracker

Let's have a look at the target `foo` and see what it's up to. We add
`--nobuild` because memory consumption does not depend on whether we actually
build; we only have to run the analysis phase.

```
$ bazel $(STARTUP_FLAGS) build --nobuild //foo:foo
```

Let's see how much memory the whole Bazel instance consumes:

```
$ bazel $(STARTUP_FLAGS) info used-heap-size-after-gc
> 2594MB
```

Let's break it down by rule class by using `bazel dump --rules`:

```
$ bazel $(STARTUP_FLAGS) dump --rules
>

RULE                                 COUNT     ACTIONS          BYTES         EACH
genrule                             33,762      33,801    291,538,824        8,635
config_setting                      25,374           0     24,897,336          981
filegroup                           25,369      25,369     97,496,272        3,843
cc_library                           5,372      73,235    182,214,456       33,919
proto_library                        4,140     110,409    186,776,864       45,115
android_library                      2,621      36,921    218,504,848       83,366
java_library                         2,371      12,459     38,841,000       16,381
_gen_source                            719       2,157      9,195,312       12,789
_check_proto_library_deps              719         668      1,835,288        2,552
... (more output)
```
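
When the table is long it can help to post-process it. The following is a
minimal Python sketch that assumes the output has the five whitespace-separated
columns shown above, with commas as thousands separators.
`parse_rules_dump` is a hypothetical helper, not part of Bazel, and the sample
text is copied from the output above.

```python
# Sample rows copied from the `bazel dump --rules` output above.
SAMPLE = """\
RULE                                 COUNT     ACTIONS          BYTES         EACH
genrule                             33,762      33,801    291,538,824        8,635
config_setting                      25,374           0     24,897,336          981
filegroup                           25,369      25,369     97,496,272        3,843
cc_library                           5,372      73,235    182,214,456       33,919
proto_library                        4,140     110,409    186,776,864       45,115
android_library                      2,621      36,921    218,504,848       83,366
"""


def parse_rules_dump(text):
    """Parses the table into dicts; lines that do not fit are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5 or parts[0] == "RULE":
            continue  # header, blank line, or unrelated output
        try:
            count, actions, total_bytes, each = (
                int(p.replace(",", "")) for p in parts[1:]
            )
        except ValueError:
            continue  # not a data row
        rows.append({
            "rule": parts[0],
            "count": count,
            "actions": actions,
            "bytes": total_bytes,
            "each": each,
        })
    return rows


rows = parse_rules_dump(SAMPLE)
by_total = sorted(rows, key=lambda r: r["bytes"], reverse=True)
by_each = sorted(rows, key=lambda r: r["each"], reverse=True)

print([r["rule"] for r in by_total[:3]])  # ['genrule', 'android_library', 'proto_library']
print(by_each[0]["rule"])                 # android_library
```

Sorting by `BYTES` shows which rule classes dominate overall, while sorting by
`EACH` points at rule classes that are expensive per instance.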

And finally let's have a look at where the memory is going by producing a
`pprof` file using `bazel dump --skylark_memory`:

```
$ bazel $(STARTUP_FLAGS) dump --skylark_memory=$HOME/prof.gz
> Dumping skylark heap to: /usr/local/google/home/$USER/prof.gz
```

Next, we use the `pprof` tool to investigate the heap. A good starting point is
getting a flame graph by using `pprof -flame $HOME/prof.gz`.

  You can get `pprof` from https://github.com/google/pprof.

In this case we get a text dump of the hottest call sites annotated with lines:

```
$ pprof -text -lines $HOME/prof.gz
>
      flat  flat%   sum%        cum   cum%
  146.11MB 19.64% 19.64%   146.11MB 19.64%  android_library <native>:-1
  113.02MB 15.19% 34.83%   113.02MB 15.19%  genrule <native>:-1
   74.11MB  9.96% 44.80%    74.11MB  9.96%  glob <native>:-1
   55.98MB  7.53% 52.32%    55.98MB  7.53%  filegroup <native>:-1
   53.44MB  7.18% 59.51%    53.44MB  7.18%  sh_test <native>:-1
   26.55MB  3.57% 63.07%    26.55MB  3.57%  _generate_foo_files /foo/tc/tc.bzl:491
   26.01MB  3.50% 66.57%    26.01MB  3.50%  _build_foo_impl /foo/build_test.bzl:78
   22.01MB  2.96% 69.53%    22.01MB  2.96%  _build_foo_impl /foo/build_test.bzl:73
   ... (more output)
```