For the record, `git annex add` has had a series of memory leaks.
Mostly these are minor -- until you need to check in a few
million files in a single operation. 

If this happens to you, git-annex will run out of memory and stop.
(Generally well before your system runs out of memory, since it has some
built-in ulimits.) You can recover by just re-running the `git annex add`
-- it will automatically pick up where it left off.
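
The recovery step can be automated with a small wrapper. What follows is a minimal sketch in Python, not part of git-annex: the function name `add_until_done`, the `max_attempts` limit, and the injectable `run` parameter are all assumptions made for illustration. It relies only on the behaviour described above, that a re-run picks up where the previous one stopped.

```python
import subprocess


def add_until_done(paths, max_attempts=10, run=subprocess.run):
    """Re-run `git annex add` until it exits successfully.

    Hypothetical helper: returns the number of attempts used, or raises
    RuntimeError if every attempt fails.
    """
    cmd = ["git", "annex", "add", *paths]
    for attempt in range(1, max_attempts + 1):
        # Any non-zero exit status is treated as "stopped early, try again".
        # This sketch does not distinguish running out of memory from
        # other kinds of failure.
        if run(cmd).returncode == 0:
            return attempt
    raise RuntimeError(
        f"git annex add still failing after {max_attempts} attempts"
    )
```

Calling `add_until_done(["."])` from the top of the repository would keep retrying `git annex add .` up to the limit. The limit is there so that a failure unrelated to memory does not loop forever.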

A history of the leaks:

* Originally, `git annex add` remembered all the files
  it had added, and fed them to git at the end. Of course
  that made its memory use grow, so it was fixed to periodically
  flush its buffer. Affected versions: before 0.20110417

* Something called a "lazy state monad" caused "thunks" to build
  up and memory to leak. It also affected git annex commands
  other than `add`. Adding files using a SHA* backend was hit the worst.
  Fixed in versions after 3.20120123.

* A strange GHC bug seemed to be responsible for another leak.
  (In particular, a child process was forked. All the child did 
  was read filenames from one pipe and shove them reformatted out
  another pipe. For some reason, it steadily grew in size.)
  Code was rewritten in a way that happens to avoid that leak.
  Apparently fixed in versions after 3.20120123, but this one is not
  well understood.

* (Note that `git ls-files --others`, which is used to find files to add,
  also uses surprisingly large amounts of memory when you have a lot of
  files. It buffers the entire list, so it can compare it with the files
  in the index, before outputting anything.
  This is Not Our Problem, but I'm sure the git developers
  would appreciate a patch that fixes it.)
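
The first fix in the history above, flushing the buffer periodically instead of holding every filename until the end, can be sketched as follows. This is an illustrative Python sketch and not git-annex's actual code (git-annex is written in Haskell). The names `add_in_batches`, `stage`, and `batch_size` are hypothetical, and the default batch size is arbitrary.

```python
def add_in_batches(filenames, stage, batch_size=10000):
    """Pass filenames to `stage` in bounded batches; return the total count.

    `filenames` may be any iterable, including a lazy one, so at most
    `batch_size` names are buffered here at a time.
    `stage` is a hypothetical callback standing in for "feed these to git".
    """
    batch = []
    total = 0
    for name in filenames:
        batch.append(name)
        if len(batch) >= batch_size:
            stage(batch)          # flush the full buffer
            total += len(batch)
            batch = []            # start a fresh buffer
    if batch:
        stage(batch)              # flush whatever is left at the end
        total += len(batch)
    return total
```

With this shape, the buffer's memory use is bounded by the batch size rather than by the number of files being added, which is the property the original fix was after.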