From 7ebd98d8d829005c7dae38b789146d98e6800e5b Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Tue, 14 Feb 2012 14:35:52 -0400
Subject: fix memory leak when staging the journal

The list of files had to be retained until the end so it could be deleted.
Also, a list of update-index lines was generated and only then fed into it.
Now everything streams in constant space.
---
 doc/bugs/git_annex_add_memory_leak.mdwn | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

(limited to 'doc/bugs/git_annex_add_memory_leak.mdwn')

diff --git a/doc/bugs/git_annex_add_memory_leak.mdwn b/doc/bugs/git_annex_add_memory_leak.mdwn
index b6ae60f7b..891ba318f 100644
--- a/doc/bugs/git_annex_add_memory_leak.mdwn
+++ b/doc/bugs/git_annex_add_memory_leak.mdwn
@@ -12,26 +12,27 @@ A history of the leaks:
 * Originally, `git annex add` remembered all the files
   it had added, and fed them to git at the end. Of course
   that made its memory use grow, so it was fixed to periodically
-  flush its buffer. Affected versions: before 0.20110417
+  flush its buffer. Fixed in version 0.20110417.
 
 * Something called a "lazy state monad" caused "thunks" to build
   up and memory to leak. Also affected other git annex commands
   than `add`. Adding files using a SHA* backend hit the worst.
   Fixed in versions afer 3.20120123.
 
-* A strange GHC bug seemed to be responsible for another leak.
-  (In particular, a child process was forked. All the child did
-  was read filenames from one pipe and shove them reformatted out
-  another pipe. For some reason, it steadily grew in size.)
-  Code was rewritten in a way that happens to avoid that leak.
-  Apparently fixed in versions afer 3.20120123, but this one is not
-  well understood.
-
 * Committing journal files turned out to have another memory leak.
   After adding a lot of files ran out of memory, this left the journal
-  behind and could affect other git-anne commands. Fixed in versions afer
+  behind and could affect other git-annex commands. Fixed in versions afer
   3.20120123.
 
+* Something is still causing a slow leak when adding files.
+  I tested by adding many copies of the whole linux kernel
+  tree into the annex using the WORM backend, and once
+  it had added 1 million files, git-annex used ~100 mb of ram.
+  That's 100 bytes leaked per file on average .. roughly the
+  size of a filename? It's worth noting that `git add` uses more memory
+  than that in such a large tree.
+  **not fixed yet**
+
 * (Note that `git ls-files --others`, which is used to find files to add,
   also uses surpsisingly large amounts of memory when you have a lot of
   files. It buffers
--
cgit v1.2.3