doc/bugs/enormous_fsck_output_OOM/comment_9_8de694dff75e27856c8282d1f2d120b6._comment


1
2
3
4
5
6
7
8
9
10
11
12
13
14

[[!comment format=mdwn
 username="http://joeyh.name/"
 ip="108.236.230.124"
 subject="comment 9"
 date="2014-03-10T19:17:34Z"
 content="""
I've gotten this down to 900 mb used, in the case where every single line lists a different sha. Possibly more important, if lines repeat shas, or are extraneous, memory usage will be significantly lower. This might be enough to get it working in Aaron's repository, especially if the bulk of the git fsck output was about dangling objects, which are now ignored without buffering them all in memory.

The memory usage is just about as low as is possible; it takes a fair amount of memory just to hold 300 thousand shas in memory. And the git repair process needs to keep track of every broken sha. (Maybe there's a way to stream them, but I don't immediately see one.)

I hesitate to say this means the problem is truly fixed. I have some much larger repositories with eg, `git count-objects -v` reporting 2 million objects. If they all went corrupt, it would still use too much memory.

One improvement would be to store Shas in packed memory, rather than as strings like they are now. That would probably half the memory used. It still does not seem like a full solution.
"""]]