summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authorGravatar http://joeyh.name/ <http://joeyh.name/@web>2014-07-04 19:26:00 +0000
committerGravatar admin <admin@branchable.com>2014-07-04 19:26:00 +0000
commitf9c1d5d84c510401c6b63f63dab871a549be9551 (patch)
treed2498209899bc573889218aad951893cd9a8d524 /doc
parentbcbc10461f832df209734952ec3d49f3c19aba6c (diff)
Added a comment
Diffstat (limited to 'doc')
-rw-r--r--doc/bugs/runs_of_of_memory_adding_2_million_files/comment_3_3cff88b50eb3872565bccbeb6ee15716._comment27
1 files changed, 27 insertions, 0 deletions
diff --git a/doc/bugs/runs_of_of_memory_adding_2_million_files/comment_3_3cff88b50eb3872565bccbeb6ee15716._comment b/doc/bugs/runs_of_of_memory_adding_2_million_files/comment_3_3cff88b50eb3872565bccbeb6ee15716._comment
new file mode 100644
index 000000000..7e2c5568d
--- /dev/null
+++ b/doc/bugs/runs_of_of_memory_adding_2_million_files/comment_3_3cff88b50eb3872565bccbeb6ee15716._comment
@@ -0,0 +1,27 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="209.250.56.55"
+ subject="comment 3"
+ date="2014-07-04T19:26:00Z"
+ content="""
+Looking at the code, it's pretty clear why this is using a lot of memory:
+
+<pre>
+ fs <- getJournalFiles jl
+ liftIO $ do
+ h <- hashObjectStart g
+ Git.UpdateIndex.streamUpdateIndex g
+ [genstream dir h fs]
+ hashObjectStop h
+ return $ liftIO $ mapM_ (removeFile . (dir </>)) fs
+</pre>
+
+So the big list in `fs` has to be retained in memory after the files are streamed to update-index, in order for them to be deleted!
+
+Fixing is a bit tricky.. New journal files can appear while this is going on, so it can't just run getJournalFiles a second time to get the files to clean.
+Maybe delete the file after it's been sent to git-update-index? But git-update-index is going to want to read the file, and we don't really know when it will choose to do so. It could wait a while after we've sent the filename to it, potentially.
+
+Also, per [[!commit 750c4ac6c282d14d19f79e0711f858367da145e4]], we cannot delete the journal files until *after* the commit, or another git-annex process would see inconsistent data!
+
+I actually think I am going to need to use a temp file to hold the list of files..
+"""]]