aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorGravatar http://joeyh.name/ <joey@web>2013-04-05 02:31:08 +0000
committerGravatar admin <admin@branchable.com>2013-04-05 02:31:08 +0000
commit404b6b8d7153a32e2f5705b631d5c345817390e0 (patch)
tree96c206e21c5d0f3475cef8ceacb1a29ff49536cd
parent697a549126d881c49c2be81496988424807168a8 (diff)
Added a comment
-rw-r--r--doc/bugs/git-annex_add_should_repack_as_it_goes/comment_1_dbcaa0be4cd764128fb7263a95f73a32._comment22
1 files changed, 22 insertions, 0 deletions
diff --git a/doc/bugs/git-annex_add_should_repack_as_it_goes/comment_1_dbcaa0be4cd764128fb7263a95f73a32._comment b/doc/bugs/git-annex_add_should_repack_as_it_goes/comment_1_dbcaa0be4cd764128fb7263a95f73a32._comment
new file mode 100644
index 000000000..bdbeec108
--- /dev/null
+++ b/doc/bugs/git-annex_add_should_repack_as_it_goes/comment_1_dbcaa0be4cd764128fb7263a95f73a32._comment
@@ -0,0 +1,22 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ nickname="joey"
+ subject="comment 1"
+ date="2013-04-05T02:31:08Z"
+ content="""
+I created 300,000 files and added them. I found the following main cost centers:
+
+* Checksumming all those files has overhead. Can be avoided by using `--backend=WORM`
+* git-annex runs `git add` after every 10k files it processes. As the index file grows in size, `git add` gets slower (git has to rewrite the index file each time; which really needs to be improved on the git side in order for git to scale better to lots of files). Can be avoided by setting `annex.queuesize` to a larger value. If you have enough memory, git-annex can buffer all 300,000 files and only run `git add` once.
+* At the end of `git annex add`, it has to stage location logs for all files. This takes a few minutes; it probably on a par with the overhead of running `git add` once with an equal number of files.
+
+After `git annex add`, I ran `git commit -m add`. This commit took only 11 minutes. (5 year old netbook with a SSD.) That may seem like a long time, but the (un-optimised) `git annex add` took 4 hours or so.
+
+In that commit, I saw these cost centers:
+
+* The `pre-commit` hooks runs `git annex pre-commit`, which scans all 300,000 symlinks to make sure they don't need fixing. That took around 5 seconds. Can be disabled if you don't mind manually running `git annex fix` when moving files.
+* Calculating the commit took a while.
+* `git commit` did process all the loose object files. It did not create a pack, and I am not sure why it needed to look at them, especially since it had already printed the sha of the commit, and so had already created all the objects it needed to at that point.
+
+Next I will try again, and first run `git gc` before `git commit` ....
+"""]]