summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorGravatar http://joeyh.name/ <http://joeyh.name/@web>2013-02-26 20:29:08 +0000
committerGravatar admin <admin@branchable.com>2013-02-26 20:29:08 +0000
commitbae2fa498c3c72d13b72bd01494c76e50cba83e2 (patch)
treee5abd6464c6f19092b18327f2a2b2ac3d92d7f59
parent9bf904c7eeb3fb15dd9268975b907ed71a059d5f (diff)
Added a comment
-rw-r--r--doc/bugs/added_branches_makes___39__git_annex_unused__39___slow/comment_1_d350c39c67031c500e3224e92c0029ea._comment19
1 files changed, 19 insertions, 0 deletions
diff --git a/doc/bugs/added_branches_makes___39__git_annex_unused__39___slow/comment_1_d350c39c67031c500e3224e92c0029ea._comment b/doc/bugs/added_branches_makes___39__git_annex_unused__39___slow/comment_1_d350c39c67031c500e3224e92c0029ea._comment
new file mode 100644
index 000000000..092a7bbf2
--- /dev/null
+++ b/doc/bugs/added_branches_makes___39__git_annex_unused__39___slow/comment_1_d350c39c67031c500e3224e92c0029ea._comment
@@ -0,0 +1,19 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="4.152.108.210"
+ subject="comment 1"
+ date="2013-02-26T20:29:05Z"
+ content="""
+`git annex unused` finds content that is not used by the working tree or by *any* branch is unused. For the working tree, it can look at the symlinks on disk, which is the fastest option.
+
+For branches, it has to first use `git ls-tree` to find the files on the branch, and then use `git cat-file` to look up the key used by each file. It does this as fast as it can (eg, it runs a single `git cat-file --batch`, rather than one process per file). Still, this is pulling potentially a lot of data out of git, and it gets pretty slow.
+
+I've spent a lot of time optimising this as much as is possible with these constraints. One nice one is that, if it finds no unused keys after checking the working tree, it can stop, rather than checking any branches. Your example avoids this optimisation.
+
+Another optimisation is to only check each git ref once, even if multiple branches refer to it. You can see this optmisation firing in your transcript, when it only shows it's checking one branch of the two identical branches you've made.
+
+Indeed, if you go on and add 100 identical branches, you'll find it runs in just about the same time it ran with 2 branches. (There's a little overhead in getting the list of branches and throwing out the duplicates, but that's all.)
+
+What then explains your numbers? Well, I have no idea. I cannot replicate them; I tend to see about the same amount of time taken with two duplicate branches as with one branch. I suspect you just didn't get statistically valid results, which playing around with `sleep` at the command line often doesn't,
+due to caching, other active processes, etc.
+"""]]