Diffstat (limited to 'doc/bugs/added_branches_makes___39__git_annex_unused__39___slow/comment_3_12b20cbbc2b4cd1ab8af7e3eec9589b4._comment')
-rw-r--r-- | doc/bugs/added_branches_makes___39__git_annex_unused__39___slow/comment_3_12b20cbbc2b4cd1ab8af7e3eec9589b4._comment | 30 |
1 files changed, 30 insertions, 0 deletions
diff --git a/doc/bugs/added_branches_makes___39__git_annex_unused__39___slow/comment_3_12b20cbbc2b4cd1ab8af7e3eec9589b4._comment b/doc/bugs/added_branches_makes___39__git_annex_unused__39___slow/comment_3_12b20cbbc2b4cd1ab8af7e3eec9589b4._comment
new file mode 100644
index 000000000..9c1da8eea
--- /dev/null
+++ b/doc/bugs/added_branches_makes___39__git_annex_unused__39___slow/comment_3_12b20cbbc2b4cd1ab8af7e3eec9589b4._comment
@@ -0,0 +1,30 @@
+[[!comment format=mdwn
+ username="arand"
+ ip="130.243.226.21"
+ subject="comment 3"
+ date="2013-08-10T17:00:21Z"
+ content="""
+So, if I've understood it correctly (please correct me if that's not the case :) )
+
+Currently git-annex unused goes through this process:
+
+* Look through all files in the index and find those which are git-annex keys (git ls-tree + git cat-file)
+* Look through all files in the current ref and find those which are git-annex keys (git ls-tree + git cat-file)
+* For each ref in the repo
+  - Look through all files and find those which are git-annex keys (git ls-tree + git cat-file)
+* Then at the end
+  - Compare this list of keys with what is stored in .git/annex/objects
+  - Print out any objects which do not match a key.
+
+If that's the case, it means that if you have multiple refs, even if they only differ by single empty commits, git-annex will end up doing a cat-file for the same file multiple times (once per ref), which is expensive.
+
+Would it be possible to change the algorithm for git-annex unused into something like this instead:
+
+* For the index, HEAD, and all refs
+  - Create a list of all files and remove those which are duplicates based on their sha1 hash (git ls-tree | uniq)
+* Then look through this reduced list to find those which are git-annex keys (git cat-file)
+* Then check as before
+
+Unless this bypasses some safety check or a case I've overlooked, I think it should be possible to speed up git-annex unused quite a bit.
+
+"""]]
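
The reordering proposed in the comment can be sketched roughly as follows. This is only an illustration of the deduplication idea, not how git-annex implements `unused`: the function names are hypothetical, the input is assumed to be the text output of `git ls-tree -r` for each ref, and `key_of_blob` is an assumed stand-in for the `git cat-file` lookup that returns a key name, or `None` when the blob is not an annexed file.

```python
from typing import Callable, Iterable, Optional, Set


def unique_blobs(ls_tree_outputs: Iterable[str]) -> Set[str]:
    """Collect the distinct blob hashes seen across several `git ls-tree -r` outputs."""
    seen: Set[str] = set()
    for output in ls_tree_outputs:
        for line in output.splitlines():
            if not line.strip():
                continue
            # Each line reads "<mode> <type> <sha>\t<path>".
            meta, _, _path = line.partition("\t")
            _mode, objtype, sha = meta.split()
            if objtype == "blob":
                seen.add(sha)
    return seen


def referenced_keys(
    ls_tree_outputs: Iterable[str],
    key_of_blob: Callable[[str], Optional[str]],
) -> Set[str]:
    """Run the expensive lookup once per distinct blob, not once per ref."""
    keys: Set[str] = set()
    for sha in unique_blobs(ls_tree_outputs):
        key = key_of_blob(sha)  # hypothetical stand-in for `git cat-file`
        if key is not None:
            keys.add(key)
    return keys


def unused_keys(
    present_keys: Iterable[str],
    ls_tree_outputs: Iterable[str],
    key_of_blob: Callable[[str], Optional[str]],
) -> Set[str]:
    """Keys stored locally that no index entry, HEAD, or ref points at."""
    return set(present_keys) - referenced_keys(ls_tree_outputs, key_of_blob)
```

With two refs that differ only by an empty commit, both listings contain the same blob hashes, so the lookup runs once per blob rather than once per blob per ref. The final comparison against the stored objects is unchanged.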