[[!comment format=mdwn
username="arand"
ip="130.243.226.21"
subject="comment 3"
date="2013-08-10T17:00:21Z"
content="""
So, if I've understood it correctly (please correct me if that's not the case :) )
Currently git-annex unused goes through this process:
* Look through all files in the index and find those which are git-annex keys (git ls-tree + git cat-file)
* Look through all files in the current ref and find those which are git-annex keys (git ls-tree + git cat-file)
* For each ref in the repo
- Look through all files and find those which are git-annex keys (git ls-tree + git cat-file)
* Then at the end
- Compare this list of keys with what is stored in .git/annex/objects
- Print out any objects which do not match a key.
If that's the case, it means that if you have multiple refs, even if they differ only by single empty commits, git-annex will end up doing a cat-file for the same file multiple times (once per ref), which is expensive.
Would it be possible to change the algorithm for git-annex unused to something like this instead:
* For the index, HEAD, and all refs
- Create a list of all files and remove those which are duplicates based on their sha1 hash (git ls-tree | uniq)
* Then look through this reduced list to find those which are git-annex keys (git cat-file)
* Then check as before
Unless this bypasses some safety or case I've overlooked, I think it should be possible to speed up git-annex unused quite a bit.
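
To make the difference concrete, here is a rough sketch of both approaches as I understand them. It is only a toy model, not git-annex's actual code: `Listing` stands in for the output of git ls-tree per ref, `fake_cat_file` stands in for git cat-file, and the blob contents, shas, and key names are all made up for illustration.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical stand-in for "git ls-tree" output: ref -> (blob sha, path) pairs.
Listing = Dict[str, List[Tuple[str, str]]]
# Hypothetical stand-in for "git cat-file": returns an annex key, or None.
CatFile = Callable[[str], Optional[str]]


def keys_per_ref(listing: Listing, cat_file: CatFile) -> Tuple[Set[str], int]:
    # Process as described above: one cat-file per file, per ref.
    keys: Set[str] = set()
    calls = 0
    for entries in listing.values():
        for sha, _path in entries:
            calls += 1
            key = cat_file(sha)
            if key is not None:
                keys.add(key)
    return keys, calls


def keys_deduplicated(listing: Listing, cat_file: CatFile) -> Tuple[Set[str], int]:
    # Proposed process: collect unique blob shas first, then one cat-file each.
    unique_shas = {sha for entries in listing.values() for sha, _path in entries}
    keys: Set[str] = set()
    for sha in unique_shas:
        key = cat_file(sha)
        if key is not None:
            keys.add(key)
    return keys, len(unique_shas)


# Made-up blob contents; annexed files are modelled as symlink targets
# ending in the key name (a simplification).
BLOBS = {
    "a1": ".git/annex/objects/xx/yy/SHA256-s1--aaa/SHA256-s1--aaa",
    "b2": "plain file content",
    "c3": ".git/annex/objects/zz/ww/SHA256-s2--bbb/SHA256-s2--bbb",
}


def fake_cat_file(sha: str) -> Optional[str]:
    content = BLOBS[sha]
    if "annex/objects/" in content:
        return content.rsplit("/", 1)[-1]
    return None


LISTING: Listing = {
    "index": [("a1", "big.iso"), ("b2", "README")],
    "HEAD": [("a1", "big.iso"), ("b2", "README")],
    "refs/heads/other": [("a1", "big.iso"), ("b2", "README"), ("c3", "video.mkv")],
}

old_keys, old_calls = keys_per_ref(LISTING, fake_cat_file)
new_keys, new_calls = keys_deduplicated(LISTING, fake_cat_file)

# Final step, same in both cases: stored objects that match no key are unused.
stored = {"SHA256-s1--aaa", "SHA256-s2--bbb", "SHA256-s3--ccc"}
unused = stored - new_keys
print(old_calls, new_calls, sorted(unused))
```

Both functions find the same set of keys; only the number of cat-file lookups differs, and in this toy case the gap grows with every extra ref that shares files with the others.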
"""]]