author | Joey Hess <joey@kitenet.net> | 2011-05-16 22:37:31 -0400
committer | Joey Hess <joey@kitenet.net> | 2011-05-16 22:37:31 -0400
commit | 51cc71fac176878de2ccb960f62db419bb63d00f
tree | 030d89413001d64e8b531accea6dee2c6d319192 /doc/todo/cache_key_info.mdwn
parent | 21953a802a0f55399288b52834cbfa970fa40d0f
longterm todo item
Diffstat (limited to 'doc/todo/cache_key_info.mdwn')
-rw-r--r-- | doc/todo/cache_key_info.mdwn | 36 |
1 files changed, 36 insertions, 0 deletions
diff --git a/doc/todo/cache_key_info.mdwn b/doc/todo/cache_key_info.mdwn
new file mode 100644
index 000000000..d26d05512
--- /dev/null
+++ b/doc/todo/cache_key_info.mdwn
@@ -0,0 +1,36 @@
+Most of git-annex is designed to be fast no matter how many other files are
+in the annex. Things like add/get/drop/move/fsck have good locality;
+they will only operate on as many files as you need them to.
+
+(`git commit` can get a little slow with a great many files,
+but that's out of scope -- and recent git-annex versions use queuing
+to keep `git add` from piling up too much in the index.)
+
+But currently two git-annex commands are quite slow when an annex contains
+a large number of files: `unused` and `stats`
+(both have `--fast` versions that do less work).
+
+Both are slow because they need two pieces of information that are not
+quick to look up and that require a seek-heavy scan of the whole repository:
+
+1. The keys present in the annex, found by looking through `.git/annex/objects`.
+2. The keys referenced by files in git, found by finding every file
+   in git and looking at its symlink.
+
+Of these, the first is less expensive (typically, an annex does not have every
+key in it). It could be optimized fairly simply by adding a database
+of the keys present in the annex, optimized for listing them all. The
+database would be updated by the few functions that move content in and
+out.
+
+The second is harder to optimize, because the user can delete, revert,
+copy, add, etc. files in git at will, and git-annex has no good way
+to watch that and maintain a database of which keys are referenced.
+
+It could use a post-commit hook and examine the files changed by commits, etc.
+But then staged files would be left out. It might be sufficient to
+make `--fast` trust the database... except that `unused` will suggest *deleting*
+data if nothing references it. Alternatively, `git-annex unused` could require a
+clean tree with nothing staged before it runs.
+
+Anyway, this is a semi-longterm item for me. --[[Joey]]
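To make the two expensive lookups in this todo concrete, here is a minimal Python sketch of both scans and the set difference that `unused` needs. It is not git-annex code. It assumes that content is stored at `.git/annex/objects/<dir>/<dir>/<KEY>/<KEY>` and that each annexed file is a symlink whose last path component is its key. The function names `present_keys`, `referenced_keys`, and `unused_keys` are hypothetical. For simplicity, it walks the work tree instead of asking git for its file list.

```python
import os
from pathlib import Path

# ASSUMPTIONS (illustrative, not taken from git-annex's source):
# - content lives at .git/annex/objects/<dir>/<dir>/<KEY>/<KEY>
# - an annexed file is a symlink whose final path component is its key


def present_keys(repo: Path) -> set[str]:
    """Lookup 1: keys whose content is in the annex (walks the objects dir)."""
    objects = repo / ".git" / "annex" / "objects"
    keys = set()
    for dirpath, _dirnames, filenames in os.walk(objects):
        for name in filenames:
            # A content file is named after its key and sits in a
            # directory with the same name.
            if name == os.path.basename(dirpath):
                keys.add(name)
    return keys


def referenced_keys(repo: Path) -> set[str]:
    """Lookup 2: keys referenced by annex symlinks in the work tree.

    Simplification: walks the work tree rather than listing the files
    git knows about, so untracked symlinks are counted too.
    """
    keys = set()
    for dirpath, dirnames, filenames in os.walk(repo):
        # Skip the .git directory itself.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                target = os.readlink(path)
                if ".git/annex/objects/" in target:
                    keys.add(os.path.basename(target))
    return keys


def unused_keys(repo: Path) -> set[str]:
    """Content that is present but that no symlink refers to."""
    return present_keys(repo) - referenced_keys(repo)
```

Both scans touch every object directory or every file, which is why their cost grows with the size of the annex rather than with the number of files a command operates on.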
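The todo proposes a database of present keys that is optimized for listing them all and is updated only by the few functions that move content in and out. The following minimal sketch uses Python's `sqlite3` purely as a stand-in store. The todo does not name a storage backend, and the class `PresentKeysDB` and its methods are hypothetical.

```python
import sqlite3


class PresentKeysDB:
    """Hypothetical cache of the keys whose content is in the annex.

    Only the code paths that move content into or out of the annex
    would call record_added / record_removed. Listing all present keys
    then becomes a single query instead of a walk of .git/annex/objects.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # PRIMARY KEY indexes the column and rejects duplicate keys.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS present (key TEXT PRIMARY KEY)"
        )

    def record_added(self, key: str) -> None:
        # `with conn` commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO present (key) VALUES (?)", (key,)
            )

    def record_removed(self, key: str) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM present WHERE key = ?", (key,))

    def list_present(self) -> list[str]:
        rows = self.conn.execute("SELECT key FROM present ORDER BY key")
        return [row[0] for row in rows]
```

This addresses only the first, cheaper lookup. As the todo points out, the referenced-keys side is still the hard part, because git-annex cannot observe every change the user makes to files in git.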