From 939a6f860e1a2eea58e46a05861076e1b174cbd2 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Sun, 17 Oct 2010 23:53:01 -0400
Subject: thoughts

---
 doc/git-annex.mdwn | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index d15ca4a9f..4c85a03b6 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -197,8 +197,39 @@ may not be dropped right away, depending on number of copies available.
 The use of `.git-annex` to store logs means that if a repo has branches
 and the user switched between them, git-annex will see different logs in
 the different branches, and so may miss info about what remotes have which
-files (though it can re-learn). An alternative would be to
-store the log data directly in the git repo as `pristine-tar` does.
+files (though it can re-learn).
+
+An alternative would be to store the log data directly in the git repo
+as `pristine-tar` does. Problem with that approach is that git won't merge
+conflicting changes to log files if they are not in the currently checked
+out branch.
+
+It would be possible to use a branch with a tree like this, to avoid
+conflicts:
+
+key/uuid/time/status
+
+As long as new files are only added, and old timestamped files deleted,
+there would be no conflicts.
+
+A related problem though is the size of the tree objects git needs to
+commit. Having the logs in a separate branch doesn't help with that.
+As more keys are added, the tree object size will increase, and git will
+take longer and longer to commit, and use more space. One way to deal with
+this is simply by splitting the logs among subdirectories. Git then can
+reuse trees for most directories. (Check: Does it still have to build
+dup trees in memory?)
+
+Another approach would be to have git-annex *delete* old logs. Keep logs
+for the currently available files, or something like that. If other log
+info is needed, look back through history to find the first occurrence of a
+log. Maybe even look at other branches -- so if the logs were on master,
+a new empty branch could be made and git-annex would still know where to
+get keys in that branch.
+
+Would have to be careful about conflicts when deleting and bringing back
+files with the same name. And would need to avoid expensive searching thru
+all history to try to find an old log file.
 
 ## contact
 
-- 
cgit v1.2.3