Diffstat (limited to 'doc/todo/branching.mdwn')
-rw-r--r-- | doc/todo/branching.mdwn | 159 |
1 files changed, 0 insertions, 159 deletions
diff --git a/doc/todo/branching.mdwn b/doc/todo/branching.mdwn
deleted file mode 100644
index f65849584..000000000
--- a/doc/todo/branching.mdwn
+++ /dev/null
@@ -1,159 +0,0 @@
-[[done]] !!!
-
-The use of `.git-annex` to store logs means that if a repo has branches
-and the user switched between them, git-annex will see different logs in
-the different branches, and so may miss info about what remotes have which
-files (though it can re-learn).
-
-An alternative would be to store the log data directly in the git repo
-as `pristine-tar` does. Problem with that approach is that git won't merge
-conflicting changes to log files if they are not in the currently checked
-out branch.
-
-It would be possible to use a branch with a tree like this, to avoid
-conflicts:
-
-key/uuid/time/status
-
-As long as new files are only added, and old timestamped files deleted,
-there would be no conflicts.
-
-A related problem though is the size of the tree objects git needs to
-commit. Having the logs in a separate branch doesn't help with that.
-As more keys are added, the tree object size will increase, and git will
-take longer and longer to commit, and use more space. One way to deal with
-this is simply by splitting the logs among subdirectories. Git then can
-reuse trees for most directories. (Check: Does it still have to build
-dup trees in memory?)
-
-Another approach would be to have git-annex *delete* old logs. Keep logs
-for the currently available files, or something like that. If other log
-info is needed, look back through history to find the first occurance of a
-log. Maybe even look at other branches -- so if the logs were on master,
-a new empty branch could be made and git-annex would still know where to
-get keys in that branch.
-
-Would have to be careful about conflicts when deleting and bringing back
-files with the same name. And would need to avoid expensive searching thru
-all history to try to find an old log file.
-
-## fleshed out proposal
-
-Let's use one branch per uuid, named git-annex/$UUID.
-
-- I came to realize this would be a good idea when thinking about how
-  to upgrade. Each individual annex will be upgraded independantly,
-  so each will want to make a branch, and if the branches aren't distinct,
-  they will merge conflict for sure.
-- TODO: What will need to be done to git to make it push/pull these new
-  branches?
-- A given repo only ever writes to its UUID branch. So no conflicts.
-  **problem**: git annex move needs to update log info for other repos!
-  (possibly solvable by having git-annex-shell update the log info
-  when content is moved using it)
-- (BTW, UUIDs probably don't compress well, and this reduces the bloat of having
-  them repeated lots of times in the tree.)
-- Per UUID branches mean that if it wants to find a file's location
-  among configured remotes, it can examine only their branches, if
-  desired.
-- It's important that the per-repo branches propigate beyond immediate
-  remotes. If there is a central bare repo, that means push --all. Without
-  one, it means that when repo B pulls from A, and then C pulls from B,
-  C needs to get A's branch -- which means that B should have a tracking
-  branch for A's branch.
-
-In the branch, only one file is needed. Call it locationlog. git-annex
-can cache location log changes and write them all to locationlog in
-a single git operation on shutdown.
-
-- TODO: what if it's ctrl-c'd with changes pending? Perhaps it should
-  collect them to .git/annex/locationlog, and inject that file on shutdown?
-- This will be less overhead than the current staging of all the log files.
-
-The log is not appended to, so in git we have a series of commits each of
-which replaces the log's entire contens.
-
-To find locations of a key, all (or all relevant) branches need to be
-examined, looking backward through the history of each until a log
-with a indication of the presense/absense of the key is found.
-
-- This will be less expensive for files that have recently been added
-  or transfered.
-- It could get pretty slow when digging deeper.
-- Only 3 places in git-annex will be affected by any slowdown: move --from,
-  get and drop. (Update: Now also unused, whereis, fsck)
-
-## alternate
-
-As above, but use a single git-annex branch, and keep the per-UUID
-info in their own log files. Hope that git can auto-merge as long as
-each observing repo only writes to its own files. (Well, it can, but for
-non-fast-forward merges, the git-annex branch would need to be checked out,
-which is problimatic.)
-
-Use filenames like:
-
-    <observing uuid>/<location uuid>
-
-That allows one repo to record another's state when doing a
-`move`.
-
-## outside the box approach
-
-If the problem is limited to only that the `.git-annex/` files make
-branching difficult (and not to the related problem that commits to them
-and having them in the tree are sorta annoying), then a simple approach
-would be to have git-annex look in other branches for location log info
-too.
-
-The problem would then be that any locationlog lookup would need to look in
-all other branches (any branch could have more current info after all),
-which could get expensive.
-
-## way outside the box approach
-
-Another approach I have been mulling over is keeping the log file
-branch checked out in .git/annex/logs/ -- this would be a checkout of a git
-repository inside a git repository, using "git fake bare" techniques. This
-would solve the merge problem, since git auto merge could be used. It would
-still mean all the log files are on-disk, which annoys some. It would
-require some tighter integration with git, so that after a pull, the log
-repo is updated with the data pulled. --[[Joey]]
-
-> Seems I can't use git fake bare exactly. Instead, the best option
-> seems to be `git clone --shared` to make a clone that uses
-> `.git/annex/logs/.git` to hold its index etc, but (mostly) uses
-> objects from the main repo. There would be some bloat,
-> as commits to the logs made in there would not be shared with the main
-> repo. Using `GIT_OBJECT_DIRECTORY` might be a way to avoid that bloat.
-
-## notes
-
-Another approach could be to use git-notes. It supports merging branches
-of notes, with union merge strategy (a hook would have to do this after
-a pull, it's not done automatically).
-
-Problem: Notes are usually attached to git
-objects, and there are no git objects corresponding to git-annex keys.
-
-Problem: Notes are not normally copied when cloning.
-
-------
-
-## elminating the merge problem
-
-Most of the above options are complicated by the problem of how to merge
-changes from remotes. It should be possible to deal with the merge
-problem generically. Something like this:
-
-* We have a local branch `B`.
-* For remotes, there are also `origin/B`, `otherremote/B`, etc.
-* To merge two branches `B` and `foo/B`, construct a merge commit that
-  makes each file have all lines that were in either version of the file,
-  with duplicates removed (probably). Do this without checking out a tree.
-  -- now implemented as git-union-merge
-* As a `post-merge` hook, merge `*/B` into `B`. This will ensure `B`
-  is always up-to-date after a pull from a remote.
-* When pushing to a remote, nothing need to be done, except ensure
-  `B` is either successfully pushed, or the push fails (and a pull needs to
-  be done to get the remote's changes merged into `B`).
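
Reviewer note on the removed "elminating the merge problem" section: the union merge it describes (each file ends up with every line from either version, duplicates removed) can be illustrated with a minimal Python sketch. This is not how git-union-merge is implemented. It works on plain in-memory strings rather than git objects, and the names `union_merge` and `union_merge_trees`, the `{path: content}` mapping, and the sample log lines are all hypothetical, chosen only for illustration.

```python
def union_merge(ours: str, theirs: str) -> str:
    """Return every line present in either version, each kept once.

    Lines keep first-seen order: all of `ours`, then any new lines from
    `theirs`. (Assumption: order does not matter to the log reader.)
    """
    seen = set()
    merged = []
    for line in ours.splitlines() + theirs.splitlines():
        if line not in seen:
            seen.add(line)
            merged.append(line)
    return "".join(line + "\n" for line in merged)


def union_merge_trees(ours: dict, theirs: dict) -> dict:
    """Union-merge two hypothetical {path: content} mappings file by file.

    A file that exists on only one side is carried over unchanged.
    """
    paths = sorted(set(ours) | set(theirs))
    return {p: union_merge(ours.get(p, ""), theirs.get(p, "")) for p in paths}


if __name__ == "__main__":
    local = {"key1.log": "100 uuid-a present\n"}
    remote = {"key1.log": "100 uuid-a present\n200 uuid-b present\n"}
    print(union_merge_trees(local, remote)["key1.log"], end="")
```

Because the result is a set union of lines, merging the same remote twice changes nothing, which is what makes it safe to run from a hook after every pull.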