diff options
author | Joey Hess <joey@kitenet.net> | 2012-06-27 16:14:50 -0400 |
---|---|---|
committer | Joey Hess <joey@kitenet.net> | 2012-06-27 16:14:50 -0400 |
commit | e365b8300d8d301dfec358bf07496d1eeecad300 (patch) | |
tree | b6b27385abe78cdb9df77d26e681e1a23273e920 | |
parent | 6677a99cb42e40baedfc98b1602171ec0c14f86b (diff) |
blog for the day
-rw-r--r-- | doc/design/assistant/blog/day_18__merging.mdwn | 82 |
1 files changed, 82 insertions, 0 deletions
diff --git a/doc/design/assistant/blog/day_18__merging.mdwn b/doc/design/assistant/blog/day_18__merging.mdwn new file mode 100644 index 000000000..43b8f4958 --- /dev/null +++ b/doc/design/assistant/blog/day_18__merging.mdwn @@ -0,0 +1,82 @@ +Worked on automatic merge conflict resolution today. I had expected to be +able to use git's merge driver interface for this, but that interface is +not sufficient. There are two problems with it: + +1. The merge program is run when git is in the middle of an operation + that locks the index. So it cannot delete or stage files. I need to + do both as part of my conflict resolution strategy. +2. The merge program is not run at all when the merge conflict is caused + by one side deleting a file, and the other side modifying it. This is + an important case to handle. + +So, instead, git-annex will use a regular `git merge`, and if it fails, it +will fix up the conflicts. + +That presented its own difficully, of finding which files in the tree +conflict. `git ls-files --unmerged` is the way to do that, but its output +is a quite raw form: + + 120000 3594e94c04db171e2767224db355f514b13715c5 1 foo + 120000 35ec3b9d7586b46c0fd3450ba21e30ef666cfcd6 3 foo + 100644 1eabec834c255a127e2e835dadc2d7733742ed9a 2 bar + 100644 36902d4d842a114e8b8912c02d239b2d7059c02b 3 bar + +I had to stare at the rather inpenetrable documentation for hours and +write a lot of parsing and processing code to get from that to these mostly +self expanatory data types: + + data Conflicting v = Conflicting + { valUs :: Maybe v + , valThem :: Maybe v + } deriving (Show) + + data Unmerged = Unmerged + { unmergedFile :: FilePath + , unmergedBlobType :: Conflicting BlobType + , unmergedSha :: Conflicting Sha + } deriving (Show) + +Not the first time I've whined here about time spent parsing unix command +output, is it? :) + +From there, it was relatively easy to write the actual conflict cleanup +code, and make `git annex sync` use it. Here's how it looks: + + $ ls -1 + foo.png + bar.png + $ git annex sync + commit + # On branch master + nothing to commit (working directory clean) + ok + merge synced/master + CONFLICT (modify/delete): passwd deleted in refs/heads/synced/master and modified in HEAD. Version HEAD of passwd left in tree. + Automatic merge failed; fix conflicts and then commit the result. + passwd: needs merge + (Recording state in git...) + [master 0354a67] git-annex automatic merge conflict fix + ok + $ ls -1 + foo.png + bar.variant-a1fe.png + bar.variant-93a1.png + +There are very few options for ways for the conflict resolution code to +name conflicting variants of files. The conflict resolver can only use data +present in git to generate the names, because the same conflict needs to +be resolved the same everywhere. + +So I had to choose between using the full key name in the filenames produced +when resolving a merge, and using a shorter checksum of the key, that would be +more user-friendly, but could theoretically collide with another key. +I chose the checksum, and weakened it horribly by only using 32 bits of it! + +Surprisingly, I think this is a safe choice. The worst that can +happens if such a collision happens is another conflict, and the conflict +resolution code will work on conflicts produced by the conflict resolution +code! In such a case, it does fall back to putting the whole key in +the filename: +"bar.variant-SHA256-s2550--2c09deac21fa93607be0844fefa870b2878a304a7714684c4cc8f800fda5e16b.png" + +Still need to hook this code into `git annex assistant`. |