aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorGravatar Joey Hess <joey@kitenet.net>2011-11-06 15:18:45 -0400
committerGravatar Joey Hess <joey@kitenet.net>2011-11-06 15:22:40 -0400
commitc99fb589097d96b4e10cd8b137f72881cdb93118 (patch)
treebf26fa39ad83a2df6558e76466137e6d209c2395
parentbf07e2c921f66965ca0be816975fc608a81c4276 (diff)
merge: Use fast-forward merges when possible.
Thanks Valentin Haenel for a test case showing how non-fast-forward merges could result in an ongoing pull/merge/push cycle. While the git-annex branch is fast-forwarded, git-annex's index file is still updated using the union merge strategy as before. There's no other way to update the index that would be any faster. It is possible that a union merge and a fast-forward result in different file contents: Files should have the same lines, but a union merge may change their order. If this happens, the next commit made to the git-annex branch will have some unnecessary changes to line orders, but the consistency of data should be preserved. Note that when the journal contains changes, a fast-forward is never attempted, which is fine, because committing those changes would be vanishingly unlikely to leave the git-annex branch at a commit that already exists in one of the remotes. The real difficulty is handling the case where multiple remotes have all changed. git-annex does find the best (ie, newest) one and fast forwards to it. If the remotes are diverged, no fast-forward is done at all. It would be possible to pick one, fast forward to it, and make a merge commit to the rest, I see no benefit to adding that complexity. Determining the best of N changed remotes requires N*2+1 calls to git-log, but these are fast git-log calls, and N is typically small. Also, typically some or all of the remote refs will be the same, and git-log is not called to compare those. In the real world I expect this will almost always add only 1 git-log call to the merge process. (Which already makes N anyway.)
-rw-r--r--Annex/Branch.hs88
-rw-r--r--debian/changelog8
-rw-r--r--doc/bugs/making_annex-merge_try_a_fast-forward.mdwn6
3 files changed, 79 insertions, 23 deletions
diff --git a/Annex/Branch.hs b/Annex/Branch.hs
index 4c3192f53..0095b586b 100644
--- a/Annex/Branch.hs
+++ b/Annex/Branch.hs
@@ -117,28 +117,30 @@ commit message = whenM journalDirty $ lockJournal $ do
g <- gitRepo
withIndex $ liftIO $ Git.commit g message fullname [fullname]
-{- Ensures that the branch is up-to-date; should be called before
- - data is read from it. Runs only once per git-annex run.
+{- Ensures that the branch is up-to-date; should be called before data is
+ - read from it. Runs only once per git-annex run.
-
- - Before refs are merged into the index, it's
- - important to first stage the journal into the
- - index. Otherwise, any changes in the journal
- - would later get staged, and might overwrite
- - changes made during the merge.
+ - Before refs are merged into the index, it's important to first stage the
+ - journal into the index. Otherwise, any changes in the journal would
+ - later get staged, and might overwrite changes made during the merge.
-
- - It would be cleaner to handle the merge by
- - updating the journal, not the index, with changes
- - from the branches.
+ - It would be cleaner to handle the merge by updating the journal, not the
+ - index, with changes from the branches.
+ -
+ - The index is always updated using a union merge, as that's the most
+ - efficient way to update it. However, if the branch can be
+ - fast-forwarded, that is then done, rather than adding an unnecessary
+ - commit to it.
-}
update :: Annex ()
update = onceonly $ do
+ g <- gitRepo
-- check what needs updating before taking the lock
dirty <- journalDirty
- c <- filterM changedbranch =<< siblingBranches
+ c <- filterM (changedBranch name . snd) =<< siblingBranches
let (refs, branches) = unzip c
unless (not dirty && null refs) $ withIndex $ lockJournal $ do
when dirty stageJournalFiles
- g <- gitRepo
unless (null branches) $ do
showSideAction $ "merging " ++
(unwords $ map Git.refDescribe branches) ++
@@ -150,24 +152,64 @@ update = onceonly $ do
- modify the branch.
-}
liftIO $ Git.UnionMerge.merge_index g branches
- liftIO $ Git.commit g "update" fullname (nub $ fullname:refs)
+ ff <- if dirty then return False else tryFastForwardTo refs
+ unless ff $
+ liftIO $ Git.commit g "update" fullname (nub $ fullname:refs)
invalidateCache
where
- changedbranch (_, branch) = do
- g <- gitRepo
- -- checking with log to see if there have been changes
- -- is less expensive than always merging
- diffs <- liftIO $ Git.pipeRead g [
- Param "log",
- Param (name ++ ".." ++ branch),
- Params "--oneline -n1"
- ]
- return $ not $ L.null diffs
onceonly a = unlessM (branchUpdated <$> getState) $ do
r <- a
disableUpdate
return r
+{- Checks if the second branch has any commits not present on the first
+ - branch. -}
+changedBranch :: String -> String -> Annex Bool
+changedBranch origbranch newbranch = do
+ g <- gitRepo
+ diffs <- liftIO $ Git.pipeRead g [
+ Param "log",
+ Param (origbranch ++ ".." ++ newbranch),
+ Params "--oneline -n1"
+ ]
+ return $ not $ L.null diffs
+
+{- Given a set of refs that are all known to have commits not
+ - on the git-annex branch, tries to update the branch by a
+ - fast-forward.
+ -
+ - In order for that to be possible, one of the refs must contain
+ - every commit present in all the other refs, as well as in the
+ - git-annex branch.
+ -}
+tryFastForwardTo :: [String] -> Annex Bool
+tryFastForwardTo [] = return True
+tryFastForwardTo (first:rest) = do
+ -- First, check that the git-annex branch does not contain any
+ -- new commits that are in the first other branch. If it does,
+ -- cannot fast-forward.
+ diverged <- changedBranch first fullname
+ if diverged
+ then no_ff
+ else maybe no_ff do_ff =<< findbest first rest
+ where
+ no_ff = return False
+ do_ff branch = do
+ g <- gitRepo
+ liftIO $ Git.run g "update-ref" [Param fullname, Param branch]
+ return True
+ findbest c [] = return $ Just c
+ findbest c (r:rs)
+ | c == r = findbest c rs
+ | otherwise = do
+ better <- changedBranch c r
+ worse <- changedBranch r c
+ case (better, worse) of
+ (True, True) -> return Nothing -- divergent fail
+ (True, False) -> findbest r rs -- better
+ (False, True) -> findbest c rs -- worse
+ (False, False) -> findbest c rs -- same
+
{- Avoids updating the branch. A useful optimisation when the branch
- is known to have not changed, or git-annex won't be relying on info
- from it. -}
diff --git a/debian/changelog b/debian/changelog
index fdd3cb7f6..57be53c85 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,11 @@
+git-annex (3.20111106) UNRELEASED; urgency=low
+
+ * merge: Use fast-forward merges when possible.
+ Thanks Valentin Haenel for a test case showing how non-fast-forward
+ merges could result in an ongoing pull/merge/push cycle.
+
+ -- Joey Hess <joeyh@debian.org> Sun, 06 Nov 2011 14:57:57 -0400
+
git-annex (3.20111105) unstable; urgency=low
* The default backend used when adding files to the annex is changed
diff --git a/doc/bugs/making_annex-merge_try_a_fast-forward.mdwn b/doc/bugs/making_annex-merge_try_a_fast-forward.mdwn
index a2bd8c747..41a5a2a58 100644
--- a/doc/bugs/making_annex-merge_try_a_fast-forward.mdwn
+++ b/doc/bugs/making_annex-merge_try_a_fast-forward.mdwn
@@ -27,3 +27,9 @@ But as sometimes annex-merge takes time, it would probably be worth it
>
> Although, perhaps fast-forward merge would use slightly
> less space. --[[Joey]]
+
+>> To avoid the ladder-merge between two repositories described at
+>> <http://sprunge.us/LOMU>, seems a fast-forward should be detected and
+>> written to git, even if the index is still updated the current way.
+>> [[done]]
+>> --[[Joey]]