diff options
author | Joey Hess <joey@kitenet.net> | 2011-11-06 15:18:45 -0400 |
---|---|---|
committer | Joey Hess <joey@kitenet.net> | 2011-11-06 15:22:40 -0400 |
commit | c99fb589097d96b4e10cd8b137f72881cdb93118 (patch) | |
tree | bf26fa39ad83a2df6558e76466137e6d209c2395 /Annex/Branch.hs | |
parent | bf07e2c921f66965ca0be816975fc608a81c4276 (diff) |
merge: Use fast-forward merges when possible.
Thanks Valentin Haenel for a test case showing how non-fast-forward merges
could result in an ongoing pull/merge/push cycle.
While the git-annex branch is fast-forwarded, git-annex's index file is still
updated using the union merge strategy as before. There's no other way to
update the index that would be any faster.
It is possible that a union merge and a fast-forward result in different file
contents: Files should have the same lines, but a union merge may change
their order. If this happens, the next commit made to the git-annex branch
will have some unnecessary changes to line orders, but the consistency
of data should be preserved.
Note that when the journal contains changes, a fast-forward is never attempted,
which is fine, because committing those changes would be vanishingly unlikely
to leave the git-annex branch at a commit that already exists in one of
the remotes.
The real difficulty is handling the case where multiple remotes have all
changed. git-annex does find the best (ie, newest) one and fast forwards
to it. If the remotes are diverged, no fast-forward is done at all. It would
be possible to pick one, fast forward to it, and make a merge commit to
the rest, I see no benefit to adding that complexity.
Determining the best of N changed remotes requires N*2+1 calls to git-log, but
these are fast git-log calls, and N is typically small. Also, typically
some or all of the remote refs will be the same, and git-log is not called to
compare those. In the real world I expect this will almost always add only
1 git-log call to the merge process. (Which already makes N anyway.)
Diffstat (limited to 'Annex/Branch.hs')
-rw-r--r-- | Annex/Branch.hs | 88 |
1 files changed, 65 insertions, 23 deletions
diff --git a/Annex/Branch.hs b/Annex/Branch.hs index 4c3192f53..0095b586b 100644 --- a/Annex/Branch.hs +++ b/Annex/Branch.hs @@ -117,28 +117,30 @@ commit message = whenM journalDirty $ lockJournal $ do g <- gitRepo withIndex $ liftIO $ Git.commit g message fullname [fullname] -{- Ensures that the branch is up-to-date; should be called before - - data is read from it. Runs only once per git-annex run. +{- Ensures that the branch is up-to-date; should be called before data is + - read from it. Runs only once per git-annex run. - - - Before refs are merged into the index, it's - - important to first stage the journal into the - - index. Otherwise, any changes in the journal - - would later get staged, and might overwrite - - changes made during the merge. + - Before refs are merged into the index, it's important to first stage the + - journal into the index. Otherwise, any changes in the journal would + - later get staged, and might overwrite changes made during the merge. - - - It would be cleaner to handle the merge by - - updating the journal, not the index, with changes - - from the branches. + - It would be cleaner to handle the merge by updating the journal, not the + - index, with changes from the branches. + - + - The index is always updated using a union merge, as that's the most + - efficient way to update it. However, if the branch can be + - fast-forwarded, that is then done, rather than adding an unnecessary + - commit to it. -} update :: Annex () update = onceonly $ do + g <- gitRepo -- check what needs updating before taking the lock dirty <- journalDirty - c <- filterM changedbranch =<< siblingBranches + c <- filterM (changedBranch name . snd) =<< siblingBranches let (refs, branches) = unzip c unless (not dirty && null refs) $ withIndex $ lockJournal $ do when dirty stageJournalFiles - g <- gitRepo unless (null branches) $ do showSideAction $ "merging " ++ (unwords $ map Git.refDescribe branches) ++ @@ -150,24 +152,64 @@ update = onceonly $ do - modify the branch. -} liftIO $ Git.UnionMerge.merge_index g branches - liftIO $ Git.commit g "update" fullname (nub $ fullname:refs) + ff <- if dirty then return False else tryFastForwardTo refs + unless ff $ + liftIO $ Git.commit g "update" fullname (nub $ fullname:refs) invalidateCache where - changedbranch (_, branch) = do - g <- gitRepo - -- checking with log to see if there have been changes - -- is less expensive than always merging - diffs <- liftIO $ Git.pipeRead g [ - Param "log", - Param (name ++ ".." ++ branch), - Params "--oneline -n1" - ] - return $ not $ L.null diffs onceonly a = unlessM (branchUpdated <$> getState) $ do r <- a disableUpdate return r +{- Checks if the second branch has any commits not present on the first + - branch. -} +changedBranch :: String -> String -> Annex Bool +changedBranch origbranch newbranch = do + g <- gitRepo + diffs <- liftIO $ Git.pipeRead g [ + Param "log", + Param (origbranch ++ ".." ++ newbranch), + Params "--oneline -n1" + ] + return $ not $ L.null diffs + +{- Given a set of refs that are all known to have commits not + - on the git-annex branch, tries to update the branch by a + - fast-forward. + - + - In order for that to be possible, one of the refs must contain + - every commit present in all the other refs, as well as in the + - git-annex branch. + -} +tryFastForwardTo :: [String] -> Annex Bool +tryFastForwardTo [] = return True +tryFastForwardTo (first:rest) = do + -- First, check that the git-annex branch does not contain any + -- new commits that are in the first other branch. If it does, + -- cannot fast-forward. + diverged <- changedBranch first fullname + if diverged + then no_ff + else maybe no_ff do_ff =<< findbest first rest + where + no_ff = return False + do_ff branch = do + g <- gitRepo + liftIO $ Git.run g "update-ref" [Param fullname, Param branch] + return True + findbest c [] = return $ Just c + findbest c (r:rs) + | c == r = findbest c rs + | otherwise = do + better <- changedBranch c r + worse <- changedBranch r c + case (better, worse) of + (True, True) -> return Nothing -- divergent fail + (True, False) -> findbest r rs -- better + (False, True) -> findbest c rs -- worse + (False, False) -> findbest c rs -- same + {- Avoids updating the branch. A useful optimisation when the branch - is known to have not changed, or git-annex won't be relying on info - from it. -} |