diff options
author | Joey Hess <joeyh@joeyh.name> | 2017-01-31 17:43:11 -0400 |
---|---|---|
committer | Joey Hess <joeyh@joeyh.name> | 2017-01-31 17:43:11 -0400 |
commit | 164466c987a7607a5f598b36e5b3111a68bd101f (patch) | |
tree | 592693e8cd910d8dc7438fa3137238cb00391c9a | |
parent | 893667e2565eb707ff004e2ea6f5411be3eea936 (diff) |
fix lack of laziness streaming large diffs
A commit last year that made a partial function use Maybe unfortunately
caused the whole input to need to be consumed, breaking streaming. So,
revert it.
This commit was sponsored by Nick Daly on Patreon.
-rw-r--r-- | Git/DiffTree.hs | 27 | ||||
-rw-r--r-- | doc/todo/more_efficient_memory_usage_with_git-annex_unused/comment_2_1eff4497bf0a3e87dc47a1226c5d4af8._comment | 19 |
2 files changed, 32 insertions, 14 deletions
diff --git a/Git/DiffTree.hs b/Git/DiffTree.hs index 645d18d35..309575aaf 100644 --- a/Git/DiffTree.hs +++ b/Git/DiffTree.hs @@ -89,7 +89,7 @@ commitDiff ref = getdiff (Param "show") getdiff :: CommandParam -> [CommandParam] -> Repo -> IO ([DiffTreeItem], IO Bool) getdiff command params repo = do (diff, cleanup) <- pipeNullSplit ps repo - return (fromMaybe (error $ "git " ++ show (toCommand ps) ++ " parse failed") (parseDiffRaw diff), cleanup) + return (parseDiffRaw diff, cleanup) where ps = command : @@ -100,24 +100,23 @@ getdiff command params repo = do params {- Parses --raw output used by diff-tree and git-log. -} -parseDiffRaw :: [String] -> Maybe [DiffTreeItem] +parseDiffRaw :: [String] -> [DiffTreeItem] parseDiffRaw l = go l [] where - go [] c = Just c - go (info:f:rest) c = case mk info f of - Nothing -> Nothing - Just i -> go rest (i:c) - go (_:[]) _ = Nothing + go [] c = c + go (info:f:rest) c = go rest (mk info f : c) + go (s:[]) _ = error $ "diff-tree parse error near \"" ++ s ++ "\"" mk info f = DiffTreeItem - <$> readmode srcm - <*> readmode dstm - <*> extractSha ssha - <*> extractSha dsha - <*> pure s - <*> pure (asTopFilePath $ fromInternalGitPath $ Git.Filename.decode f) + { srcmode = readmode srcm + , dstmode = readmode dstm + , srcsha = fromMaybe (error "bad srcsha") $ extractSha ssha + , dstsha = fromMaybe (error "bad dstsha") $ extractSha dsha + , status = s + , file = asTopFilePath $ fromInternalGitPath $ Git.Filename.decode f + } where - readmode = fst <$$> headMaybe . readOct + readmode = fst . Prelude.head . readOct -- info = :<srcmode> SP <dstmode> SP <srcsha> SP <dstsha> SP <status> -- All fields are fixed, so we can pull them out of diff --git a/doc/todo/more_efficient_memory_usage_with_git-annex_unused/comment_2_1eff4497bf0a3e87dc47a1226c5d4af8._comment b/doc/todo/more_efficient_memory_usage_with_git-annex_unused/comment_2_1eff4497bf0a3e87dc47a1226c5d4af8._comment new file mode 100644 index 000000000..df285478c --- /dev/null +++ b/doc/todo/more_efficient_memory_usage_with_git-annex_unused/comment_2_1eff4497bf0a3e87dc47a1226c5d4af8._comment @@ -0,0 +1,19 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2017-01-31T20:24:04Z" + content=""" +The heap profile has multiple spikes (so not an accumulating memory leak). +The diff parsing code is indeed what's using so much memory. Looks like +data is failing to stream through that code and instead the whole +diff output gets buffered. + +Aha.. Git.DiffTree.parseDiffRaw used to return a list, but changed +in [[!commit 8d124beba8]] +to a Maybe list in order to avoid being a partial function. But +that change destroyed laziness, since the whole input has to be parsed +in order to determine if Nothing should be returned. + +However, fixing that only eliminated part of the spike. There's something +else keeping data from streaming. +"""]] |