summaryrefslogtreecommitdiff
path: root/Utility
diff options
context:
space:
mode:
authorGravatar Joey Hess <joeyh@joeyh.name>2015-04-02 01:44:32 -0400
committerGravatar Joey Hess <joeyh@joeyh.name>2015-04-02 01:44:32 -0400
commit473a64e5e8b6fc3f3bd396c1756754ad057e3009 (patch)
treee1a774cefb7d900c4963c6e61adedc10b246de68 /Utility
parent8b17e33b84b0f60da422e2c32a5fa6f53c10df6d (diff)
Significantly sped up processing of large numbers of directories passed to a single git-annex command. (try 2)
New approach is to do it the expensive way for the first 100 paths on the command line, but then assume the user doesn't care about order too much and fall back to the cheap way that does not preserve order.
Diffstat (limited to 'Utility')
-rw-r--r--Utility/Path.hs19
1 files changed, 14 insertions, 5 deletions
diff --git a/Utility/Path.hs b/Utility/Path.hs
index 755436448..2675aa0f9 100644
--- a/Utility/Path.hs
+++ b/Utility/Path.hs
@@ -170,17 +170,26 @@ prop_relPathDirToFile_regressionTest = same_dir_shortcurcuits_at_difference
== joinPath ["..", "..", "..", "..", ".git", "annex", "objects", "18", "gk", "SHA256-foo", "SHA256-foo"]
{- Given an original list of paths, and an expanded list derived from it,
- - generates a list of lists, where each sublist corresponds to one of the
- - original paths. When the original path is a directory, any items
- - in the expanded list that are contained in that directory will appear in
- - its segment.
+ - which may be arbitrarily reordered, generates a list of lists, where
+ - each sublist corresponds to one of the original paths.
+ -
+ - When the original path is a directory, any items in the expanded list
+ - that are contained in that directory will appear in its segment.
+ -
+ - The order of the original list of paths is attempted to be preserved in
+ - the order of the returned segments. However, doing so has a O^NM
+ - growth factor. So, if the original list has more than 100 paths on it,
+ - we stop preserving ordering at that point. Presumably a user passing
+ - that many paths in doesn't care too much about order of the later ones.
-}
segmentPaths :: [FilePath] -> [FilePath] -> [[FilePath]]
segmentPaths [] new = [new]
segmentPaths [_] new = [new] -- optimisation
segmentPaths (l:ls) new = found : segmentPaths ls rest
where
- (found, rest)=partition (l `dirContains`) new
+ (found, rest) = if length ls < 100
+ then partition (l `dirContains`) new
+ else break (\p -> not (l `dirContains` p)) new
{- This assumes that it's cheaper to call segmentPaths on the result,
- than it would be to run the action separately with each path. In