I'm currently in the process of gutting old (some broken) git-annex repositories and cleaning out download directories from before I started using git-annex.

To do this, I am running `git annex import --clean-duplicates $PATH` on the directories I want to clear out, but sometimes this takes an unnecessarily long time.

For example, git-annex will calculate the digest of a huge file (30GB+) in $TARGET even though no file in the annex has that size.

A common shortcut is to compare file sizes first, which rules out definite non-matches almost instantly. Could this be added to git-annex's `import` in some way, or is it a no-go due to the constant-memory constraint?
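
For illustration, here is a minimal Haskell sketch of the idea (hypothetical names, not git-annex's actual internals): build a set of the sizes of all annexed keys, then stat each import candidate and only hash it when its size appears in the set. Most key backends appear to record the size in the key itself (e.g. `SHA256E-s<size>--<hash>`), so the sizes should be available without reading any file contents.

	-- Hypothetical sketch, not git-annex's real code: skip hashing any
	-- import candidate whose size matches no known annexed key's size.
	import qualified Data.Set as S
	import System.Posix.Files (getFileStatus, fileSize)

	-- Assumed to be extracted from the existing keys, which typically
	-- embed the file size (e.g. SHA256E-s<size>--<hash>).
	type KnownSizes = S.Set Integer

	-- A single stat() decides whether the expensive digest is needed.
	worthHashing :: KnownSizes -> FilePath -> IO Bool
	worthHashing known f = do
	    sz <- fromIntegral . fileSize <$> getFileStatus f
	    return (sz `S.member` known)

Memory-wise this keeps one `Integer` per annexed key rather than strictly constant space, so whether such a size set fits git-annex's memory goals is exactly the open question above.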