diff options
Diffstat (limited to 'doc/todo/Speed_up___39__import_--clean-duplicates__39__/comment_1_9268c639d3d21cce4ca7b60d08e9cb65._comment')
-rw-r--r-- | doc/todo/Speed_up___39__import_--clean-duplicates__39__/comment_1_9268c639d3d21cce4ca7b60d08e9cb65._comment | 10 |
1 files changed, 10 insertions, 0 deletions
diff --git a/doc/todo/Speed_up___39__import_--clean-duplicates__39__/comment_1_9268c639d3d21cce4ca7b60d08e9cb65._comment b/doc/todo/Speed_up___39__import_--clean-duplicates__39__/comment_1_9268c639d3d21cce4ca7b60d08e9cb65._comment new file mode 100644 index 000000000..8584d5ae8 --- /dev/null +++ b/doc/todo/Speed_up___39__import_--clean-duplicates__39__/comment_1_9268c639d3d21cce4ca7b60d08e9cb65._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="24.159.78.125" + subject="interesting idea" + date="2014-07-30T15:03:46Z" + content=""" +This could be done in constant space using a bloom filter of known file sizes. Files with wrong sizes would sometimes match, but no problem, it would then just do the work it does now. + +However, to build such a filter, git-annex would need to do a scan of all keys it knows about. This would take approximately as long to run as `git annex unused` does. It might make sense to only build the filter if it runs into a fairly large file. Alternatively, a bloom filter of file sizes could be cached and updated on the fly as things change (but this gets pretty complex). +"""]] |