summaryrefslogtreecommitdiff
path: root/doc/todo/git-annex_unused_eats_memory.mdwn
diff options
context:
space:
mode:
Diffstat (limited to 'doc/todo/git-annex_unused_eats_memory.mdwn')
-rw-r--r--doc/todo/git-annex_unused_eats_memory.mdwn32
1 files changed, 0 insertions, 32 deletions
diff --git a/doc/todo/git-annex_unused_eats_memory.mdwn b/doc/todo/git-annex_unused_eats_memory.mdwn
deleted file mode 100644
index 760a6ccf5..000000000
--- a/doc/todo/git-annex_unused_eats_memory.mdwn
+++ /dev/null
@@ -1,32 +0,0 @@
-`git-annex unused` has to compare large sets of data
-(all keys with content present in the repository,
-with all keys used by files in the repository), and so
-uses more memory than git-annex typically needs.
-
-It used to be a lot worse (hundreds of megabytes).
-
-Now it only needs enough memory to store a Set of all Keys that currently
-have content in the annex. On a lightly populated repository, it runs in
-quite low memory use (like 8 mb) even if the git repo has 100 thousand
-files. On a repository with lots of file contents, it will use more.
-
-Still, I would like to reduce this to a purely constant memory use,
-as running in constant memory no matter the repo size is a git-annex design
-goal.
-
-One idea is to use a bloom filter.
-For example, construct a bloom filter of all keys used by files in
-the repository. Then for each key with content present, check if it's
-in the bloom filter. Since there can be false positives, this might
-miss finding some unused keys. The probability/size of filter
-could be tunable.
-
-> Fixed in `bloom` branch in git. --[[Joey]]
->> [[done]]! --[[Joey]]
-
-Another way might be to scan the git log for files that got removed
-or changed what key they pointed to. Correlate with keys with content
-currently present in the repository (possibly using a bloom filter again),
-and that would yield a shortlist of keys that are probably not used.
-Then scan thru all files in the repo to make sure that none point to keys
-on the shortlist.