author | Joey Hess <joeyh@joeyh.name> | 2015-06-16 17:58:15 -0400 |
---|---|---|
committer | Joey Hess <joeyh@joeyh.name> | 2015-06-16 18:12:00 -0400 |
commit | 87ba1abc7cd1b199b0f7d778d9f27375b50de709 (patch) | |
tree | a8dcb4479872a1ddfd39053a7532e987c489f85e /doc/git-annex.mdwn | |
parent | a5ae3ecdb722219d3cdaee652450be1b96795f83 (diff) |
Increased the default annex.bloomaccuracy from 1000 to 10000000
This makes git annex unused use around 48 MB more memory than it did before,
but the massive increase in accuracy makes this worthwhile for all but the
smallest systems.
Also, I want to use the bloom filter for sync --all --content, to avoid
dropping files that the preferred content doesn't want, and 1/1000
false positives would be far too many in that use case, even if it were
acceptable for unused.
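To make the "far too many" point concrete, here is a minimal sketch of the arithmetic, assuming the accuracy value N means a false positive rate of 1/N (as the documentation change in this commit describes it). The lookup count of 500000 is an illustrative assumption borrowed from the default `annex.bloomcapacity`, and `expected_false_positives` is a hypothetical helper, not part of git-annex.

```python
from fractions import Fraction


def expected_false_positives(lookups: int, accuracy: int) -> Fraction:
    """Expected number of false positives when `lookups` absent keys are
    tested against a bloom filter whose false positive rate is 1/accuracy."""
    return Fraction(lookups, accuracy)


# Assumption for illustration only: 500000 keys are checked against the filter.
lookups = 500_000

for accuracy in (1000, 10_000_000):
    expected = expected_false_positives(lookups, accuracy)
    print(f"accuracy {accuracy}: about {float(expected)} expected false positives")
```

Under that assumption the old default gives roughly 500 expected false positives per run, while the new default gives about 0.05, that is, one in every twenty runs.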
Actual memory use numbers:
1000: 21.06user 3.42system 0:26.40elapsed 92%CPU (0avgtext+0avgdata 501552maxresident)k
1000000: 21.41user 3.55system 0:26.84elapsed 93%CPU (0avgtext+0avgdata 549496maxresident)k
10000000: 21.84user 3.52system 0:27.89elapsed 90%CPU (0avgtext+0avgdata 549920maxresident)k
Based on these numbers, 10 million seemed a better pick than 1 million.
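The comparison can be checked directly from the three measurements. The sketch below assumes the lines are GNU time output, where `maxresident` is reported in kilobytes; the helper name `max_resident_kb` is hypothetical.

```python
import re

# The three measurements quoted in the commit message, keyed by accuracy.
MEASUREMENTS = {
    1000: "21.06user 3.42system 0:26.40elapsed 92%CPU (0avgtext+0avgdata 501552maxresident)k",
    1000000: "21.41user 3.55system 0:26.84elapsed 93%CPU (0avgtext+0avgdata 549496maxresident)k",
    10000000: "21.84user 3.52system 0:27.89elapsed 90%CPU (0avgtext+0avgdata 549920maxresident)k",
}


def max_resident_kb(line: str) -> int:
    """Extract the peak resident set size (assumed to be in KB) from one line."""
    match = re.search(r"(\d+)maxresident", line)
    if match is None:
        raise ValueError(f"no maxresident field in: {line!r}")
    return int(match.group(1))


baseline = max_resident_kb(MEASUREMENTS[1000])

# Extra memory each setting uses compared with the old default of 1000.
deltas_kb = {
    accuracy: max_resident_kb(line) - baseline
    for accuracy, line in MEASUREMENTS.items()
}

for accuracy, delta in deltas_kb.items():
    print(f"{accuracy}: +{delta} KB over accuracy 1000")
```

Going from 1 million to 10 million costs only 424 KB on top of the roughly 47 to 48 MB increase already incurred at 1 million, which is consistent with preferring the larger value.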
Diffstat (limited to 'doc/git-annex.mdwn')
-rw-r--r-- | doc/git-annex.mdwn | 20 |
1 files changed, 11 insertions, 9 deletions
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index e7c80f3cd..c90ef5ec2 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -830,20 +830,22 @@ Here are all the supported configuration settings.
 
 * `annex.bloomcapacity`
 
-  The `git annex unused` command uses a bloom filter to determine
-  what data is no longer used. The default bloom filter is sized to handle
-  up to 500000 keys. If your repository is larger than that,
-  you can adjust this to avoid `git annex unused` not noticing some unused
-  data files. Increasing this will make `git-annex unused` consume more memory;
+  The `git annex unused` and `git annex sync --content` commands use
+  a bloom filter to determine what files are present in eg, the work tree.
+  The default bloom filter is sized to handle
+  up to 500000 files. If your repository is larger than that,
+  you should increase this value. Larger values will
+  make `git-annex unused` and `git annex sync --content` consume more memory;
   run `git annex info` for memory usage numbers.
 
 * `annex.bloomaccuracy`
 
   Adjusts the accuracy of the bloom filter used by
-  `git annex unused`. The default accuracy is 1000 --
-  1 unused file out of 1000 will be missed by `git annex unused`. Increasing
-  the accuracy will make `git annex unused` consume more memory;
-  run `git annex info` for memory usage numbers.
+  `git annex unused` and `git annex sync --content`.
+  The default accuracy is 10000000 -- 1 unused file out of 10000000
+  will be missed by `git annex unused`. Increasing the accuracy will make
+  `git annex unused` consume more memory; run `git annex info`
+  for memory usage numbers.
 
 * `annex.sshcaching`
 