| field | value | date |
|---|---|---|
| author | Joey Hess <joeyh@joeyh.name> | 2015-03-04 20:45:07 -0400 |
| committer | Joey Hess <joeyh@joeyh.name> | 2015-03-04 20:45:07 -0400 |
| commit | 3118bdbb3b3d9d5fd24aca13e158a5b7a58f3251 | |
| tree | 0343182c9d0d240e0e79d5c63392318618a57f21 /doc/design | |
| parent | 570580ad25720b381212dce422ece8384d4c54ea | |
update
Diffstat (limited to 'doc/design')
-rw-r--r-- | doc/design/iabackup.mdwn | 14 |
1 files changed, 14 insertions, 0 deletions
diff --git a/doc/design/iabackup.mdwn b/doc/design/iabackup.mdwn
index 88e8b813b..aa1012279 100644
--- a/doc/design/iabackup.mdwn
+++ b/doc/design/iabackup.mdwn
@@ -271,3 +271,17 @@ download some other Item.
 However, it might be good to rate limit the number of concurrent downloads
 of a given item, to prevent this and perhaps other issues. This could be
 done by a wrapper around git-annex shell or perhaps a git-annex modification.
+
+With clients all fscking their part of a shard once a month,
+that will increase the size of the git repository, with new distributed
+fsck updates. Basically, it grows by one line per file in the shard,
+times the amount of redundancy that's been reached. So, a 10 thousand item
+shard with redundancy 3 will grow by 30000 lines per month. Line length
+for location log is 58 bytes, so that's 1.7 mb growth per month of the git
+repo. (That's for blobs, plus additional overhead for trees and commits.)
+However, git will delta compress most of it, so it might be
+significantly smaller. If the distributed fsck timestamps are all
+the same for a client, they will delta compress along with everything else.
+This could reduce the blob growth to a few dozen bytes per client per month.
+This is something to keep an eye on, especially since shipping large git
+repo changes to clients is not desirable.
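The storage estimate in the added text is simple arithmetic and can be checked mechanically. The following Python sketch reproduces it, assuming, as the note does, one new location log line per copy of each item per month. The sample log line is modeled on git-annex's location log format (timestamp, presence flag, repository UUID) with made-up values; `SAMPLE_LOG_LINE` and `monthly_log_growth` are hypothetical names introduced here for illustration. Tree and commit overhead and any savings from delta compression are deliberately left out.

```python
from typing import Tuple

# Illustrative location log line modeled on git-annex's format:
# "<timestamp>s <presence flag> <repository UUID>\n". Values are made up;
# only the field widths matter for the size estimate.
SAMPLE_LOG_LINE = "1425504307.654321s 1 e605dca6-446a-11e0-8b2a-002170d25c55\n"


def monthly_log_growth(items: int, redundancy: int, line_bytes: int) -> Tuple[int, int]:
    """Return (new_lines, new_bytes) of location log content added per month.

    Assumes, as the design note does, that each monthly fsck pass adds one
    log line per copy of each item, so a shard gains items * redundancy
    lines. Tree/commit overhead and git delta compression are ignored.
    """
    new_lines = items * redundancy
    return new_lines, new_lines * line_bytes


# Derive the per-line size from the sample line (timestamp + flag + UUID + newline).
line_bytes = len(SAMPLE_LOG_LINE.encode("ascii"))
lines, size = monthly_log_growth(items=10_000, redundancy=3, line_bytes=line_bytes)
print(line_bytes)               # 58
print(lines)                    # 30000
print(f"{size / 1e6:.2f} MB")   # 1.74 MB
```

With the note's figures (10 thousand items, redundancy 3, 58-byte lines) this yields 30000 lines and about 1.74 MB of raw blob content per month, consistent with the 1.7 mb figure. It counts uncompressed blob content only, before git's delta compression and excluding tree and commit objects.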