diff options
Diffstat (limited to 'doc/scalability.mdwn')
-rw-r--r-- | doc/scalability.mdwn | 31 |
1 files changed, 31 insertions, 0 deletions
diff --git a/doc/scalability.mdwn b/doc/scalability.mdwn new file mode 100644 index 000000000..71e21ac4c --- /dev/null +++ b/doc/scalability.mdwn @@ -0,0 +1,31 @@ +git-annex is designed for scalability. The key points are: + +* Arbitrarily large files can be managed. The only constraint + on file size are how large a file your filesystem can hold. + + While git-annex does checksum files by default, there + is a [[WORM_backend|backends]] available that avoids the checksumming + overhead, so you can add new, enormous files, very fast. This also + allows it to be used on systems with very slow disk IO. + +* Memory usage should be constant. This is a "should", because there + can sometimes be leaks (and this is one of haskell's weak spots), + but git-annex is designed so that it does not need to hold all + the details about your repository in memory. + + The one exception is that [[todo/git-annex_unused_eats_memory]], + because it *does* need to hold the whole repo state in memory. But + that is still considered a bug, and hoped to be solved one day. + Luckily, that command is not often used. + +* Many files can be managed. The limiting factor is git's own + limitations in scaling to repositories with a lot of files, and as git + improves this will improve. Scaling to hundreds of thousands of files + is not a problem, scaling beyond that and git will start to get slow. + + To some degree, git-annex works around innefficiencies in git; for + example it batches input sent to certian git commands that are slow + when run in an emormous repository. + +* It can use as much, or as little bandwidth as is available. In + particular, any interrupted file transfer can be resumed by git-annex. |