author | Joey Hess <joey@kitenet.net> | 2012-02-14 18:50:25 -0400 |
---|---|---|
committer | Joey Hess <joey@kitenet.net> | 2012-02-14 18:50:25 -0400 |
commit | c26db2625934e315163077d19d814bc71df7ef6e (patch) | |
tree | 5963d3d59f28ceba36ac604638e23ee9b935c94c | |
parent | 7371209d13b595c427ae250ac22384d527127bbb (diff) |
add scalability page
-rw-r--r-- | doc/index.mdwn | 1 | ||||
-rw-r--r-- | doc/scalability.mdwn | 31 |
2 files changed, 32 insertions, 0 deletions
diff --git a/doc/index.mdwn b/doc/index.mdwn
index 8bbffab4a..9ba5d5c31 100644
--- a/doc/index.mdwn
+++ b/doc/index.mdwn
@@ -49,6 +49,7 @@ files with git.
 * [[encryption]]
 * [[bare_repositories]]
 * [[internals]]
+* [[scalability]]
 * [[design]]
 * [[what git annex is not|not]]
 * [[sitemap]]
diff --git a/doc/scalability.mdwn b/doc/scalability.mdwn
new file mode 100644
index 000000000..71e21ac4c
--- /dev/null
+++ b/doc/scalability.mdwn
@@ -0,0 +1,31 @@
+git-annex is designed for scalability. The key points are:
+
+* Arbitrarily large files can be managed. The only constraint
+  on file size is how large a file your filesystem can hold.
+
+  While git-annex does checksum files by default, there
+  is a [[WORM_backend|backends]] available that avoids the checksumming
+  overhead, so you can add new, enormous files, very fast. This also
+  allows it to be used on systems with very slow disk IO.
+
+* Memory usage should be constant. This is a "should", because there
+  can sometimes be leaks (and this is one of Haskell's weak spots),
+  but git-annex is designed so that it does not need to hold all
+  the details about your repository in memory.
+
+  The one exception is that [[todo/git-annex_unused_eats_memory]],
+  because it *does* need to hold the whole repo state in memory. But
+  that is still considered a bug, and hoped to be solved one day.
+  Luckily, that command is not often used.
+
+* Many files can be managed. The limiting factor is git's own
+  limitations in scaling to repositories with a lot of files, and as git
+  improves this will improve. Scaling to hundreds of thousands of files
+  is not a problem; beyond that, git will start to get slow.
+
+  To some degree, git-annex works around inefficiencies in git; for
+  example it batches input sent to certain git commands that are slow
+  when run in an enormous repository.
+
+* It can use as much, or as little bandwidth as is available. In
+  particular, any interrupted file transfer can be resumed by git-annex.
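
The last point in the added page, resuming an interrupted transfer, can be illustrated with a minimal sketch. This is not git-annex's implementation (git-annex is written in Haskell and delegates transfers to other tools); `resume_copy` and its parameters are hypothetical names chosen for illustration. The sketch assumes that whatever bytes already exist at the destination are a correct prefix of the source, and it reads one fixed-size chunk at a time, so memory use stays constant regardless of file size.

```python
import os


def resume_copy(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path, continuing after any bytes already present.

    Hypothetical helper for illustration only. Assumes the existing
    contents of dst_path (if any) are a correct prefix of src_path.
    Returns the number of bytes copied by this call.
    """
    # Bytes already transferred by an earlier, interrupted attempt.
    offset = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        # Skip the part of the source that the destination already has.
        src.seek(offset)
        while True:
            # Only one chunk is held in memory at a time.
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

Calling `resume_copy` again after an interruption copies only the missing tail. A real tool would also verify the finished file, for example against a checksum, since a partial file left by a failed transfer is not guaranteed to be a valid prefix.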