author Joey Hess <joey@kitenet.net> 2012-02-14 18:50:25 -0400
committer Joey Hess <joey@kitenet.net> 2012-02-14 18:50:25 -0400
commit c26db2625934e315163077d19d814bc71df7ef6e
tree 5963d3d59f28ceba36ac604638e23ee9b935c94c
parent 7371209d13b595c427ae250ac22384d527127bbb
add scalability page
-rw-r--r-- doc/index.mdwn 1
-rw-r--r-- doc/scalability.mdwn 31
2 files changed, 32 insertions, 0 deletions
diff --git a/doc/index.mdwn b/doc/index.mdwn
index 8bbffab4a..9ba5d5c31 100644
--- a/doc/index.mdwn
+++ b/doc/index.mdwn
@@ -49,6 +49,7 @@ files with git.
* [[encryption]]
* [[bare_repositories]]
* [[internals]]
+* [[scalability]]
* [[design]]
* [[what git annex is not|not]]
* [[sitemap]]
diff --git a/doc/scalability.mdwn b/doc/scalability.mdwn
new file mode 100644
index 000000000..71e21ac4c
--- /dev/null
+++ b/doc/scalability.mdwn
@@ -0,0 +1,31 @@
+git-annex is designed for scalability. The key points are:
+
+* Arbitrarily large files can be managed. The only constraint
+ on file size is how large a file your filesystem can hold.
+
+ While git-annex does checksum files by default, there
+ is a [[WORM_backend|backends]] available that avoids the checksumming
+ overhead, so you can add new, enormous files, very fast. This also
+ allows it to be used on systems with very slow disk IO.
+
+* Memory usage should be constant. This is a "should", because there
+ can sometimes be leaks (and this is one of Haskell's weak spots),
+ but git-annex is designed so that it does not need to hold all
+ the details about your repository in memory.
+
+ The one exception is that [[todo/git-annex_unused_eats_memory]],
+ because it *does* need to hold the whole repo state in memory. But
+ that is still considered a bug, and will hopefully be solved one day.
+ Luckily, that command is not often used.
+
+* Many files can be managed. The limiting factor is git's own
+ limitations in scaling to repositories with a lot of files, and as git
+ improves this will improve. Scaling to hundreds of thousands of files
+ is not a problem; beyond that, git will start to get slow.
+
+ To some degree, git-annex works around inefficiencies in git; for
+ example, it batches input sent to certain git commands that are slow
+ when run in an enormous repository.
+
+* It can use as much or as little bandwidth as is available. In
+ particular, any interrupted file transfer can be resumed by git-annex.
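
The page added above says that git-annex batches input sent to certain slow git commands. As a rough illustration of that general idea only, here is a minimal Python sketch. It is not git-annex's actual code (git-annex is written in Haskell), and the helper names `batches` and `stdin_payload` are hypothetical. It groups paths into fixed-size batches so that one process can be started per batch instead of one per file.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def stdin_payload(paths: List[str]) -> bytes:
    """Encode one batch as NUL-terminated paths for a command that reads stdin."""
    return b"".join(p.encode("utf-8") + b"\x00" for p in paths)
```

Each payload could then be passed as the standard input of a single command that reads NUL-separated paths, for example `git update-index --add -z --stdin`. Which git commands git-annex actually batches is not stated in the page, so that choice is an assumption made for illustration.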