From c26db2625934e315163077d19d814bc71df7ef6e Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Tue, 14 Feb 2012 18:50:25 -0400
Subject: add scalability page

---
 doc/index.mdwn       |  1 +
 doc/scalability.mdwn | 31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)
 create mode 100644 doc/scalability.mdwn

(limited to 'doc')

diff --git a/doc/index.mdwn b/doc/index.mdwn
index 8bbffab4a..9ba5d5c31 100644
--- a/doc/index.mdwn
+++ b/doc/index.mdwn
@@ -49,6 +49,7 @@ files with git.
 * [[encryption]]
 * [[bare_repositories]]
 * [[internals]]
+* [[scalability]]
 * [[design]]
 * [[what git annex is not|not]]
 * [[sitemap]]
diff --git a/doc/scalability.mdwn b/doc/scalability.mdwn
new file mode 100644
index 000000000..71e21ac4c
--- /dev/null
+++ b/doc/scalability.mdwn
@@ -0,0 +1,31 @@
+git-annex is designed for scalability. The key points are:
+
+* Arbitrarily large files can be managed. The only constraint
+  on file size is how large a file your filesystem can hold.
+
+  While git-annex does checksum files by default, there
+  is a [[WORM_backend|backends]] available that avoids the checksumming
+  overhead, so you can add new, enormous files very fast. This also
+  allows it to be used on systems with very slow disk IO.
+
+* Memory usage should be constant. This is a "should", because there
+  can sometimes be leaks (and this is one of Haskell's weak spots),
+  but git-annex is designed so that it does not need to hold all
+  the details about your repository in memory.
+
+  The one exception is that [[todo/git-annex_unused_eats_memory]],
+  because it *does* need to hold the whole repo state in memory. But
+  that is still considered a bug, and will hopefully be solved one day.
+  Luckily, that command is not often used.
+
+* Many files can be managed. The limiting factor is git's own
+  limitations in scaling to repositories with a lot of files, and as git
+  improves this will improve. Scaling to hundreds of thousands of files
+  is not a problem; beyond that, git will start to get slow.
+
+  To some degree, git-annex works around inefficiencies in git; for
+  example, it batches input sent to certain git commands that are slow
+  when run in an enormous repository.
+
+* It can use as much or as little bandwidth as is available. In
+  particular, any interrupted file transfer can be resumed by git-annex.
-- 
cgit v1.2.3
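
The batching idea mentioned in the new page can be illustrated with a short sketch. This is not git-annex's code (git-annex is written in Haskell, and the patch does not show how it batches); it is a generic Python illustration, and the names `batches` and `run_batched` are hypothetical. The point is only that a slow command gets invoked once per group of paths rather than once per path.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists holding at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_batched(paths: Iterable[str], size: int,
                run: Callable[[List[str]], None]) -> int:
    """Call `run` once per batch of paths; return the number of calls made.

    `run` is a stand-in (an assumption) for whatever invokes the slow
    command with a whole batch of paths at once.
    """
    calls = 0
    for chunk in batches(paths, size):
        run(chunk)
        calls += 1
    return calls
```

With a batch size of 1000, for example, a million paths would cost a thousand invocations of the stand-in command instead of a million, while only one batch is held in memory at a time.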