From 69c14d130bc7a754e3a4fa184ff317690ad48ca6 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Fri, 4 Mar 2011 12:31:01 -0400
Subject: update

---
 doc/distributed_version_control.mdwn | 13 +++++++++++++
 doc/future_proofing.mdwn | 24 ++++++++++++++++++++++++
 doc/not.mdwn | 8 ++++++++
 doc/repomap.png | Bin 0 -> 129065 bytes
 doc/transferring_data.mdwn | 14 ++++++++++++++
 doc/use_case/Alice.mdwn | 6 ++++--
 doc/use_case/Bob.mdwn | 7 +++++++
 7 files changed, 70 insertions(+), 2 deletions(-)
 create mode 100644 doc/distributed_version_control.mdwn
 create mode 100644 doc/future_proofing.mdwn
 create mode 100644 doc/repomap.png
 create mode 100644 doc/transferring_data.mdwn

diff --git a/doc/distributed_version_control.mdwn b/doc/distributed_version_control.mdwn
new file mode 100644
index 000000000..f9cdb7e99
--- /dev/null
+++ b/doc/distributed_version_control.mdwn
@@ -0,0 +1,13 @@
+In git, there can be multiple clones of a repository, each clone can
+be independently modified, and clones can push or pull changes to
+one another to get back in sync.
+
+git-annex preserves that fundamental distributed nature of git, while
+dropping the requirement that, once in sync, each clone contains all the data
+that was committed to each other clone. Instead of storing the content
+of a file in the repository, git-annex stores a pointer to the content.
+
+Each git-annex repository is responsible for storing some of the content,
+and can copy it to or from other repositories. [[Location_tracking]]
+information is committed to git, to let repositories inform other
+repositories what file contents they have available.
diff --git a/doc/future_proofing.mdwn b/doc/future_proofing.mdwn
new file mode 100644
index 000000000..4d4939b5a
--- /dev/null
+++ b/doc/future_proofing.mdwn
@@ -0,0 +1,24 @@
+Imagine putting a git-annex drive in a time capsule. In 20, or 50, or 100
+years, you'd like its contents to be as accessible as possible to whoever
+digs it up.
+
+This is a hard problem. git-annex cannot completely solve it, but it does
+its best to not contribute to the problem. Here are some aspects of the
+problem:
+
+* How are files accessed? Git-annex carefully adds minimal complexity
+  to access files in a repository. Nothing needs to be done to extract
+  files from the repository; they are there on disk in the usual way,
+  with just some symlinks pointing at the annexed file contents.
+  Neither git-annex nor git is needed to get at the file contents.
+
+* What file formats are used? Will they still be readable? To deal with
+  this, it's best to stick to plain text files, and the most common
+  image, sound, etc formats. Consider storing the same content in multiple
+  formats.
+
+* What filesystem is used on the drive? Will that filesystem still be
+  available?
+
+* What is the hardware interface of the drive? Will hardware still exist
+  to talk to it?
diff --git a/doc/not.mdwn b/doc/not.mdwn
index 80c0acafa..fe6e1b37d 100644
--- a/doc/not.mdwn
+++ b/doc/not.mdwn
@@ -30,3 +30,11 @@ situations.
   It lacks git-annex's support for widely distributed storage,
   using only a single backend data store. It also does not support
   partial checkouts of file contents, like git-annex does.
+
+* git-annex is also not [boar](http://code.google.com/p/boar/),
+  although it shares many of its goals and characteristics. Boar implements
+  its own version control system, rather than simply embracing and
+  extending git. And while boar supports distributed clones of a repository,
+  it does not support keeping different files in different clones of the
+  same repository, which git-annex does, and is an important feature for
+  large-scale archiving.
diff --git a/doc/repomap.png b/doc/repomap.png
new file mode 100644
index 000000000..4d334aec9
Binary files /dev/null and b/doc/repomap.png differ
diff --git a/doc/transferring_data.mdwn b/doc/transferring_data.mdwn
new file mode 100644
index 000000000..9526a3e48
--- /dev/null
+++ b/doc/transferring_data.mdwn
@@ -0,0 +1,14 @@
+git-annex can transfer data to or from any of a repository's git remotes.
+Depending on where the remote is, the data transfer is done using rsync
+(over ssh, with automatic resume), or plain cp (with copy-on-write
+optimisations on supported filesystems).
+
+It's equally easy to transfer a single file to or from a repository,
+or to launch a retrieval of a massive pile of files from whatever
+repositories they are scattered among.
+
+git-annex automatically uses whatever remotes are currently accessible,
+preferring ones that are less expensive to talk to.
+
+[[!img repomap.png caption="A real-world repository interconnection map
+(generated by git-annex map)"]]
diff --git a/doc/use_case/Alice.mdwn b/doc/use_case/Alice.mdwn
index 1dc456d73..836d93572 100644
--- a/doc/use_case/Alice.mdwn
+++ b/doc/use_case/Alice.mdwn
@@ -10,9 +10,11 @@ When she has 1 bar on her cell, Alice queues up interesting files on her
 server for later. At a coffee shop, she has git-annex download them to her
 USB drive. High in the sky or in a remote cabin, she catches up on
 podcasts, videos, and games, first letting git-annex copy them from
-her USB drive to the netbook (this saves battery power).
+her USB drive to the netbook (this saves battery power).
+([[more about transferring data|transferring_data]])
 
 When she's done, she tells git-annex which to keep and which to remove.
 They're all removed from her netbook to save space, and Alice knows
 that next time she syncs up to the net, her changes will be synced back
-to her server.
+to her server.
+([[more about distributed version control|distributed_version_control]])
diff --git a/doc/use_case/Bob.mdwn b/doc/use_case/Bob.mdwn
index 573d9cde9..53c09bc18 100644
--- a/doc/use_case/Bob.mdwn
+++ b/doc/use_case/Bob.mdwn
@@ -11,8 +11,15 @@ without worry about accidentally deleting anything.
 When Bob needs access to some files, git-annex can tell him which drive(s)
 they're on, and easily make them available. Indeed, every drive knows what
 is on every other drive.
+([[more about location tracking|location_tracking]])
+
+Bob thinks long-term, and so he's glad that git-annex uses a simple
+repository format. He knows his files will be accessible in the future
+even if the world has forgotten about git-annex and git.
+([[more about future-proofing|future_proofing]])
 
 Run in a cron job, git-annex adds new files to archival drives at night. It
 also helps Bob keep track of intentional, and unintentional copies of
 files, and logs information he can use to decide when it's time to duplicate
 the content of old drives.
+([[more about backup copies|copies]])
-- 
cgit v1.2.3