author | 2012-11-20 16:43:58 -0400 | |
---|---|---|
committer | 2012-11-20 16:43:58 -0400 | |
commit | 0d378285e709833f87547fd6fedc4e8b2f4884c4 (patch) | |
tree | 6caa2c4ba7710c917751d26c5bf58cce2a1163e2 /doc | |
parent | cba848b472a4ac323693b44fcef9ddbbe535c929 (diff) |
Amazon Glacier special remote; 100% working
Diffstat (limited to 'doc')
-rw-r--r-- | doc/design/assistant/cloud.mdwn | 2 | ||||
-rw-r--r-- | doc/git-annex.mdwn | 16 | ||||
-rw-r--r-- | doc/special_remotes.mdwn | 1 | ||||
-rw-r--r-- | doc/special_remotes/glacier.mdwn | 51 | ||||
-rw-r--r-- | doc/tips/using_Amazon_Glacier.mdwn | 69 | ||||
-rw-r--r-- | doc/tips/using_Amazon_S3.mdwn | 2 | ||||
-rw-r--r-- | doc/todo/special_remote_for_amazon_glacier.mdwn | 5 |
7 files changed, 144 insertions, 2 deletions
diff --git a/doc/design/assistant/cloud.mdwn b/doc/design/assistant/cloud.mdwn
index 8df7ac753..3f5524cb1 100644
--- a/doc/design/assistant/cloud.mdwn
+++ b/doc/design/assistant/cloud.mdwn
@@ -15,7 +15,7 @@ More should be added, such as:
 * Box.com (it's free, and current method is hard to set up and a sorta
   shakey; a better method would be to use its API) **done**
 * Dropbox? That would be ironic.. Via its API, presumably.
-* [[Amazon Glacier|todo/special_remote_for_amazon_glacier]]
+* [[Amazon Glacier|todo/special_remote_for_amazon_glacier]] **done**
 * [nimbus.io](https://nimbus.io/) Fairly low prices ($0.06/GB);
   REST API; free software
 
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index c4ba9917b..7646e4392 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -568,6 +568,17 @@ subdirectories).
   The repository should be specified using the name of a configured remote,
   or the UUID or description of a repository.
 
+* --trust-glacier-inventory
+
+  Amazon Glacier inventories take hours to retrieve, and may not represent
+  the current state of a repository. So git-annex does not trust that
+  files that the inventory claims are in Glacier are really there.
+  This switch can be used to allow it to trust the inventory.
+
+  Be careful using this, especially if you or someone else might have recently
+  removed a file from Glacier. If you try to drop the only other copy of the
+  file, and this switch is enabled, you could lose data!
+
 * --backend=name
 
   Specifies which key-value backend to use. This can be used when
@@ -885,6 +896,11 @@ Here are all the supported configuration settings.
   Used to identify Amazon S3 special remotes.
   Normally this is automatically set up by `git annex initremote`.
 
+* `remote.<name>.glacier`
+
+  Used to identify Amazon Glacier special remotes.
+  Normally this is automatically set up by `git annex initremote`.
+
 * `remote.<name>.webdav`
 
   Used to identify webdav special remotes.
diff --git a/doc/special_remotes.mdwn b/doc/special_remotes.mdwn
index 65fcb8768..b6d6be5bb 100644
--- a/doc/special_remotes.mdwn
+++ b/doc/special_remotes.mdwn
@@ -19,6 +19,7 @@ into many cloud services. Here are specific instructions
 for various cloud things:
 
 * [[tips/using_Amazon_S3]]
+* [[tips/using_Amazon_Glacier]]
 * [[tips/Internet_Archive_via_S3]]
 * [[tahoe-lafs|forum/tips:_special__95__remotes__47__hook_with_tahoe-lafs]]
 * [[tips/using_box.com_as_a_special_remote]]
diff --git a/doc/special_remotes/glacier.mdwn b/doc/special_remotes/glacier.mdwn
new file mode 100644
index 000000000..f02f36694
--- /dev/null
+++ b/doc/special_remotes/glacier.mdwn
@@ -0,0 +1,51 @@
+This special remote type stores file contents in Amazon Glacier.
+
+To use it, you need to have [glacier-cli](http://github.com/basak/glacier-cli)
+installed.
+
+The unusual thing about Amazon Glacier is the multiple-hour delay it takes
+to retrieve information out of Glacier. To deal with this, commands like
+"git-annex get" request Glacier start the retrieval process, and will fail
+due to the data not yet being available. You can then wait appriximately
+four hours, re-run the same command, and this time, it will actually
+download the data.
+
+## configuration
+
+The standard environment variables `AWS_ACCESS_KEY_ID` and
+`AWS_SECRET_ACCESS_KEY` are used to supply login credentials
+for Amazon. You need to set these only when running
+`git annex initremote`, as they will be cached in a file only you
+can read inside the local git repository.
+
+A number of parameters can be passed to `git annex initremote` to configure
+the Glacier remote.
+
+* `encryption` - Required. Either "none" to disable encryption (not recommended),
+  or a value that can be looked up (using gpg -k) to find a gpg encryption
+  key that will be given access to the remote, or "shared" which allows
+  every clone of the repository to access the encrypted data (use with caution).
+
+  Note that additional gpg keys can be given access to a remote by
+  rerunning initremote with the new key id. See [[encryption]].
+
+* `embedcreds` - Optional. Set to "yes" embed the login credentials inside
+  the git repository, which allows other clones to also access them. This is
+  the default when gpg encryption is enabled; the credentials are stored
+  encrypted and only those with the repository's keys can access them.
+
+  It is not the default when using shared encryption, or no encryption.
+  Think carefully about who can access your repository before using
+  embedcreds without gpg encryption.
+
+* `datacenter` - Defaults to "us-east-1".
+
+* `vault` - Glacier requires that vaults have a globally unique name,
+  so by default, a vault name is chosen based on the remote name
+  and UUID. This can be specified to pick a valult name.
+
+* `fileprefix` - By default, git-annex places files in a tree rooted at the
+  top of the Glacier vault. When this is set, it's prefixed to the filenames
+  used. For example, you could set it to "foo/" in one special remote,
+  and to "bar/" in another special remote, and both special remotes could
+  then use the same vault.
diff --git a/doc/tips/using_Amazon_Glacier.mdwn b/doc/tips/using_Amazon_Glacier.mdwn
new file mode 100644
index 000000000..73c248e63
--- /dev/null
+++ b/doc/tips/using_Amazon_Glacier.mdwn
@@ -0,0 +1,69 @@
+Amazon Glacier provides low-cost storage, well suited for archiving and
+backup. But it takes around 4 hours to get content out of Glacier.
+
+Recent versions of git-annex support Glacier. To use it, you need to have
+[glacier-cli](http://github.com/basak/glacier-cli) installed.
+
+First, export your Amazon AWS credentials:
+
+    # export AWS_ACCESS_KEY_ID="08TJMT99S3511WOZEP91"
+    # export AWS_SECRET_ACCESS_KEY="s3kr1t"
+
+Now, create a gpg key, if you don't already have one. This will be used
+to encrypt everything stored in Glacier, for your privacy. Once you have
+a gpg key, run `gpg --list-secret-keys` to look up its key id, something
+like "2512E3C7"
+
+Next, create the Glacier remote.
+
+    # git annex initremote glacier type=glacier encryption=2512E3C7
+    initremote glacier (encryption setup with gpg key C910D9222512E3C7) (gpg) ok
+
+The configuration for the Glacier remote is stored in git. So to make another
+repository use the same Glacier remote is easy:
+
+    # cd /media/usb/annex
+    # git pull laptop
+    # git annex initremote glacier
+    initremote glacier (gpg) ok
+
+Now the remote can be used like any other remote.
+
+    # git annex move my_cool_big_file --to glacier
+    copy my_cool_big_file (gpg) (checking glacier...) (to glacier...) ok
+
+But, when you try to get a file out of Glacier, it'll queue a retrieval
+job:
+
+    # git annex get my_cool_big_file
+    get my_cool_big_file (from glacier...) (gpg)
+    glacier: queued retrieval job for archive 'GPGHMACSHA1--862afd4e67e3946587a9ef7fa5beb4e8f1aeb6b8'
+    Recommend you wait up to 4 hours, and then run this command again.
+    failed
+
+Like it says, you'll need to run the command again later. Let's remember to
+do that:
+
+    # at now + 4 hours
+    at> git annex get my_cool_big_file
+
+Another oddity of Glacier is that git-annex is never entirely sure
+if a file is still in Glacier. Glacier inventories take hours to retrieve,
+and even when retrieved do not necessarily represent the current state.
+
+So, git-annex plays it safe, and avoids trusting the inventory:
+
+    # git annex copy important_file --to glacier
+    copy important_file (gpg) (checking glacier...) (to glacier...) ok
+    # git annex drop important_file
+    drop important_file (gpg) (checking glacier...)
+    However, the inventory could be out of date, if it was recently removed.
+    (Use --trust-glacier-inventory if you're sure it's still in Glacier.)
+
+    (unsafe)
+    Could only verify the existence of 0 out of 1 necessary copies
+
+Like it says, you can use `--trust-glacier-inventory` if you're sure
+Glacier's inventory is correct and up-to-date.
+
+See [[special_remotes/Glacier]] for details.
diff --git a/doc/tips/using_Amazon_S3.mdwn b/doc/tips/using_Amazon_S3.mdwn
index 128819fcb..19997d026 100644
--- a/doc/tips/using_Amazon_S3.mdwn
+++ b/doc/tips/using_Amazon_S3.mdwn
@@ -2,7 +2,7 @@ git-annex extends git's usual remotes with some [[special_remotes]], that
 are not git repositories. This way you can set up a remote using say,
 Amazon S3, and use git-annex to transfer files into the cloud.
 
-First, export your S3 credentials:
+First, export your Amazon AWS credentials:
 
     # export AWS_ACCESS_KEY_ID="08TJMT99S3511WOZEP91"
     # export AWS_SECRET_ACCESS_KEY="s3kr1t"
diff --git a/doc/todo/special_remote_for_amazon_glacier.mdwn b/doc/todo/special_remote_for_amazon_glacier.mdwn
index a6e524cdd..0fa77b527 100644
--- a/doc/todo/special_remote_for_amazon_glacier.mdwn
+++ b/doc/todo/special_remote_for_amazon_glacier.mdwn
@@ -18,8 +18,13 @@ run, or files to transfer, at that point.
 
 --[[Joey]]
 
+> [[done]]! --[[Joey]]
+
 -----
 
 > In the coming months, Amazon S3 will introduce an option that will allow customers to seamlessly move data between Amazon S3 and Amazon Glacier based on data lifecycle policies.
 
 -- <http://aws.amazon.com/glacier/faqs/#How_should_I_choose_between_Amazon_Glacier_and_Amazon_S3>
+
+>> They did, but it's IMHO not very useful for git-annex. It's rather
+>> intended to allow aging S3 storage out to Glacier. --[[Joey]]
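
The two-step retrieval described in the new tip (a first `git annex get` that only queues the Glacier job and fails, then a second run roughly four hours later) could also be scripted as a retry loop instead of an `at` job. The following is a minimal sketch and not part of this commit: `retry_until_ok` is a hypothetical helper, and the attempt count and the 14400-second delay in the usage comment are assumptions derived from the "wait up to 4 hours" message in the tip.

```shell
#!/bin/sh
# Hypothetical helper, not part of git-annex.
# Usage: retry_until_ok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or ATTEMPTS runs have been used.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until_ok() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only wait if another attempt is still to come.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Assumed usage: the first run queues the retrieval job and fails,
# the second run (four hours later) should download the data.
#   retry_until_ok 2 14400 git annex get my_cool_big_file
```

Unlike the `at` approach, this keeps a shell running for the whole wait, so it suits a long-lived session (for example inside `screen` or `tmux`) better than a laptop that may be suspended.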