author Joey Hess <joey@kitenet.net> 2012-11-20 16:43:58 -0400
committer Joey Hess <joey@kitenet.net> 2012-11-20 16:43:58 -0400
commit 0d378285e709833f87547fd6fedc4e8b2f4884c4
tree 6caa2c4ba7710c917751d26c5bf58cce2a1163e2 /doc
parent cba848b472a4ac323693b44fcef9ddbbe535c929
Amazon Glacier special remote; 100% working
Diffstat (limited to 'doc')
-rw-r--r-- doc/design/assistant/cloud.mdwn 2
-rw-r--r-- doc/git-annex.mdwn 16
-rw-r--r-- doc/special_remotes.mdwn 1
-rw-r--r-- doc/special_remotes/glacier.mdwn 51
-rw-r--r-- doc/tips/using_Amazon_Glacier.mdwn 69
-rw-r--r-- doc/tips/using_Amazon_S3.mdwn 2
-rw-r--r-- doc/todo/special_remote_for_amazon_glacier.mdwn 5
7 files changed, 144 insertions, 2 deletions
diff --git a/doc/design/assistant/cloud.mdwn b/doc/design/assistant/cloud.mdwn
index 8df7ac753..3f5524cb1 100644
--- a/doc/design/assistant/cloud.mdwn
+++ b/doc/design/assistant/cloud.mdwn
@@ -15,7 +15,7 @@ More should be added, such as:
* Box.com (it's free, and current method is hard to set up and a sorta
shakey; a better method would be to use its API) **done**
* Dropbox? That would be ironic.. Via its API, presumably.
-* [[Amazon Glacier|todo/special_remote_for_amazon_glacier]]
+* [[Amazon Glacier|todo/special_remote_for_amazon_glacier]] **done**
* [nimbus.io](https://nimbus.io/) Fairly low prices ($0.06/GB);
REST API; free software
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index c4ba9917b..7646e4392 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -568,6 +568,17 @@ subdirectories).
The repository should be specified using the name of a configured remote,
or the UUID or description of a repository.
+* --trust-glacier-inventory
+
+ Amazon Glacier inventories take hours to retrieve, and may not represent
+ the current state of a repository. So git-annex does not trust that
+ files that the inventory claims are in Glacier are really there.
+ This switch can be used to allow it to trust the inventory.
+
+ Be careful using this, especially if you or someone else might have recently
+ removed a file from Glacier. If you try to drop the only other copy of the
+ file, and this switch is enabled, you could lose data!
+
* --backend=name
Specifies which key-value backend to use. This can be used when
@@ -885,6 +896,11 @@ Here are all the supported configuration settings.
Used to identify Amazon S3 special remotes.
Normally this is automatically set up by `git annex initremote`.
+* `remote.<name>.glacier`
+
+ Used to identify Amazon Glacier special remotes.
+ Normally this is automatically set up by `git annex initremote`.
+
* `remote.<name>.webdav`
Used to identify webdav special remotes.
diff --git a/doc/special_remotes.mdwn b/doc/special_remotes.mdwn
index 65fcb8768..b6d6be5bb 100644
--- a/doc/special_remotes.mdwn
+++ b/doc/special_remotes.mdwn
@@ -19,6 +19,7 @@ into many cloud services. Here are specific instructions
for various cloud things:
* [[tips/using_Amazon_S3]]
+* [[tips/using_Amazon_Glacier]]
* [[tips/Internet_Archive_via_S3]]
* [[tahoe-lafs|forum/tips:_special__95__remotes__47__hook_with_tahoe-lafs]]
* [[tips/using_box.com_as_a_special_remote]]
diff --git a/doc/special_remotes/glacier.mdwn b/doc/special_remotes/glacier.mdwn
new file mode 100644
index 000000000..f02f36694
--- /dev/null
+++ b/doc/special_remotes/glacier.mdwn
@@ -0,0 +1,51 @@
+This special remote type stores file contents in Amazon Glacier.
+
+To use it, you need to have [glacier-cli](http://github.com/basak/glacier-cli)
+installed.
+
+The unusual thing about Amazon Glacier is the multiple-hour delay it takes
+to retrieve information out of Glacier. To deal with this, commands like
+"git-annex get" request that Glacier start the retrieval process, and then fail
+because the data is not yet available. You can then wait approximately
+four hours, re-run the same command, and this time, it will actually
+download the data.
+
+## configuration
+
+The standard environment variables `AWS_ACCESS_KEY_ID` and
+`AWS_SECRET_ACCESS_KEY` are used to supply login credentials
+for Amazon. You need to set these only when running
+`git annex initremote`, as they will be cached in a file only you
+can read inside the local git repository.
+
+A number of parameters can be passed to `git annex initremote` to configure
+the Glacier remote.
+
+* `encryption` - Required. Either "none" to disable encryption (not recommended),
+ or a value that can be looked up (using gpg -k) to find a gpg encryption
+ key that will be given access to the remote, or "shared" which allows
+ every clone of the repository to access the encrypted data (use with caution).
+
+ Note that additional gpg keys can be given access to a remote by
+ rerunning initremote with the new key id. See [[encryption]].
+
+* `embedcreds` - Optional. Set to "yes" to embed the login credentials inside
+ the git repository, which allows other clones to also access them. This is
+ the default when gpg encryption is enabled; the credentials are stored
+ encrypted and only those with the repository's keys can access them.
+
+ It is not the default when using shared encryption, or no encryption.
+ Think carefully about who can access your repository before using
+ embedcreds without gpg encryption.
+
+* `datacenter` - Defaults to "us-east-1".
+
+* `vault` - Glacier requires that vaults have a globally unique name,
+ so by default, a vault name is chosen based on the remote name
+ and UUID. Set this to pick a different vault name.
+
+* `fileprefix` - By default, git-annex places files in a tree rooted at the
+ top of the Glacier vault. When this is set, it's prefixed to the filenames
+ used. For example, you could set it to "foo/" in one special remote,
+ and to "bar/" in another special remote, and both special remotes could
+ then use the same vault.
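The `fileprefix` parameter above lets two Glacier special remotes share one vault. A minimal sketch of that setup as a POSIX shell function, assuming `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are already exported and glacier-cli is installed. The function name `setup_shared_vault_remotes` and the remote names `glacier-foo` and `glacier-bar` are hypothetical; only the `initremote` parameters come from the documentation above.

```shell
#!/bin/sh
# Hypothetical helper, not part of git-annex: create two Glacier special
# remotes that store their files in the same vault, kept apart by fileprefix.
# Assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are exported, since
# initremote reads the login credentials from the environment.
setup_shared_vault_remotes() {
    keyid=$1   # gpg key id to encrypt to, e.g. 2512E3C7
    vault=$2   # vault name to use instead of the generated default
    if [ -z "$keyid" ] || [ -z "$vault" ]; then
        echo "usage: setup_shared_vault_remotes KEYID VAULT" >&2
        return 2
    fi
    # The second remote is only created if the first initremote succeeded.
    git annex initremote glacier-foo type=glacier encryption="$keyid" \
        datacenter=us-east-1 vault="$vault" fileprefix=foo/ &&
    git annex initremote glacier-bar type=glacier encryption="$keyid" \
        datacenter=us-east-1 vault="$vault" fileprefix=bar/
}
```

Run it inside the annex repository, e.g. `setup_shared_vault_remotes 2512E3C7 myannexvault`. Passing `datacenter=us-east-1` only restates the default and could be omitted.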
diff --git a/doc/tips/using_Amazon_Glacier.mdwn b/doc/tips/using_Amazon_Glacier.mdwn
new file mode 100644
index 000000000..73c248e63
--- /dev/null
+++ b/doc/tips/using_Amazon_Glacier.mdwn
@@ -0,0 +1,69 @@
+Amazon Glacier provides low-cost storage, well suited for archiving and
+backup. But it takes around 4 hours to get content out of Glacier.
+
+Recent versions of git-annex support Glacier. To use it, you need to have
+[glacier-cli](http://github.com/basak/glacier-cli) installed.
+
+First, export your Amazon AWS credentials:
+
+ # export AWS_ACCESS_KEY_ID="08TJMT99S3511WOZEP91"
+ # export AWS_SECRET_ACCESS_KEY="s3kr1t"
+
+Now, create a gpg key, if you don't already have one. This will be used
+to encrypt everything stored in Glacier, for your privacy. Once you have
+a gpg key, run `gpg --list-secret-keys` to look up its key id, something
+like "2512E3C7".
+
+Next, create the Glacier remote.
+
+ # git annex initremote glacier type=glacier encryption=2512E3C7
+ initremote glacier (encryption setup with gpg key C910D9222512E3C7) (gpg) ok
+
+The configuration for the Glacier remote is stored in git, so making another
+repository use the same Glacier remote is easy:
+
+ # cd /media/usb/annex
+ # git pull laptop
+ # git annex initremote glacier
+ initremote glacier (gpg) ok
+
+Now the remote can be used like any other remote.
+
+ # git annex move my_cool_big_file --to glacier
+ move my_cool_big_file (gpg) (checking glacier...) (to glacier...) ok
+
+But, when you try to get a file out of Glacier, it'll queue a retrieval
+job:
+
+ # git annex get my_cool_big_file
+ get my_cool_big_file (from glacier...) (gpg)
+ glacier: queued retrieval job for archive 'GPGHMACSHA1--862afd4e67e3946587a9ef7fa5beb4e8f1aeb6b8'
+ Recommend you wait up to 4 hours, and then run this command again.
+ failed
+
+Like it says, you'll need to run the command again later. Let's remember to
+do that:
+
+ # at now + 4 hours
+ at> git annex get my_cool_big_file
+
+Another oddity of Glacier is that git-annex is never entirely sure
+if a file is still in Glacier. Glacier inventories take hours to retrieve,
+and even when retrieved do not necessarily represent the current state.
+
+So, git-annex plays it safe, and avoids trusting the inventory:
+
+ # git annex copy important_file --to glacier
+ copy important_file (gpg) (checking glacier...) (to glacier...) ok
+ # git annex drop important_file
+ drop important_file (gpg) (checking glacier...)
+ However, the inventory could be out of date, if it was recently removed.
+ (Use --trust-glacier-inventory if you're sure it's still in Glacier.)
+
+ (unsafe)
+ Could only verify the existence of 0 out of 1 necessary copies
+
+Like it says, you can use `--trust-glacier-inventory` if you're sure
+Glacier's inventory is correct and up-to-date.
+
+See [[special_remotes/Glacier]] for details.
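The walkthrough above schedules a single retry with `at`. If you would rather keep retrying until the retrieval job has finished, here is a minimal sketch in POSIX shell. It relies only on `git annex get` failing until Glacier has made the data available, as in the transcript; the function name `glacier_get_retry`, the one-hour interval and the six-attempt limit are hypothetical choices, not git-annex settings.

```shell
#!/bin/sh
# Hypothetical helper, not part of git-annex: keep re-running
# "git annex get" on a Glacier-backed file until the queued retrieval
# job completes. Plain POSIX sh, so these variables are global.
glacier_get_retry() {
    file=$1
    interval=${2:-3600}      # seconds between attempts (assumed: 1 hour)
    max_attempts=${3:-6}     # give up after this many tries (assumed)
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # The first run queues the retrieval job and fails; a later run
        # succeeds once Glacier has made the data available.
        if git annex get "$file"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$interval"
        fi
        attempt=$((attempt + 1))
    done
    echo "giving up on $file after $max_attempts attempts" >&2
    return 1
}
```

Usage: `glacier_get_retry my_cool_big_file`. With the defaults it spends at most five hours between the first and last attempt, which covers the roughly four-hour delay described above.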
diff --git a/doc/tips/using_Amazon_S3.mdwn b/doc/tips/using_Amazon_S3.mdwn
index 128819fcb..19997d026 100644
--- a/doc/tips/using_Amazon_S3.mdwn
+++ b/doc/tips/using_Amazon_S3.mdwn
@@ -2,7 +2,7 @@ git-annex extends git's usual remotes with some [[special_remotes]], that
are not git repositories. This way you can set up a remote using say,
Amazon S3, and use git-annex to transfer files into the cloud.
-First, export your S3 credentials:
+First, export your Amazon AWS credentials:
# export AWS_ACCESS_KEY_ID="08TJMT99S3511WOZEP91"
# export AWS_SECRET_ACCESS_KEY="s3kr1t"
diff --git a/doc/todo/special_remote_for_amazon_glacier.mdwn b/doc/todo/special_remote_for_amazon_glacier.mdwn
index a6e524cdd..0fa77b527 100644
--- a/doc/todo/special_remote_for_amazon_glacier.mdwn
+++ b/doc/todo/special_remote_for_amazon_glacier.mdwn
@@ -18,8 +18,13 @@ run, or files to transfer, at that point.
--[[Joey]]
+> [[done]]! --[[Joey]]
+
-----
> In the coming months, Amazon S3 will introduce an option that will allow customers to seamlessly move data between Amazon S3 and Amazon Glacier based on data lifecycle policies.
-- <http://aws.amazon.com/glacier/faqs/#How_should_I_choose_between_Amazon_Glacier_and_Amazon_S3>
+
+>> They did, but it's IMHO not very useful for git-annex. It's rather
+>> intended to allow aging S3 storage out to Glacier. --[[Joey]]