diff options
Diffstat (limited to 'doc/tips/Internet_Archive_via_S3.mdwn')
-rw-r--r-- | doc/tips/Internet_Archive_via_S3.mdwn | 49 |
1 files changed, 49 insertions, 0 deletions
diff --git a/doc/tips/Internet_Archive_via_S3.mdwn b/doc/tips/Internet_Archive_via_S3.mdwn new file mode 100644 index 000000000..8c0f2dde7 --- /dev/null +++ b/doc/tips/Internet_Archive_via_S3.mdwn @@ -0,0 +1,49 @@ +[The Internet Archive](http://www.archive.org/) allows members to upload +collections using an Amazon S3 +[compatible API](http://www.archive.org/help/abouts3.txt), and this can +be used with git-annex's [[special_remotes/S3]] support. + +So, you can locally archive things with git-annex, define remotes that +correspond to "items" at the Internet Archive, and use git-annex to upload +your files to there. Of course, your use of the Internet Archive must +comply with their [terms of service](http://www.archive.org/about/terms.php). + +Sign up for an account, and get your access keys here: +<http://www.archive.org/account/s3.php> + + # export AWS_ACCESS_KEY_ID=blahblah + # export AWS_SECRET_ACCESS_KEY=xxxxxxx + +Specify `host=s3.us.archive.org` when doing `initremote` to set up +a remote at the Archive. This will enable a special Internet Archive mode: +Encryption is not allowed; you are required to specify a bucket name +rather than having git-annex pick a random one; and you can optionally +specify `x-archive-meta*` headers to add metadata as explained in their +[documentation](http://www.archive.org/help/abouts3.txt). + +[[!template id=note text=""" +/!\ There seems to be a bug in either hS3 or the archive that breaks +authentication when the bucket name contains spaces or upper-case letters.. +use all lowercase and no spaces when making the bucket with `initremote`. +"""]] + + # git annex initremote archive-panama type=S3 \ + host=s3.us.archive.org bucket=panama-canal-lock-blueprints \ + x-archive-meta-mediatype=texts x-archive-meta-language=eng \ + x-archive-meta-title="original Panama Canal lock design blueprints" + initremote archive-panama (Internet Archive mode) ok + # git annex describe archive-panama "a man, a plan, a canal: panama" + describe archive-panama ok + +Then you can annex files and copy them to the remote as usual: + + # git annex add photo1.jpeg --backend=SHA1E + add photo1.jpeg (checksum...) ok + # git annex copy photo1.jpeg --fast --to archive-panama + copy (to archive-panama...) ok + +Note the use of the SHA1E [[backend|backends]]. It makes most sense +to use the WORM or SHA1E backend for files that will be stored in +the Internet Archive, since the key name will be exposed as the filename +there, and since the Archive does special processing of files based on +their extension. |