[The Internet Archive](http://www.archive.org/) allows members to upload
collections using an Amazon S3 
[compatible API](http://www.archive.org/help/abouts3.txt), and this can
be used with git-annex's [[special_remotes/S3]] support. 

So, if you're an archivist, you can archive things locally with git-annex,
define remotes that correspond to "items" at the Internet Archive,
and use git-annex to upload your files there.
Of course, your use of the Internet Archive must comply with their
[terms of service](http://www.archive.org/about/terms.php).

Sign up for an account, and get your access keys here:
<http://www.archive.org/account/s3.php>
	
	# export AWS_ACCESS_KEY_ID=blahblah
	# export AWS_SECRET_ACCESS_KEY=xxxxxxx

Specify `host=s3.us.archive.org` when doing `initremote` to set up
a remote at the Archive. This will enable a special Internet Archive mode:
encryption is not allowed; you are required to specify a bucket name
rather than letting git-annex pick a random one; and you can optionally
specify `x-archive-meta*` headers to add metadata as explained in their
[documentation](http://www.archive.org/help/abouts3.txt).

	# git annex initremote archive-panama type=S3 \
		host=s3.us.archive.org bucket=panama-canal-lock-blueprints \
		x-archive-meta-mediatype=texts x-archive-meta-language=eng \
		x-archive-meta-title="original Panama Canal lock design blueprints"
	initremote archive-panama (Internet Archive mode) (checking bucket) (creating bucket in US) ok
	# git annex describe archive-panama "Internet Archive item for my grandfather's Panama Canal lock design blueprints"
	describe archive-panama ok

Then you can annex files and copy them to the remote as usual:

	# git annex add photo1.jpeg
	add photo1.jpeg ok
	# git annex copy photo1.jpeg --fast --to archive-panama
	copy (to archive-panama...) ok

-----

Note that it probably makes the most sense to use the WORM backend
for files, since that exposes the original filename in the key stored
in the Archive, which allows the Archive's special processing for sound
files, movies, etc. to be done.
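
One way to arrange that, as a minimal sketch: git-annex's backend
documentation describes an `annex.backend` attribute that can be set in
`.gitattributes`, so that files added afterwards use the chosen backend.
The helper name `use_worm_backend` is made up for illustration and is not
part of git-annex; check that your git-annex version honors the attribute
before relying on it.

```shell
# Hypothetical helper (not part of git-annex): append a rule to the
# .gitattributes file in the given repository directory (default: the
# current directory) asking git-annex to use the WORM backend for all files.
use_worm_backend() {
	printf '%s\n' '* annex.backend=WORM' >> "${1:-.}/.gitattributes"
}
```

Run `use_worm_backend` at the top of the repository before `git annex add`;
files that were already annexed keep whatever backend they were added with.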

Also, the Internet Archive has restrictions on what is allowed in a
filename; in particular, spaces are not allowed.
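
Since the filename ends up in the key when the WORM backend is used, it can
help to look for offending names before adding anything. The following is
only a sketch: `list_names_with_spaces` is a hypothetical helper, not a
git-annex command, and it checks only for spaces, not for any other
restriction the Archive may impose.

```shell
# Hypothetical helper: print every regular file under the given directory
# (default: the current directory) whose name contains a space, skipping
# anything inside a .git directory.
list_names_with_spaces() {
	find "${1:-.}" -type f -name '* *' ! -path '*/.git/*'
}
```

Any file it prints would need to be renamed before being added and copied
to the Archive remote.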

There seems to be a bug in either hS3 or the archive that breaks
authentication when the bucket name contains spaces or upper-case letters.
Use all lowercase and no spaces when making the bucket with `initremote`.
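
A bucket name can be checked against that rule before running `initremote`.
This is a minimal sketch, assuming a shell whose `case` patterns support
POSIX character classes; `check_bucket_name` is a hypothetical helper, and
it tests only for whitespace and upper-case letters, not for anything else
the Archive might reject.

```shell
# Hypothetical helper (not part of git-annex): succeed only if the proposed
# bucket name is non-empty and contains no whitespace or upper-case letters.
check_bucket_name() {
	case "$1" in
		"") return 1 ;;
		*[[:space:]]*|*[[:upper:]]*) return 1 ;;
		*) return 0 ;;
	esac
}
```

For example, `check_bucket_name panama-canal-lock-blueprints` succeeds,
while `check_bucket_name "Panama Canal"` fails.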