# Creating a special S3 remote to hold files shareable by URL

(This example assumes an S3 bucket named **public-annex** and a git-annex special remote named **public-s3** that stores its files in that bucket. Change these names when you do this for real.)

First, in the AWS dashboard, go to the S3 bucket you will use (or create it) and add a policy that allows public reads:

    {
      "Version": "2008-10-17",
      "Statement": [
        {
          "Sid": "AllowPublicRead",
          "Effect": "Allow",
          "Principal": {
            "AWS": "*"
          },
          "Action": "s3:GetObject",
          "Resource": "arn:aws:s3:::public-annex/*"
        }
      ]
    }

Then set up your special [S3](http://git-annex.branchable.com/special_remotes/S3/) remote with (at least) these options:

    git annex initremote public-s3 type=S3 encryption=none bucket=public-annex chunk=0

With this setup, git-annex uploads files to the remote (when you run `git annex copy [FILES...] --to public-s3`) without encrypting or chunking them, and, because of the bucket policy, anyone with the link can access them.

Following the example, the files will be accessible at `http://public-annex.s3.amazonaws.com/KEY`, where `KEY` is the key git-annex assigned to the file, which you can look up by running

    git annex lookupkey FILEPATH

This way you can share a link to each file you have at your S3 remote.

___________________

## Sharing all links in a folder

To share the links for all files in a given folder, go to that folder and run the following (this example uses the _fish_ shell; the same should be possible in _bash_):

    for filename in (ls)
        echo $filename": https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)
    end
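
For _bash_, a rough equivalent might look like the sketch below. The function names `public_url` and `share_links` are made up for this example, and the bucket name is the **public-annex** placeholder used above. Files that git-annex does not manage are skipped.

```shell
# Hypothetical helper: build the public URL for a git-annex key.
# Assumes the example bucket name "public-annex".
public_url() {
    printf 'https://public-annex.s3.amazonaws.com/%s\n' "$1"
}

# Hypothetical helper: print "filename: URL" for each annexed file
# in the current directory.
share_links() {
    for filename in *; do
        # lookupkey exits non-zero for files that are not annexed; skip those.
        key=$(git annex lookupkey "$filename" 2>/dev/null) || continue
        printf '%s: %s\n' "$filename" "$(public_url "$key")"
    done
}
```

After defining both functions, running `share_links` inside the folder should print one line per annexed file.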

## Sharing all links matching certain metadata

The same approach works with any of the filters git-annex supports.

For example, let's share links to all the files whose _author_ starts with "Mario" and that are actually stored in your public-s3 remote.
Instead of a plain list of links, this outputs a markdown-formatted list of filenames linked to their S3 URLs:

    for filename in (git annex find --metadata "author=Mario*" --and --in public-s3)
       echo "* ["$filename"](https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)")"
    end
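
A _bash_ version of the same idea could read the filenames from a pipe, so that any `git annex find` filter can feed it. This is only a sketch: `markdown_links` is a name invented here, and the bucket is again the **public-annex** placeholder.

```shell
# Hypothetical helper: read filenames on stdin, one per line, and print a
# markdown list item linking each one to its public S3 URL.
markdown_links() {
    while IFS= read -r filename; do
        # Redirect stdin so the lookup cannot consume the list of filenames.
        key=$(git annex lookupkey "$filename" </dev/null) || continue
        printf '* [%s](https://public-annex.s3.amazonaws.com/%s)\n' "$filename" "$key"
    done
}
```

It would be used as `git annex find --metadata "author=Mario*" --and --in public-s3 | markdown_links`.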

Very useful.

## Sharing links with time-limited URLs

Pre-signed URLs let you limit how long a URL remains valid for retrieving an object.
To use them, keep the S3 bucket for the remote private and pre-sign the URL of each object with the script in [AWS-Tools](https://github.com/gdbtek/aws-tools).
Example:

    key=$(git annex lookupkey "$fname"); sign_s3_url.bash --region 'eu-west-1' --bucket 'mybuck' --file-path "$key" --aws-access-key-id XX --aws-secret-access-key XX --method 'GET' --minute-expire 10
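
If the AWS CLI is installed and configured with credentials, its `aws s3 presign` command is another way to get a time-limited URL. The sketch below assumes that setup and is not the AWS-Tools script. The function name `presign_annex_file` is made up, and `mybuck` and `eu-west-1` are the placeholders from the example.

```shell
# Hypothetical helper: print a pre-signed GET URL, valid for 10 minutes,
# for an annexed file stored in the private bucket "mybuck".
presign_annex_file() {
    key=$(git annex lookupkey "$1") || return 1
    # --expires-in is given in seconds.
    aws s3 presign "s3://mybuck/$key" --region eu-west-1 --expires-in 600
}
```

Calling `presign_annex_file "$fname"` should print the signed URL, or return non-zero if the file has no git-annex key.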

## Adding the S3 URL as a source

Assuming the files are available on S3, this registers the public S3 URL of each file in the public-s3 remote with git-annex, making the files available to everyone *through git-annex*:

<pre>
# Emit "KEY URL" pairs, one per line, for git annex registerurl to read
git annex find --in public-s3 | while IFS= read -r file ; do
  key=$(git annex lookupkey "$file")
  echo "$key https://public-annex.s3.amazonaws.com/$key"
done | git annex registerurl
</pre>

`registerurl` was introduced in `5.20150317`. There's a todo open to ensure we don't have to do this by hand: [[todo/credentials-less access to s3]].