author     Joey Hess <joeyh@joeyh.name>    2017-09-08 16:19:38 -0400
committer  Joey Hess <joeyh@joeyh.name>    2017-09-08 16:28:28 -0400
commit     4e44bd5314174c2e71d93d124ec5067052f2ec56 (patch)
tree       aa5fa2fae8890061ec90cf46cacc7ca4476e7a77
parent     5ef1c9b5690057e5b18dc7dcc3627776b400c544 (diff)
S3 export finalization
Fixed ACL issue, and updated some documentation.
10 files changed, 120 insertions, 80 deletions
diff --git a/Remote/S3.hs b/Remote/S3.hs
index 96d24d00e..f80a08bb2 100644
--- a/Remote/S3.hs
+++ b/Remote/S3.hs
@@ -357,14 +357,16 @@ checkPresentExportS3 r info _k loc =
 	go = withS3Handle (config r) (gitconfig r) (uuid r) $ \h -> do
 		checkKeyHelper info h (T.pack $ bucketExportLocation info loc)
 
+-- S3 has no move primitive; copy and delete.
 renameExportS3 :: Remote -> S3Info -> Key -> ExportLocation -> ExportLocation -> Annex Bool
 renameExportS3 r info _k src dest = catchNonAsync go (\e -> warning (show e) >> return False)
   where
 	go = withS3Handle (config r) (gitconfig r) (uuid r) $ \h -> do
-		-- S3 has no move primitive; copy and delete.
-		void $ sendS3Handle h $ S3.copyObject (bucket info) dstobject
+		let co = S3.copyObject (bucket info) dstobject
 			(S3.ObjectId (bucket info) srcobject Nothing)
 			S3.CopyMetadata
+		-- ACL is not preserved by copy.
+		void $ sendS3Handle h $ co { S3.coAcl = acl info }
 		void $ sendS3Handle h $ S3.DeleteObject srcobject (bucket info)
 		return True
 	srcobject = T.pack $ bucketExportLocation info src
diff --git a/doc/tips/public_Amazon_S3_remote.mdwn b/doc/tips/public_Amazon_S3_remote.mdwn
index d362fd75d..ce484adfb 100644
--- a/doc/tips/public_Amazon_S3_remote.mdwn
+++ b/doc/tips/public_Amazon_S3_remote.mdwn
@@ -2,6 +2,9 @@
 Here's how to create a Amazon [[S3 special remote|special_remotes/S3]]
 that can be read by anyone who gets a clone of your git-annex repository,
 without them needing Amazon AWS credentials.
 
+If you want to publish files to S3 so they can be accessed without using
+git-annex, see [[publishing_your_files_to_the_public]].
+
 Note: Bear in mind that Amazon will charge the owner of the bucket
 for public downloads from that bucket.
@@ -52,6 +55,3 @@
 who are not using git-annex. To find the url, use `git annex whereis`.
 
 ----
 
 See [[special_remotes/S3]] for details about configuring S3 remotes.
-
-See [[publishing_your_files_to_the_public]] for other ways to use a public
-S3 bucket.
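The Haskell hunk above works around S3's lack of a rename primitive by copying and then deleting, and the fix in this commit is to reapply the remote's configured ACL after the copy, since a copy does not preserve it. A toy Python sketch of the same idea, using a plain dict as a stand-in for the bucket (the `Bucket` class and its methods are hypothetical, not any real AWS SDK):

```python
# Toy model of an S3-style object store: there is no rename operation,
# and copying an object resets its ACL to the default.
class Bucket:
    def __init__(self):
        self.objects = {}  # key -> (data, acl)

    def copy(self, src, dest):
        data, _acl = self.objects[src]
        # ACL is not preserved by copy; the new object gets the default.
        self.objects[dest] = (data, "private")

    def put_acl(self, key, acl):
        data, _acl = self.objects[key]
        self.objects[key] = (data, acl)

    def delete(self, key):
        del self.objects[key]

def rename_export(bucket, src, dest, acl):
    """Emulate rename: copy, reapply the desired ACL, delete the source."""
    bucket.copy(src, dest)
    bucket.put_acl(dest, acl)  # without this step, the copy would be private
    bucket.delete(src)

b = Bucket()
b.objects["old/loc"] = (b"contents", "public-read")
rename_export(b, "old/loc", "new/loc", "public-read")
print(b.objects)
```

Without the `put_acl` step, the renamed object would silently lose its public-read ACL, which is exactly the bug this commit fixes.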
diff --git a/doc/tips/publishing_your_files_to_the_public.mdwn b/doc/tips/publishing_your_files_to_the_public.mdwn
index 5409dda0d..f7d332d57 100644
--- a/doc/tips/publishing_your_files_to_the_public.mdwn
+++ b/doc/tips/publishing_your_files_to_the_public.mdwn
@@ -1,88 +1,39 @@
 # Creating a special S3 remote to hold files shareable by URL
 
-(In this example, I'll assume you'll be creating a bucket in S3 named **public-annex** and a special remote in git-annex, which will store its files in the previous bucket, named **public-s3**, but change these names if you are going to do the thing for real)
+In this example, I'll assume you'll be creating a bucket in Amazon S3 named
+$BUCKET and a special remote named public-s3. Be sure to replace $BUCKET
+with something like "public-bucket-joey" when you follow along in your
+shell.
 
-Set up your special [S3](http://git-annex.branchable.com/special_remotes/S3/) remote with (at least) these options:
+Set up your special [[S3 remote|special_remotes/S3]] with (at least) these options:
 
-	git annex initremote public-s3 type=s3 encryption=none bucket=public-annex chunk=0 public=yes
+	git annex initremote public-s3 type=s3 bucket=$BUCKET exporttree=yes public=yes encryption=none
 
-This way git-annex will upload the files to this repo, (when you call `git
-annex copy [FILES...] --to public-s3`) without encrypting them and without
-chunking them. And, thanks to the public=yes, they will be
-accessible by anyone with the link.
+Then export the files in the master branch to the remote:
 
-(Note that public=yes was added in git-annex version 5.20150605.
-If you have an older version, it will be silently ignored, and you
-will instead need to use the AWS dashboard to configure a public get policy
-for the bucket.)
+	git annex export master --to public-s3
 
-Following the example, the files will be accessible at `http://public-annex.s3.amazonaws.com/KEY` where `KEY` is the file key created by git-annex and which you can discover running
+You can run that command again to update the export. See
+[[git-annex-export]] for details.
 
-	git annex lookupkey FILEPATH
+Each exported file will be available to the public from
+`http://$BUCKET.s3.amazonaws.com/$file`
 
-This way you can share a link to each file you have at your S3 remote.
+Note: Bear in mind that Amazon will charge the owner of the bucket
+for public downloads from that bucket.
 
-## Sharing all links in a folder
+# Indexes
 
-To share all the links in a given folder, for example, you can go to that folder and run (this is an example with the _fish_ shell, but I'm sure you can do the same in _bash_, I just don't know exactly):
+By default, there is no index.html file exported, so if you open
+`http://$BUCKET.s3.amazonaws.com/` in a web browser, you'll see an
+XML document listing the files.
 
-	for filename in (ls)
-		echo $filename": https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)
-	end
+For a nicer list of files, you can make an index.html file, check it into
+git, and export it to the bucket. You'll need to configure the bucket to
+use index.html as its index document, as
+[explained here](https://stackoverflow.com/questions/27899/is-there-a-way-to-have-index-html-functionality-with-content-hosted-on-s3).
 
-## Sharing all links matching certain metadata
+# Old method
 
-The same applies to all the filters you can do with git-annex.
-
-For example, let's share links to all the files whose _author_'s name starts with "Mario" and are, in fact, stored at your public-s3 remote.
-However, instead of just a list of links we will output a markdown-formatted list of the filenames linked to their S3 urls:
-
-	for filename in (git annex find --metadata "author=Mario*" --and --in public-s3)
-		echo "* ["$filename"](https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)")"
-	end
-
-Very useful.
-
-## Sharing links with time-limited URLs
-
-By using pre-signed URLs it is possible to create limits on how long a URL is valid for retrieving an object.
-To enable use a private S3 bucket for the remotes and then pre-sign actual URL with the script in [AWS-Tools](https://github.com/gdbtek/aws-tools).
-Example:
-
-	key=`git annex lookupkey "$fname"`; sign_s3_url.bash --region 'eu-west-1' --bucket 'mybuck' --file-path $key --aws-access-key-id XX --aws-secret-access-key XX --method 'GET' --minute-expire 10
-
-## Adding the S3 URL as a source
-
-Assuming all files in the current directory are available on S3, this will register the public S3 url for the file in git-annex, making it available for everyone *through git-annex*:
-
-<pre>
-git annex find --in public-s3 | while read file ; do
-  key=$(git annex lookupkey $file)
-  echo $key https://public-annex.s3.amazonaws.com/$key
-done | git annex registerurl
-</pre>
-
-`registerurl` was introduced in `5.20150317`.
-
-## Manually configuring a public get policy
-
-Here is how to manually configure a public get policy
-for a bucket, in the AWS dashboard.
-
-	{
-		"Version": "2008-10-17",
-		"Statement": [
-			{
-				"Sid": "AllowPublicRead",
-				"Effect": "Allow",
-				"Principal": {
-					"AWS": "*"
-				},
-				"Action": "s3:GetObject",
-				"Resource": "arn:aws:s3:::public-annex/*"
-			}
-		]
-	}
-
-This should not be necessary if using a new enough version
-of git-annex, which can instead be configured with public=yes.
+To use `git annex export`, you need git-annex version 6.20170909 or
+newer. Before we had `git annex export`, an [[old_method]] was used instead.
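The rewritten tip promises that each exported file is reachable at a predictable URL built from the bucket name and the file's path in the exported tree. A small sketch of that URL construction (the helper name is mine; key quoting is simplified to stdlib `urllib.parse.quote`):

```python
from urllib.parse import quote

def public_s3_url(bucket, path):
    # Virtual-hosted-style URL for a public object, as described in the tip:
    # http://$BUCKET.s3.amazonaws.com/$file
    return "http://%s.s3.amazonaws.com/%s" % (bucket, quote(path))

print(public_s3_url("public-bucket-joey", "photos/cat.jpg"))
```

Note that an exported tree uses the file's path as the object key, which is what makes these URLs human-readable, unlike the per-key URLs of the old method.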
diff --git a/doc/tips/publishing_your_files_to_the_public/old_method.mdwn b/doc/tips/publishing_your_files_to_the_public/old_method.mdwn
new file mode 100644
index 000000000..5409dda0d
--- /dev/null
+++ b/doc/tips/publishing_your_files_to_the_public/old_method.mdwn
@@ -0,0 +1,88 @@
+# Creating a special S3 remote to hold files shareable by URL
+
+(In this example, I'll assume you'll be creating a bucket in S3 named **public-annex** and a special remote in git-annex, which will store its files in the previous bucket, named **public-s3**, but change these names if you are going to do the thing for real)
+
+Set up your special [S3](http://git-annex.branchable.com/special_remotes/S3/) remote with (at least) these options:
+
+	git annex initremote public-s3 type=s3 encryption=none bucket=public-annex chunk=0 public=yes
+
+This way git-annex will upload the files to this repo, (when you call `git
+annex copy [FILES...] --to public-s3`) without encrypting them and without
+chunking them. And, thanks to the public=yes, they will be
+accessible by anyone with the link.
+
+(Note that public=yes was added in git-annex version 5.20150605.
+If you have an older version, it will be silently ignored, and you
+will instead need to use the AWS dashboard to configure a public get policy
+for the bucket.)
+
+Following the example, the files will be accessible at `http://public-annex.s3.amazonaws.com/KEY` where `KEY` is the file key created by git-annex and which you can discover running
+
+	git annex lookupkey FILEPATH
+
+This way you can share a link to each file you have at your S3 remote.
+
+## Sharing all links in a folder
+
+To share all the links in a given folder, for example, you can go to that folder and run (this is an example with the _fish_ shell, but I'm sure you can do the same in _bash_, I just don't know exactly):
+
+	for filename in (ls)
+		echo $filename": https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)
+	end
+
+## Sharing all links matching certain metadata
+
+The same applies to all the filters you can do with git-annex.
+
+For example, let's share links to all the files whose _author_'s name starts with "Mario" and are, in fact, stored at your public-s3 remote.
+However, instead of just a list of links we will output a markdown-formatted list of the filenames linked to their S3 urls:
+
+	for filename in (git annex find --metadata "author=Mario*" --and --in public-s3)
+		echo "* ["$filename"](https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)")"
+	end
+
+Very useful.
+
+## Sharing links with time-limited URLs
+
+By using pre-signed URLs it is possible to create limits on how long a URL is valid for retrieving an object.
+To enable use a private S3 bucket for the remotes and then pre-sign actual URL with the script in [AWS-Tools](https://github.com/gdbtek/aws-tools).
+Example:
+
+	key=`git annex lookupkey "$fname"`; sign_s3_url.bash --region 'eu-west-1' --bucket 'mybuck' --file-path $key --aws-access-key-id XX --aws-secret-access-key XX --method 'GET' --minute-expire 10
+
+## Adding the S3 URL as a source
+
+Assuming all files in the current directory are available on S3, this will register the public S3 url for the file in git-annex, making it available for everyone *through git-annex*:
+
+<pre>
+git annex find --in public-s3 | while read file ; do
+  key=$(git annex lookupkey $file)
+  echo $key https://public-annex.s3.amazonaws.com/$key
+done | git annex registerurl
+</pre>
+
+`registerurl` was introduced in `5.20150317`.
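The `registerurl` pipeline in the old method simply feeds "KEY URL" pairs on stdin. A sketch of generating those lines in Python rather than shell, with a hypothetical dict standing in for what `git annex lookupkey` would return per file:

```python
def registerurl_lines(bucket, file_to_key):
    # Build the "KEY URL" lines that `git annex registerurl` reads on stdin.
    # In git-annex's key-based layout, the public URL ends in the key itself.
    return [
        "%s https://%s.s3.amazonaws.com/%s" % (key, bucket, key)
        for key in file_to_key.values()
    ]

# Hypothetical mapping from work-tree file to its git-annex key.
files = {"a.txt": "SHA256E-s5--deadbeef.txt"}
for line in registerurl_lines("public-annex", files):
    print(line)
```

Piping such lines into `git annex registerurl` records the public URL as another source for each key, so clones can fetch content without AWS credentials.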
+
+## Manually configuring a public get policy
+
+Here is how to manually configure a public get policy
+for a bucket, in the AWS dashboard.
+
+	{
+		"Version": "2008-10-17",
+		"Statement": [
+			{
+				"Sid": "AllowPublicRead",
+				"Effect": "Allow",
+				"Principal": {
+					"AWS": "*"
+				},
+				"Action": "s3:GetObject",
+				"Resource": "arn:aws:s3:::public-annex/*"
+			}
+		]
+	}
+
+This should not be necessary if using a new enough version
+of git-annex, which can instead be configured with public=yes.
diff --git a/doc/tips/publishing_your_files_to_the_public/comment_1_48f545ce26dbec944f96796ed3b9204d._comment b/doc/tips/publishing_your_files_to_the_public/old_method/comment_1_48f545ce26dbec944f96796ed3b9204d._comment
index 6ee85367e..6ee85367e 100644
--- a/doc/tips/publishing_your_files_to_the_public/comment_1_48f545ce26dbec944f96796ed3b9204d._comment
+++ b/doc/tips/publishing_your_files_to_the_public/old_method/comment_1_48f545ce26dbec944f96796ed3b9204d._comment
diff --git a/doc/tips/publishing_your_files_to_the_public/comment_2_27a40806d009d617b3ad56873197bf87._comment b/doc/tips/publishing_your_files_to_the_public/old_method/comment_2_27a40806d009d617b3ad56873197bf87._comment
index 9cca4e2fa..9cca4e2fa 100644
--- a/doc/tips/publishing_your_files_to_the_public/comment_2_27a40806d009d617b3ad56873197bf87._comment
+++ b/doc/tips/publishing_your_files_to_the_public/old_method/comment_2_27a40806d009d617b3ad56873197bf87._comment
diff --git a/doc/tips/publishing_your_files_to_the_public/comment_3_2f5045629e40e8d881725876190c7846._comment b/doc/tips/publishing_your_files_to_the_public/old_method/comment_3_2f5045629e40e8d881725876190c7846._comment
index c76d3a30c..c76d3a30c 100644
--- a/doc/tips/publishing_your_files_to_the_public/comment_3_2f5045629e40e8d881725876190c7846._comment
+++ b/doc/tips/publishing_your_files_to_the_public/old_method/comment_3_2f5045629e40e8d881725876190c7846._comment
diff --git a/doc/tips/publishing_your_files_to_the_public/comment_4_37405f20da790141187e9f780c999448._comment b/doc/tips/publishing_your_files_to_the_public/old_method/comment_4_37405f20da790141187e9f780c999448._comment
index 2855c3fdd..2855c3fdd 100644
--- a/doc/tips/publishing_your_files_to_the_public/comment_4_37405f20da790141187e9f780c999448._comment
+++ b/doc/tips/publishing_your_files_to_the_public/old_method/comment_4_37405f20da790141187e9f780c999448._comment
diff --git a/doc/tips/publishing_your_files_to_the_public/comment_5_29c3ee4aed6a5b53b6767a96a7b85ad9._comment b/doc/tips/publishing_your_files_to_the_public/old_method/comment_5_29c3ee4aed6a5b53b6767a96a7b85ad9._comment
index bd77d03ce..bd77d03ce 100644
--- a/doc/tips/publishing_your_files_to_the_public/comment_5_29c3ee4aed6a5b53b6767a96a7b85ad9._comment
+++ b/doc/tips/publishing_your_files_to_the_public/old_method/comment_5_29c3ee4aed6a5b53b6767a96a7b85ad9._comment
diff --git a/doc/todo/export.mdwn b/doc/todo/export.mdwn
index 535678c2a..ac77b3d72 100644
--- a/doc/todo/export.mdwn
+++ b/doc/todo/export.mdwn
@@ -29,7 +29,6 @@ Work is in progress. Todo list:
   Would need git-annex sync to export to the master tree?
   This is similar to the little-used preferreddir= preferred content
   setting and the "public" repository group.
-* Test S3 export.
 * Test export to IA via S3. In particular, does removing an exported
   file work?
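The public get policy in old_method is a fixed JSON document parameterized only by the bucket name. A sketch that generates it for any bucket (the helper name is mine, not an AWS API):

```python
import json

def public_read_policy(bucket):
    # Bucket policy granting anonymous s3:GetObject on every key in the
    # bucket, matching the JSON shown in the old_method page.
    return json.dumps({
        "Version": "2008-10-17",
        "Statement": [{
            "Sid": "AllowPublicRead",
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::%s/*" % bucket,
        }],
    }, indent=4)

print(public_read_policy("public-annex"))
```

With `public=yes` on a recent git-annex, this manual dashboard step is unnecessary; the policy is only needed for older versions that ignore that option.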