author | Joey Hess <id@joeyh.name> | 2014-12-03 14:02:29 -0400 |
---|---|---|
committer | Joey Hess <id@joeyh.name> | 2014-12-03 14:10:52 -0400 |
commit | 69957946eaa066406a243edca8fd3e19e7febfee | |
tree | 7ce300577cd986f4f03b5f81446a188916e75097 /doc | |
parent | ab9bb79e8f0eaa8d951d46e82b321f8511ded942 | |
parent | 718932c895b38228ab8aed4477d7ce8bba205e5a | |
Merge branch 's3-aws'
Diffstat (limited to 'doc')
mode | file | lines changed |
---|---|---|
-rw-r--r-- | doc/bugs/S3_memory_leaks.mdwn | 4 |
-rw-r--r-- | doc/bugs/S3_upload_not_using_multipart.mdwn | 8 |
-rw-r--r-- | doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn | 2 |
-rw-r--r-- | doc/special_remotes/S3.mdwn | 22 |
-rw-r--r-- | doc/todo/S3_multipart_interruption_cleanup.mdwn | 14 |
5 files changed, 46 insertions, 4 deletions
```diff
diff --git a/doc/bugs/S3_memory_leaks.mdwn b/doc/bugs/S3_memory_leaks.mdwn
index 94bbdc398..7dc1e5757 100644
--- a/doc/bugs/S3_memory_leaks.mdwn
+++ b/doc/bugs/S3_memory_leaks.mdwn
@@ -2,9 +2,13 @@ S3 has memory leaks
 Sending a file to S3 causes a slow memory increase toward the file size.
 
+> This is fixed, now that it uses aws. --[[Joey]]
+
 Copying the file back from S3 causes a slow memory increase toward the file size.
 
+> [[fixed|done]] too! --[[Joey]]
+
 The author of hS3 is aware of the problem, and working on it. I think I have identified the root cause of the buffering; it's done by hS3 so it can resend the data if S3 sends it a 307 redirect. --[[Joey]]
diff --git a/doc/bugs/S3_upload_not_using_multipart.mdwn b/doc/bugs/S3_upload_not_using_multipart.mdwn
index 5e5d97c6a..cd40e9d2b 100644
--- a/doc/bugs/S3_upload_not_using_multipart.mdwn
+++ b/doc/bugs/S3_upload_not_using_multipart.mdwn
@@ -52,3 +52,11 @@ Please provide any additional information below.
 upgrade supported from repository versions: 0 1 2
 
 [[!tag confirmed]]
+
+> [[fixed|done]] This is now supported, when git-annex is built with a new
+> enough version of the aws library. You need to configure the remote to
+> use an appropriate value for multipart, eg:
+>
+> git annex enableremote cloud multipart=1GiB
+>
+> --[[Joey]]
diff --git a/doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn b/doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn
index 80f89b243..177f7e138 100644
--- a/doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn
+++ b/doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn
@@ -6,3 +6,5 @@ Amazon has opened up a new region in AWS with a datacenter in Frankfurt/Germany.
 * Region: eu-central-1
 
 This should be added to the "Adding an Amazon S3 repository" page in the Datacenter dropdown of the webapp.
+
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/special_remotes/S3.mdwn b/doc/special_remotes/S3.mdwn
index fe46948b3..5d161c3b8 100644
--- a/doc/special_remotes/S3.mdwn
+++ b/doc/special_remotes/S3.mdwn
@@ -18,11 +18,11 @@ the S3 remote.
 * `encryption` - One of "none", "hybrid", "shared", or "pubkey". See [[encryption]].
 
+* `keyid` - Specifies the gpg key to use for [[encryption]].
+
 * `chunk` - Enables [[chunking]] when storing large files. `chunk=1MiB` is a good starting point for chunking.
 
-* `keyid` - Specifies the gpg key to use for [[encryption]].
-
 * `embedcreds` - Optional. Set to "yes" embed the login credentials inside the git repository, which allows other clones to also access them. This is the default when gpg encryption is enabled; the credentials are stored
@@ -33,7 +33,8 @@ the S3 remote.
   embedcreds without gpg encryption.
 
 * `datacenter` - Defaults to "US". Other values include "EU",
-  "us-west-1", and "ap-southeast-1".
+  "us-west-1", "us-west-2", "ap-southeast-1", "ap-southeast-2", and
+  "sa-east-1".
 
 * `storageclass` - Default is "STANDARD". If you have configured git-annex to preserve multiple [[copies]], consider setting this to "REDUCED_REDUNDANCY"
@@ -46,11 +47,24 @@ the S3 remote.
   so by default, a bucket name is chosen based on the remote name and UUID. This can be specified to pick a bucket name.
 
+* `partsize` - Amazon S3 only accepts uploads up to a certian file size,
+  and storing larger files requires a multipart upload process.
+
+  Setting `partsize=1GiB` is recommended for Amazon S3 when not using
+  chunking; this will cause multipart uploads to be done using parts
+  up to 1GiB in size. Note that setting partsize to less than 100MiB
+  will cause Amazon S3 to reject uploads.
+
+  This is not enabled by default, since other S3 implementations may
+  not support multipart uploads or have different limits,
+  but can be enabled or changed at any time.
+  time.
+
 * `fileprefix` - By default, git-annex places files in a tree rooted at the top of the S3 bucket. When this is set, it's prefixed to the filenames used. For example, you could set it to "foo/" in one special remote, and to "bar/" in another special remote, and both special remotes could then use the same bucket.
 
-* `x-amz-*` are passed through as http headers when storing keys
+* `x-amz-meta-*` are passed through as http headers when storing keys
   in S3.
diff --git a/doc/todo/S3_multipart_interruption_cleanup.mdwn b/doc/todo/S3_multipart_interruption_cleanup.mdwn
new file mode 100644
index 000000000..adb5fd2cb
--- /dev/null
+++ b/doc/todo/S3_multipart_interruption_cleanup.mdwn
@@ -0,0 +1,14 @@
+When a multipart S3 upload is being made, and gets interrupted,
+the parts remain in the bucket, and S3 may charge for them.
+
+I am not sure what happens if the same object gets uploaded again. Is S3
+nice enough to remove the old parts? I need to find out..
+
+If not, this needs to be dealt with somehow. One way would be to configure an
+expiry of the uploaded parts, but this is tricky as a huge upload could
+take arbitrarily long. Another way would be to record the uploadid and the
+etags of the parts, and then resume where it left off the next time the
+object is sent to S3. (Or at least cancel the old upload; resume isn't
+practical when uploading an encrypted object.)
+
+It could store that info in either the local FS or the git-annex branch.
```
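The `partsize` setting added to `doc/special_remotes/S3.mdwn` in this commit makes git-annex send a large object as a multipart upload, in parts of at most `partsize` bytes. As an illustration only, here is a minimal Python sketch of that splitting, assuming sizes are written with a bare `B` or one of the binary suffixes `KiB`, `MiB`, `GiB`, `TiB`. `parse_size` and `part_ranges` are hypothetical helpers; this is not git-annex's Haskell code or the aws library's API.

```python
from __future__ import annotations

import re

# Hypothetical unit table: this sketch only understands bytes and IEC (binary) suffixes.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size such as '1GiB' or '100MiB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def part_ranges(file_size: int, part_size: int) -> list[tuple[int, int]]:
    """Return an (offset, length) pair for each part, in upload order.

    Every part is part_size bytes except possibly the last one, which
    holds whatever remains of the file, so no part exceeds part_size.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [
        (offset, min(part_size, file_size - offset))
        for offset in range(0, file_size, part_size)
    ]


if __name__ == "__main__":
    gib = parse_size("1GiB")
    # A 2.5 GiB file with partsize=1GiB is sent as three parts: 1 GiB, 1 GiB, 0.5 GiB.
    for offset, length in part_ranges(5 * gib // 2, gib):
        print(offset, length)
```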
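The new `S3_multipart_interruption_cleanup` TODO suggests recording the upload ID and the ETags of the parts, so that an interrupted multipart upload can later be resumed, or at least cancelled. Below is a minimal Python sketch of that bookkeeping, assuming the information is kept in a JSON file on the local filesystem (one of the two places the TODO mentions). `UploadJournal` and its methods are hypothetical names, and the sketch makes no S3 requests itself.

```python
from __future__ import annotations

import json
from pathlib import Path


class UploadJournal:
    """Hypothetical on-disk record of in-progress multipart uploads.

    For each object key it keeps the S3 upload ID and the ETag returned
    for every part that finished uploading, in a small JSON file.
    """

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: dict = json.loads(path.read_text()) if path.exists() else {}

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.entries, indent=2, sort_keys=True))

    def start(self, key: str, upload_id: str) -> None:
        # Record the upload ID right after the multipart upload is initiated.
        self.entries[key] = {"upload_id": upload_id, "parts": {}}
        self._save()

    def record_part(self, key: str, part_number: int, etag: str) -> None:
        # JSON object keys are strings, so part numbers are stored as text.
        self.entries[key]["parts"][str(part_number)] = etag
        self._save()

    def missing_parts(self, key: str, total_parts: int) -> list[int]:
        # S3 part numbers start at 1; return those with no recorded ETag.
        done = self.entries.get(key, {}).get("parts", {})
        return [n for n in range(1, total_parts + 1) if str(n) not in done]

    def forget(self, key: str) -> str | None:
        # Drop the entry once the upload is completed or aborted,
        # returning its upload ID (or None if nothing was recorded).
        entry = self.entries.pop(key, None)
        self._save()
        return None if entry is None else entry["upload_id"]
```

For an encrypted object, where the TODO notes that resuming is impractical, a caller could take the upload ID returned by `forget` and cancel the stale upload through S3's AbortMultipartUpload operation instead of resuming it.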
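The last `S3.mdwn` hunk narrows the documented pass-through from `x-amz-*` settings to `x-amz-meta-*` settings. A minimal Python sketch of that prefix filter, assuming for illustration that the remote's settings are available as a plain dictionary of names to values (`metadata_headers` is a hypothetical name, not part of git-annex): a setting such as `x-amz-storage-class` matches the old `x-amz-*` pattern but not the new one, so it is not selected.

```python
from __future__ import annotations

# Only settings with this prefix are treated as HTTP headers in this sketch.
META_PREFIX = "x-amz-meta-"


def metadata_headers(config: dict[str, str]) -> dict[str, str]:
    """Pick out the remote settings that would be sent as HTTP headers on upload.

    `config` is assumed to be a plain mapping of setting name to value.
    """
    return {name: value for name, value in config.items() if name.startswith(META_PREFIX)}


if __name__ == "__main__":
    settings = {
        "type": "S3",
        "datacenter": "EU",
        "x-amz-meta-owner": "joey",
    }
    print(metadata_headers(settings))
```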