Merge branch 's3-aws'

author: Joey Hess <id@joeyh.name> 2014-12-03 14:02:29 -0400
committer: Joey Hess <id@joeyh.name> 2014-12-03 14:10:52 -0400
commit: 69957946eaa066406a243edca8fd3e19e7febfee (patch)
tree: 7ce300577cd986f4f03b5f81446a188916e75097 /doc
parent: ab9bb79e8f0eaa8d951d46e82b321f8511ded942 (diff)
parent: 718932c895b38228ab8aed4477d7ce8bba205e5a (diff)
5 files changed, 46 insertions, 4 deletions
diff --git a/doc/bugs/S3_memory_leaks.mdwn b/doc/bugs/S3_memory_leaks.mdwn
index 94bbdc398..7dc1e5757 100644
--- a/doc/bugs/S3_memory_leaks.mdwn
+++ b/doc/bugs/S3_memory_leaks.mdwn
@@ -2,9 +2,13 @@ S3 has memory leaks
 
 Sending a file to S3 causes a slow memory increase toward the file size.
 
+> This is fixed, now that it uses aws. --[[Joey]]
+
 Copying the file back from S3 causes a slow memory increase toward the
 file size.
 
+> [[fixed|done]] too! --[[Joey]]
+
 The author of hS3 is aware of the problem, and working on it. I think I
 have identified the root cause of the buffering; it's done by hS3 so it can
 resend the data if S3 sends it a 307 redirect. --[[Joey]]
diff --git a/doc/bugs/S3_upload_not_using_multipart.mdwn b/doc/bugs/S3_upload_not_using_multipart.mdwn
index 5e5d97c6a..cd40e9d2b 100644
--- a/doc/bugs/S3_upload_not_using_multipart.mdwn
+++ b/doc/bugs/S3_upload_not_using_multipart.mdwn
@@ -52,3 +52,11 @@ Please provide any additional information below.
 	upgrade supported from repository versions: 0 1 2
 
 [[!tag confirmed]]
+
+> [[fixed|done]] This is now supported, when git-annex is built with a new
+> enough version of the aws library. You need to configure the remote to
+> use an appropriate value for multipart, eg:
+> 
+> git annex enableremote cloud partsize=1GiB
+> 
+> --[[Joey]]
diff --git a/doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn b/doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn
index 80f89b243..177f7e138 100644
--- a/doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn
+++ b/doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn
@@ -6,3 +6,5 @@ Amazon has opened up a new region in AWS with a datacenter in Frankfurt/Germany.
 * Region: eu-central-1
 
 This should be added to the "Adding an Amazon S3 repository" page in the Datacenter dropdown of the webapp.
+
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/special_remotes/S3.mdwn b/doc/special_remotes/S3.mdwn
index fe46948b3..5d161c3b8 100644
--- a/doc/special_remotes/S3.mdwn
+++ b/doc/special_remotes/S3.mdwn
@@ -18,11 +18,11 @@ the S3 remote.
 * `encryption` - One of "none", "hybrid", "shared", or "pubkey".
   See [[encryption]].
 
+* `keyid` - Specifies the gpg key to use for [[encryption]].
+
 * `chunk` - Enables [[chunking]] when storing large files.
   `chunk=1MiB` is a good starting point for chunking.
 
-* `keyid` - Specifies the gpg key to use for [[encryption]].
-
 * `embedcreds` - Optional. Set to "yes" to embed the login credentials inside
   the git repository, which allows other clones to also access them. This is
   the default when gpg encryption is enabled; the credentials are stored
@@ -33,7 +33,8 @@ the S3 remote.
   embedcreds without gpg encryption.
 
 * `datacenter` - Defaults to "US". Other values include "EU",
-  "us-west-1", and "ap-southeast-1".
+  "us-west-1", "us-west-2", "ap-southeast-1", "ap-southeast-2", and
+  "sa-east-1".
 
 * `storageclass` - Default is "STANDARD". If you have configured git-annex
   to preserve multiple [[copies]], consider setting this to "REDUCED_REDUNDANCY"
@@ -46,11 +47,24 @@ the S3 remote.
   so by default, a bucket name is chosen based on the remote name
   and UUID. This can be specified to pick a bucket name.
 
+* `partsize` - Amazon S3 only accepts uploads up to a certain file size,
+  and storing larger files requires a multipart upload process.
+
+  Setting `partsize=1GiB` is recommended for Amazon S3 when not using
+  chunking; this will cause multipart uploads to be done using parts
+  up to 1GiB in size. Note that setting partsize to less than 100MiB
+  will cause Amazon S3 to reject uploads.
+
+  This is not enabled by default, since other S3 implementations may
+  not support multipart uploads or have different limits,
+  but can be enabled or changed at any
+  time.
+
 * `fileprefix` - By default, git-annex places files in a tree rooted at the
   top of the S3 bucket. When this is set, it's prefixed to the filenames
   used. For example, you could set it to "foo/" in one special remote,
   and to "bar/" in another special remote, and both special remotes could
   then use the same bucket.
 
-* `x-amz-*` are passed through as http headers when storing keys
+* `x-amz-meta-*` are passed through as http headers when storing keys
   in S3.
diff --git a/doc/todo/S3_multipart_interruption_cleanup.mdwn b/doc/todo/S3_multipart_interruption_cleanup.mdwn
new file mode 100644
index 000000000..adb5fd2cb
--- /dev/null
+++ b/doc/todo/S3_multipart_interruption_cleanup.mdwn
@@ -0,0 +1,14 @@
+When a multipart S3 upload is being made, and gets interrupted,
+the parts remain in the bucket, and S3 may charge for them.
+
+I am not sure what happens if the same object gets uploaded again. Is S3
+nice enough to remove the old parts? I need to find out..
+
+If not, this needs to be dealt with somehow. One way would be to configure an
+expiry of the uploaded parts, but this is tricky as a huge upload could
+take arbitrarily long. Another way would be to record the uploadid and the
+etags of the parts, and then resume where it left off the next time the
+object is sent to S3. (Or at least cancel the old upload; resume isn't
+practical when uploading an encrypted object.) 
+
+It could store that info in either the local FS or the git-annex branch.
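
The resume approach described in the todo item above (record the uploadid and the etags of the parts, then pick up where the upload left off) could be sketched roughly as follows. This is only an illustration in Python, not git-annex code: `MultipartState`, `save_state`, `load_state`, and `parts_remaining` are hypothetical names, the JSON file stands in for "the local FS" option, and no S3 calls are made.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field, asdict


@dataclass
class MultipartState:
    """Hypothetical record of one in-progress multipart upload."""
    key: str          # object name in the bucket
    upload_id: str    # id handed back by S3 when the upload was started
    part_size: int    # bytes per part (what partsize would configure)
    etags: dict = field(default_factory=dict)  # part number -> ETag


def save_state(path, state):
    """Write the record via a temp file and rename, so an interruption
    during the write cannot leave a half-written record behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(state), f)
    os.replace(tmp, path)


def load_state(path):
    """Return the saved record, or None if no upload was in progress."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        raw = json.load(f)
    # JSON object keys are always strings; restore integer part numbers.
    raw["etags"] = {int(n): tag for n, tag in raw["etags"].items()}
    return MultipartState(**raw)


def parts_remaining(state, file_size):
    """List the 1-based part numbers that have no recorded ETag yet."""
    total = max(1, -(-file_size // state.part_size))  # ceiling division
    return [n for n in range(1, total + 1) if n not in state.etags]
```

On the next attempt to send the same object, a caller would load the record, upload only the part numbers returned by `parts_remaining`, and save the record after each part completes. As the todo notes, this only helps when the object's bytes are the same on each attempt, so for an encrypted object the saved `upload_id` would more likely be used to cancel the old upload than to resume it.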