author     Joey Hess <id@joeyh.name>    2014-12-03 14:02:29 -0400
committer  Joey Hess <id@joeyh.name>    2014-12-03 14:10:52 -0400
commit     69957946eaa066406a243edca8fd3e19e7febfee (patch)
tree       7ce300577cd986f4f03b5f81446a188916e75097 /doc
parent     ab9bb79e8f0eaa8d951d46e82b321f8511ded942 (diff)
parent     718932c895b38228ab8aed4477d7ce8bba205e5a (diff)
Merge branch 's3-aws'
Diffstat (limited to 'doc')
-rw-r--r--  doc/bugs/S3_memory_leaks.mdwn                            4
-rw-r--r--  doc/bugs/S3_upload_not_using_multipart.mdwn              8
-rw-r--r--  doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn    2
-rw-r--r--  doc/special_remotes/S3.mdwn                             22
-rw-r--r--  doc/todo/S3_multipart_interruption_cleanup.mdwn         14
5 files changed, 46 insertions, 4 deletions
diff --git a/doc/bugs/S3_memory_leaks.mdwn b/doc/bugs/S3_memory_leaks.mdwn
index 94bbdc398..7dc1e5757 100644
--- a/doc/bugs/S3_memory_leaks.mdwn
+++ b/doc/bugs/S3_memory_leaks.mdwn
@@ -2,9 +2,13 @@ S3 has memory leaks
Sending a file to S3 causes a slow memory increase toward the file size.
+> This is fixed, now that it uses aws. --[[Joey]]
+
Copying the file back from S3 causes a slow memory increase toward the
file size.
+> [[fixed|done]] too! --[[Joey]]
+
The author of hS3 is aware of the problem, and working on it. I think I
have identified the root cause of the buffering; it's done by hS3 so it can
resend the data if S3 sends it a 307 redirect. --[[Joey]]
diff --git a/doc/bugs/S3_upload_not_using_multipart.mdwn b/doc/bugs/S3_upload_not_using_multipart.mdwn
index 5e5d97c6a..cd40e9d2b 100644
--- a/doc/bugs/S3_upload_not_using_multipart.mdwn
+++ b/doc/bugs/S3_upload_not_using_multipart.mdwn
@@ -52,3 +52,11 @@ Please provide any additional information below.
upgrade supported from repository versions: 0 1 2
[[!tag confirmed]]
+
+> [[fixed|done]] This is now supported, when git-annex is built with a new
+> enough version of the aws library. You need to configure the remote to
+> use an appropriate value for partsize, eg:
+>
+>     git annex enableremote cloud partsize=1GiB
+>
+> --[[Joey]]
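
As a quick sanity check on the fix above, whether a given build has the needed aws support can be confirmed before enabling the option. A minimal sketch, assuming an existing S3 remote named `cloud` (the remote name is illustrative):

    # confirm this git-annex build was compiled with S3 support
    git annex version | grep "build flags"

    # then turn on multipart uploads with 1GiB parts
    git annex enableremote cloud partsize=1GiB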
diff --git a/doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn b/doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn
index 80f89b243..177f7e138 100644
--- a/doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn
+++ b/doc/bugs/new_AWS_region___40__eu-central-1__41__.mdwn
@@ -6,3 +6,5 @@ Amazon has opened up a new region in AWS with a datacenter in Frankfurt/Germany.
* Region: eu-central-1
This should be added to the "Adding an Amazon S3 repository" page in the Datacenter dropdown of the webapp.
+
+> [[fixed|done]] --[[Joey]]
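
With the fix in place, the new region should also be usable from the command line, not only from the webapp dropdown. A hedged sketch, assuming the remote name `frankfurt` is free and the underlying aws library recognizes the region:

    git annex initremote frankfurt type=S3 datacenter=eu-central-1 encryption=none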
diff --git a/doc/special_remotes/S3.mdwn b/doc/special_remotes/S3.mdwn
index fe46948b3..5d161c3b8 100644
--- a/doc/special_remotes/S3.mdwn
+++ b/doc/special_remotes/S3.mdwn
@@ -18,11 +18,11 @@ the S3 remote.
* `encryption` - One of "none", "hybrid", "shared", or "pubkey".
See [[encryption]].
+* `keyid` - Specifies the gpg key to use for [[encryption]].
+
* `chunk` - Enables [[chunking]] when storing large files.
`chunk=1MiB` is a good starting point for chunking.
-* `keyid` - Specifies the gpg key to use for [[encryption]].
-
* `embedcreds` - Optional. Set to "yes" to embed the login credentials inside
the git repository, which allows other clones to also access them. This is
the default when gpg encryption is enabled; the credentials are stored
@@ -33,7 +33,8 @@ the S3 remote.
embedcreds without gpg encryption.
* `datacenter` - Defaults to "US". Other values include "EU",
- "us-west-1", and "ap-southeast-1".
+ "us-west-1", "us-west-2", "ap-southeast-1", "ap-southeast-2", and
+ "sa-east-1".
* `storageclass` - Default is "STANDARD". If you have configured git-annex
to preserve multiple [[copies]], consider setting this to "REDUCED_REDUNDANCY"
@@ -46,11 +47,24 @@ the S3 remote.
so by default, a bucket name is chosen based on the remote name
and UUID. This can be specified to pick a bucket name.
+* `partsize` - Amazon S3 only accepts uploads up to a certain file size,
+ and storing larger files requires a multipart upload process.
+
+ Setting `partsize=1GiB` is recommended for Amazon S3 when not using
+ chunking; this will cause multipart uploads to be done using parts
+ up to 1GiB in size. Note that setting partsize to less than 100MiB
+ will cause Amazon S3 to reject uploads.
+
+ This is not enabled by default, since other S3 implementations may
+ not support multipart uploads or have different limits,
+ but can be enabled or changed at any time.
+
* `fileprefix` - By default, git-annex places files in a tree rooted at the
top of the S3 bucket. When this is set, it's prefixed to the filenames
used. For example, you could set it to "foo/" in one special remote,
and to "bar/" in another special remote, and both special remotes could
then use the same bucket.
-* `x-amz-*` are passed through as http headers when storing keys
+* `x-amz-meta-*` are passed through as http headers when storing keys
in S3.
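
For orientation, the options documented above are combined on the `initremote` command line when the remote is created. A minimal sketch only, with the remote name, keyid, and prefix purely illustrative rather than taken from the page:

    # an encrypted S3 remote in the EU datacenter, using multipart uploads
    # and keeping its files under a common prefix in the bucket
    git annex initremote cloud type=S3 encryption=hybrid keyid=2512E3C7 \
        datacenter=EU partsize=1GiB fileprefix=annex/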
diff --git a/doc/todo/S3_multipart_interruption_cleanup.mdwn b/doc/todo/S3_multipart_interruption_cleanup.mdwn
new file mode 100644
index 000000000..adb5fd2cb
--- /dev/null
+++ b/doc/todo/S3_multipart_interruption_cleanup.mdwn
@@ -0,0 +1,14 @@
+When a multipart S3 upload is being made, and gets interrupted,
+the parts remain in the bucket, and S3 may charge for them.
+
+I am not sure what happens if the same object gets uploaded again. Is S3
+nice enough to remove the old parts? I need to find out.
+
+If not, this needs to be dealt with somehow. One way would be to configure an
+expiry of the uploaded parts, but this is tricky as a huge upload could
+take arbitrarily long. Another way would be to record the uploadid and the
+etags of the parts, and then resume where it left off the next time the
+object is sent to S3. (Or at least cancel the old upload; resume isn't
+practical when uploading an encrypted object.)
+
+It could store that info in either the local FS or the git-annex branch.
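
Until git-annex cleans these up itself, leftover parts can at least be inspected and removed out of band. A hedged sketch using the separate AWS command line tools (bucket name, key, and upload id are all illustrative), not something git-annex does today:

    # list multipart uploads that were started but never completed
    aws s3api list-multipart-uploads --bucket my-annex-bucket

    # abort one of them, which deletes its stored parts
    aws s3api abort-multipart-upload --bucket my-annex-bucket \
        --key GPGHMACSHA1--example --upload-id EXAMPLE_UPLOAD_ID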