author    Joey Hess <joey@kitenet.net> 2014-08-08 23:25:38 -0400
committer Joey Hess <joey@kitenet.net> 2014-08-08 23:25:38 -0400
commit    4475497e653a63823b95a0fa0b176a7baf080a7b
tree      3fa2a824f16a1d9d44fd9395de1dc7d426835c79 /doc
parent    4e303cca921722d4274eece477b52d196cc7c0e1
parent    69e1ee3fde8530361ce4c0569f4ec2175f2d86a7
Merge branch 'newchunks'
Diffstat (limited to 'doc')
-rw-r--r-- doc/design/assistant/chunks.mdwn       | 16
-rw-r--r-- doc/design/assistant/progressbars.mdwn |  4
-rw-r--r-- doc/special_remotes/S3.mdwn            |  3
-rw-r--r-- doc/special_remotes/bup.mdwn           | 11
-rw-r--r-- doc/special_remotes/gcrypt.mdwn        |  4
-rw-r--r-- doc/special_remotes/hook.mdwn          |  2
-rw-r--r-- doc/special_remotes/rsync.mdwn         | 10
-rw-r--r-- doc/special_remotes/webdav.mdwn        |  2
-rw-r--r-- doc/tips/using_Amazon_S3.mdwn          |  2
9 files changed, 33 insertions, 21 deletions
diff --git a/doc/design/assistant/chunks.mdwn b/doc/design/assistant/chunks.mdwn
index a9709a778..0aa389899 100644
--- a/doc/design/assistant/chunks.mdwn
+++ b/doc/design/assistant/chunks.mdwn
@@ -91,7 +91,7 @@ cannot tell when we've gotten the last chunk. (Also, we cannot strip
padding.) Note that `addurl` sometimes generates keys w/o size info
(particularly, it does so by design when using quvi).
-Problem: Also, this makes `hasKey` hard to implement: How can it know if
+Problem: Also, this makes `checkPresent` hard to implement: How can it know if
all the chunks are present, if the key size is not known?
Problem: Also, this makes it difficult to download encrypted keys, because
@@ -111,7 +111,7 @@ So, SHA256-1048576-c1--xxxxxxx for the first chunk of 1 megabyte.
Before any chunks are stored, write a chunkcount file, eg
SHA256-s12345-c0--xxxxxxx. Note that this key is the same as the original
object's key, except with chunk number set to 0. This file contains both
-the number of chunks, and also the chunk size used. `hasKey` downloads this
+the number of chunks, and also the chunk size used. `checkPresent` downloads this
file, and then verifies that each chunk is present, looking for keys with
the expected chunk numbers and chunk size.
@@ -126,7 +126,7 @@ Note: This design lets an attacker with logs tell the (approximate) size of
objects, by finding the small files that contain a chunk count, and
correlating when that is written/read and when other files are
written/read. That could be solved by padding the chunkcount key up to the
-size of the rest of the keys, but that's very inefficient; `hasKey` is not
+size of the rest of the keys, but that's very inefficient; `checkPresent` is not
designed to need to download large files.
# design 3
@@ -139,7 +139,7 @@ This seems difficult; attacker could probably tell where the first encrypted
part stops and the next encrypted part starts by looking for gpg headers,
and so tell which files are the first chunks.
-Also, `hasKey` would need to download some or all of the first file.
+Also, `checkPresent` would need to download some or all of the first file.
If all, that's a lot more expensive. If only some is downloaded, an
attacker can guess that the file that was partially downloaded is the
first chunk in a series, and wait for a time when it's fully downloaded to
@@ -163,7 +163,7 @@ The location log does not record locations of individual chunk keys
(too space-inefficient). Instead, look at a chunk log in the
git-annex branch to get the chunk count and size for a key.
-`hasKey` would check if any of the logged sets of chunks is
+`checkPresent` would check if any of the logged sets of chunks is
present on the remote. It would also check if the non-chunked key is
present, as a fallback.
@@ -225,7 +225,7 @@ Reasons:
Note that this means that the chunks won't exactly match the configured
chunk size. gpg does compression, which might make them a
-lot smaller. Or gpg overhead could make them slightly larger. So `hasKey`
+lot smaller. Or gpg overhead could make them slightly larger. So `checkPresent`
cannot check exact file sizes.
If padding is enabled, gpg compression should be disabled, to not leak
@@ -250,10 +250,10 @@ and skip forward to the next needed chunk. Easy.
Uploads: Check if the 1st chunk is present. If so, check the second chunk,
etc. Once the first missing chunk is found, start uploading from there.
-That adds one extra hasKey call per upload. Probably a win in most cases.
+That adds one extra checkPresent call per upload. Probably a win in most cases.
Can be improved by making special remotes open a persistent
connection that is used for transferring all chunks, as well as for
-checking hasKey.
+running checkPresent.
Note that this is safe to do only as long as the Key being transferred
cannot possibly have 2 different contents in different repos. Notably not
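The chunk-count key and the upload-resume check described in the chunks design above can be sketched in shell. This is a minimal illustration under stated assumptions, not git-annex's implementation: it assumes keys have the shape `BACKEND-sSIZE--HASH`, it uses a local directory as a stand-in for the special remote, and the helper names `chunk_key` and `first_missing_chunk` are hypothetical. The design also encodes the chunk size in each chunk key (its example is `SHA256-1048576-c1--xxxxxxx`); this sketch leaves that out and varies only the chunk number.

```shell
#!/bin/sh
# Hypothetical sketch of the chunks design, not git-annex code.
# Assumption: keys look like BACKEND-sSIZE--HASH, e.g. SHA256-s12345--xxxxxxx.
# Assumption: a local directory stands in for the special remote.

# Insert a "-cN" chunk-number field before the "--" separator.
# N=0 yields the chunk-count key described in the design.
chunk_key() (
  key=$1 n=$2
  printf '%s\n' "${key%%--*}-c$n--${key#*--}"
)

# Print the number of the first chunk missing from the remote directory,
# or count+1 when every chunk is already present (nothing left to upload).
first_missing_chunk() (
  remote_dir=$1 key=$2 count=$3
  i=1
  while [ "$i" -le "$count" ]; do
    if [ ! -e "$remote_dir/$(chunk_key "$key" "$i")" ]; then
      echo "$i"
      exit 0   # leaves only the subshell that forms the function body
    fi
    i=$((i + 1))
  done
  echo $((count + 1))
)
```

For example, `chunk_key SHA256-s12345--xxxxxxx 0` prints `SHA256-s12345-c0--xxxxxxx`, the chunk-count key from the design. Checking chunks in order and stopping at the first gap follows the upload-resume steps in the design; as the design notes, resuming this way is only safe when a key cannot have different contents in different repositories.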
diff --git a/doc/design/assistant/progressbars.mdwn b/doc/design/assistant/progressbars.mdwn
index 50f424508..7de70452d 100644
--- a/doc/design/assistant/progressbars.mdwn
+++ b/doc/design/assistant/progressbars.mdwn
@@ -14,7 +14,7 @@ This is one of those potentially hidden but time consuming problems.
could use inotify. **done**
* When easily available, remotes call the MeterUpdate callback as downloads
progress. **done**
-* S3 TODO
+* S3: TODO
While it has a download progress bar, `getObject` probably buffers the whole
download in memory before returning, leaving the progress bar to only
display progress for writing the file out of memory. Fixing this would
@@ -32,7 +32,7 @@ the MeterUpdate callback as the upload progresses.
* webdav: **done**
* S3: **done**
* glacier: **done**
-* bup: TODO
+* bup: **done**
* hook: Would require the hook interface to somehow do this, which seems
too complicated. So skipping.
diff --git a/doc/special_remotes/S3.mdwn b/doc/special_remotes/S3.mdwn
index 5291a4eb6..fe46948b3 100644
--- a/doc/special_remotes/S3.mdwn
+++ b/doc/special_remotes/S3.mdwn
@@ -18,6 +18,9 @@ the S3 remote.
* `encryption` - One of "none", "hybrid", "shared", or "pubkey".
See [[encryption]].
+* `chunk` - Enables [[chunking]] when storing large files.
+ `chunk=1MiB` is a good starting point for chunking.
+
* `keyid` - Specifies the gpg key to use for [[encryption]].
* `embedcreds` - Optional. Set to "yes" to embed the login credentials inside
diff --git a/doc/special_remotes/bup.mdwn b/doc/special_remotes/bup.mdwn
index f2d465e77..ca5056917 100644
--- a/doc/special_remotes/bup.mdwn
+++ b/doc/special_remotes/bup.mdwn
@@ -19,16 +19,17 @@ for example; or clone bup's git repository to further back it up.
These parameters can be passed to `git annex initremote` to configure bup:
-* `encryption` - One of "none", "hybrid", "shared", or "pubkey".
- See [[encryption]].
-
-* `keyid` - Specifies the gpg key to use for [[encryption]].
-
* `buprepo` - Required. This is passed to `bup` as the `--remote`
  to use to store data. To create the repository, `bup init` will be run.
Example: "buprepo=example.com:/big/mybup" or "buprepo=/big/mybup"
(To use the default `~/.bup` repository on the local host, specify "buprepo=")
+* `encryption` - One of "none", "hybrid", "shared", or "pubkey".
+ See [[encryption]]. Note that using encryption will prevent
+ de-duplication of content stored in the buprepo.
+
+* `keyid` - Specifies the gpg key to use for [[encryption]].
+
Options to pass to `bup split` when sending content to bup can also
be specified, by using `git config annex.bup-split-options`. This
can be used to, for example, limit its bandwidth.
diff --git a/doc/special_remotes/gcrypt.mdwn b/doc/special_remotes/gcrypt.mdwn
index 2e07741d3..c9a22b01a 100644
--- a/doc/special_remotes/gcrypt.mdwn
+++ b/doc/special_remotes/gcrypt.mdwn
@@ -13,7 +13,7 @@ These parameters can be passed to `git annex initremote` to configure
gcrypt:
* `encryption` - One of "none", "hybrid", "shared", or "pubkey".
- See [[encryption]].
+ Required. See [[encryption]].
* `keyid` - Specifies the gpg key to use for encryption of both the files
git-annex stores in the repository, as well as to encrypt the git
@@ -24,6 +24,8 @@ gcrypt:
for gcrypt to use. This repository should be either empty, or an existing
  gcrypt repository.
+* `chunk` - Enables [[chunking]] when storing large files.
+
* `shellescape` - See [[rsync]] for the details of this option.
## notes
diff --git a/doc/special_remotes/hook.mdwn b/doc/special_remotes/hook.mdwn
index 8cf31ed02..0bb76d98a 100644
--- a/doc/special_remotes/hook.mdwn
+++ b/doc/special_remotes/hook.mdwn
@@ -36,6 +36,8 @@ These parameters can be passed to `git annex initremote`:
* `keyid` - Specifies the gpg key to use for [[encryption]].
+* `chunk` - Enables [[chunking]] when storing large files.
+
## hooks
Each type of hook remote is specified by a collection of hook commands.
diff --git a/doc/special_remotes/rsync.mdwn b/doc/special_remotes/rsync.mdwn
index b2a9d23f5..eb218b181 100644
--- a/doc/special_remotes/rsync.mdwn
+++ b/doc/special_remotes/rsync.mdwn
@@ -14,14 +14,14 @@ Or for using rsync over SSH
These parameters can be passed to `git annex initremote` to configure rsync:
+* `rsyncurl` - Required. This is the url or `hostname:/directory` to
+ pass to rsync to tell it where to store content.
+
* `encryption` - One of "none", "hybrid", "shared", or "pubkey".
See [[encryption]].
* `keyid` - Specifies the gpg key to use for [[encryption]].
-* `rsyncurl` - Required. This is the url or `hostname:/directory` to
- pass to rsync to tell it where to store content.
-
* `shellescape` - Optional. Set to "no" to avoid shell escaping normally
done when using rsync over ssh. That escaping is needed with typical
setups, but not with some hosting providers that do not expose rsynced
@@ -30,6 +30,10 @@ These parameters can be passed to `git annex initremote` to configure rsync:
quote (`'`) character. If that happens, you can run enableremote
setting shellescape=no.
+* `chunk` - Enables [[chunking]] when storing large files.
+ This is typically not a win for rsync, so there is usually no need to enable it.
+ However, it does make this remote interoperate with the [[directory]] special remote.
+
The `annex-rsync-options` git configuration setting can be used to pass
parameters to rsync.
diff --git a/doc/special_remotes/webdav.mdwn b/doc/special_remotes/webdav.mdwn
index 64eed5d0b..6b5f5b122 100644
--- a/doc/special_remotes/webdav.mdwn
+++ b/doc/special_remotes/webdav.mdwn
@@ -37,4 +37,4 @@ the webdav remote.
Setup example:
- # WEBDAV_USERNAME=joey@kitenet.net WEBDAV_PASSWORD=xxxxxxx git annex initremote box.com type=webdav url=https://dav.box.com/dav/git-annex chunksize=75mb keyid=joey@kitenet.net
+ # WEBDAV_USERNAME=joey@kitenet.net WEBDAV_PASSWORD=xxxxxxx git annex initremote box.com type=webdav url=https://dav.box.com/dav/git-annex chunk=10mb keyid=joey@kitenet.net
diff --git a/doc/tips/using_Amazon_S3.mdwn b/doc/tips/using_Amazon_S3.mdwn
index 0c68c7387..ede3f952f 100644
--- a/doc/tips/using_Amazon_S3.mdwn
+++ b/doc/tips/using_Amazon_S3.mdwn
@@ -14,7 +14,7 @@ like "2512E3C7"
Next, create the S3 remote, and describe it.
- # git annex initremote cloud type=S3 keyid=2512E3C7
+ # git annex initremote cloud type=S3 chunk=1MiB keyid=2512E3C7
initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok
# git annex describe cloud "at Amazon's US datacenter"
describe cloud ok