author | 2014-08-08 23:25:38 -0400 | |
---|---|---|
committer | 2014-08-08 23:25:38 -0400 | |
commit | 4475497e653a63823b95a0fa0b176a7baf080a7b (patch) | |
tree | 3fa2a824f16a1d9d44fd9395de1dc7d426835c79 /doc | |
parent | 4e303cca921722d4274eece477b52d196cc7c0e1 (diff) | |
parent | 69e1ee3fde8530361ce4c0569f4ec2175f2d86a7 (diff) |
Merge branch 'newchunks'
Diffstat (limited to 'doc')
-rw-r--r-- | doc/design/assistant/chunks.mdwn | 16 |
-rw-r--r-- | doc/design/assistant/progressbars.mdwn | 4 |
-rw-r--r-- | doc/special_remotes/S3.mdwn | 3 |
-rw-r--r-- | doc/special_remotes/bup.mdwn | 11 |
-rw-r--r-- | doc/special_remotes/gcrypt.mdwn | 4 |
-rw-r--r-- | doc/special_remotes/hook.mdwn | 2 |
-rw-r--r-- | doc/special_remotes/rsync.mdwn | 10 |
-rw-r--r-- | doc/special_remotes/webdav.mdwn | 2 |
-rw-r--r-- | doc/tips/using_Amazon_S3.mdwn | 2 |
9 files changed, 33 insertions, 21 deletions
diff --git a/doc/design/assistant/chunks.mdwn b/doc/design/assistant/chunks.mdwn
index a9709a778..0aa389899 100644
--- a/doc/design/assistant/chunks.mdwn
+++ b/doc/design/assistant/chunks.mdwn
@@ -91,7 +91,7 @@ cannot tell when we've gotten the last chunk. (Also, we cannot strip padding.)
 Note that `addurl` sometimes generates keys w/o size info
 (particularly, it does so by design when using quvi).

-Problem: Also, this makes `hasKey` hard to implement: How can it know if
+Problem: Also, this makes `checkPresent` hard to implement: How can it know if
 all the chunks are present, if the key size is not known?

 Problem: Also, this makes it difficult to download encrypted keys, because
@@ -111,7 +111,7 @@ So, SHA256-1048576-c1--xxxxxxx for the first chunk of 1 megabyte.
 Before any chunks are stored, write a chunkcount file, eg
 SHA256-s12345-c0--xxxxxxx. Note that this key is the same as the original
 object's key, except with chunk number set to 0. This file contains both
-the number of chunks, and also the chunk size used. `hasKey` downloads this
+the number of chunks, and also the chunk size used. `checkPresent` downloads this
 file, and then verifies that each chunk is present, looking for keys with
 the expected chunk numbers and chunk size.

@@ -126,7 +126,7 @@ Note: This design lets an attacker with logs tell the (appoximate) size of
 objects, by finding the small files that contain a chunk count, and
 correlating when that is written/read and when other files are
 written/read. That could be solved by padding the chunkcount key up to the
-size of the rest of the keys, but that's very innefficient; `hasKey` is not
+size of the rest of the keys, but that's very innefficient; `checkPresent` is not
 designed to need to download large files.

 # design 3
@@ -139,7 +139,7 @@ This seems difficult; attacker could probably tell where the first
 encrypted part stops and the next encrypted part starts by looking for gpg
 headers, and so tell which files are the first chunks.
-Also, `hasKey` would need to download some or all of the first file.
+Also, `checkPresent` would need to download some or all of the first file.
 If all, that's a lot more expensive. If only some is downloaded, an
 attacker can guess that the file that was partially downloaded is the
 first chunk in a series, and wait for a time when it's fully downloaded to
@@ -163,7 +163,7 @@ The location log does not record locations of individual chunk keys
 (too space-inneficient). Instead, look at a chunk log in the
 git-annex branch to get the chunk count and size for a key.

-`hasKey` would check if any of the logged sets of chunks is
+`checkPresent` would check if any of the logged sets of chunks is
 present on the remote. It would also check if the non-chunked key is
 present, as a fallback.

@@ -225,7 +225,7 @@ Reasons:

 Note that this means that the chunks won't exactly match the configured
 chunk size. gpg does compression, which might make them a
-lot smaller. Or gpg overhead could make them slightly larger. So `hasKey`
+lot smaller. Or gpg overhead could make them slightly larger. So `checkPresent`
 cannot check exact file sizes.

 If padding is enabled, gpg compression should be disabled, to not leak
@@ -250,10 +250,10 @@ and skip forward to the next needed chunk. Easy.

 Uploads: Check if the 1st chunk is present. If so, check the second chunk,
 etc. Once the first missing chunk is found, start uploading from there.
-That adds one extra hasKey call per upload. Probably a win in most cases.
+That adds one extra checkPresent call per upload. Probably a win in most cases.
 Can be improved by making special remotes open a persistent
 connection that is used for transferring all chunks, as well as for
-checking hasKey.
+checking checkPresent.

 Note that this is safe to do only as long as the Key being transferred
 cannot possibly have 2 different contents in different repos.
 Notably not
diff --git a/doc/design/assistant/progressbars.mdwn b/doc/design/assistant/progressbars.mdwn
index 50f424508..7de70452d 100644
--- a/doc/design/assistant/progressbars.mdwn
+++ b/doc/design/assistant/progressbars.mdwn
@@ -14,7 +14,7 @@ This is one of those potentially hidden but time consuming problems.
   could use inotify. **done**
 * When easily available, remotes call the MeterUpdate callback as downloads
   progress. **done**
-* S3 TODO
+* S3: TODO
   While it has a download progress bar, `getObject` probably buffers the whole
   download in memory before returning. Leaving the progress bar to only
   display progress for writing the file out of memory. Fixing this would
@@ -32,7 +32,7 @@ the MeterUpdate callback as the upload progresses.
 * webdav: **done**
 * S3: **done**
 * glacier: **done**
-* bup: TODO
+* bup: **done**
 * hook: Would require the hook interface to somehow do this, which seems
   too complicated. So skipping.

diff --git a/doc/special_remotes/S3.mdwn b/doc/special_remotes/S3.mdwn
index 5291a4eb6..fe46948b3 100644
--- a/doc/special_remotes/S3.mdwn
+++ b/doc/special_remotes/S3.mdwn
@@ -18,6 +18,9 @@ the S3 remote.
 * `encryption` - One of "none", "hybrid", "shared", or "pubkey".
   See [[encryption]].

+* `chunk` - Enables [[chunking]] when storing large files.
+  `chunk=1MiB` is a good starting point for chunking.
+
 * `keyid` - Specifies the gpg key to use for [[encryption]].

 * `embedcreds` - Optional. Set to "yes" embed the login credentials inside
diff --git a/doc/special_remotes/bup.mdwn b/doc/special_remotes/bup.mdwn
index f2d465e77..ca5056917 100644
--- a/doc/special_remotes/bup.mdwn
+++ b/doc/special_remotes/bup.mdwn
@@ -19,16 +19,17 @@ for example; or clone bup's git repository to further back it up.

 These parameters can be passed to `git annex initremote` to configure bup:

-* `encryption` - One of "none", "hybrid", "shared", or "pubkey".
-  See [[encryption]].
-
-* `keyid` - Specifies the gpg key to use for [[encryption]].
-
 * `buprepo` - Required.
   This is passed to `bup` as the `--remote`
   to use to store data. To create the repository,`bup init` will be run.
   Example: "buprepo=example.com:/big/mybup" or "buprepo=/big/mybup"
   (To use the default `~/.bup` repository on the local host, specify "buprepo=")

+* `encryption` - One of "none", "hybrid", "shared", or "pubkey".
+  See [[encryption]]. Note that using encryption will prevent
+  de-duplication of content stored in the buprepo.
+
+* `keyid` - Specifies the gpg key to use for [[encryption]].
+
 Options to pass to `bup split` when sending content to bup can also
 be specified, by using `git config annex.bup-split-options`. This
 can be used to, for example, limit its bandwidth.
diff --git a/doc/special_remotes/gcrypt.mdwn b/doc/special_remotes/gcrypt.mdwn
index 2e07741d3..c9a22b01a 100644
--- a/doc/special_remotes/gcrypt.mdwn
+++ b/doc/special_remotes/gcrypt.mdwn
@@ -13,7 +13,7 @@ These parameters can be passed to `git annex initremote` to configure
 gcrypt:

 * `encryption` - One of "none", "hybrid", "shared", or "pubkey".
-  See [[encryption]].
+  Required. See [[encryption]].

 * `keyid` - Specifies the gpg key to use for encryption of both the files
   git-annex stores in the repository, as well as to encrypt the git
@@ -24,6 +24,8 @@ gcrypt:
   for gcrypt to use. This repository should be either empty, or an existing
   gcrypt repositry.

+* `chunk` - Enables [[chunking]] when storing large files.
+
 * `shellescape` - See [[rsync]] for the details of this option.

 ## notes
diff --git a/doc/special_remotes/hook.mdwn b/doc/special_remotes/hook.mdwn
index 8cf31ed02..0bb76d98a 100644
--- a/doc/special_remotes/hook.mdwn
+++ b/doc/special_remotes/hook.mdwn
@@ -36,6 +36,8 @@ These parameters can be passed to `git annex initremote`:

 * `keyid` - Specifies the gpg key to use for [[encryption]].

+* `chunk` - Enables [[chunking]] when storing large files.
+
 ## hooks

 Each type of hook remote is specified by a collection of hook commands.
diff --git a/doc/special_remotes/rsync.mdwn b/doc/special_remotes/rsync.mdwn
index b2a9d23f5..eb218b181 100644
--- a/doc/special_remotes/rsync.mdwn
+++ b/doc/special_remotes/rsync.mdwn
@@ -14,14 +14,14 @@ Or for using rsync over SSH

 These parameters can be passed to `git annex initremote` to configure rsync:

+* `rsyncurl` - Required. This is the url or `hostname:/directory` to
+  pass to rsync to tell it where to store content.
+
 * `encryption` - One of "none", "hybrid", "shared", or "pubkey".
   See [[encryption]].

 * `keyid` - Specifies the gpg key to use for [[encryption]].

-* `rsyncurl` - Required. This is the url or `hostname:/directory` to
-  pass to rsync to tell it where to store content.
-
 * `shellescape` - Optional. Set to "no" to avoid shell escaping normally
   done when using rsync over ssh. That escaping is needed with typical
   setups, but not with some hosting providers that do not expose rsynced
@@ -30,6 +30,10 @@ These parameters can be passed to `git annex initremote` to configure rsync:
   quote (`'`) character. If that happens, you can run enableremote
   setting shellescape=no.

+* `chunk` - Enables [[chunking]] when storing large files.
+  This is typically not a win for rsync, so no need to enable it.
+  But, it makes this interoperate with the [[directory]] special remote.
+
 The `annex-rsync-options` git configuration setting can be used to pass
 parameters to rsync.

diff --git a/doc/special_remotes/webdav.mdwn b/doc/special_remotes/webdav.mdwn
index 64eed5d0b..6b5f5b122 100644
--- a/doc/special_remotes/webdav.mdwn
+++ b/doc/special_remotes/webdav.mdwn
@@ -37,4 +37,4 @@ the webdav remote.
 Setup example:

-	# WEBDAV_USERNAME=joey@kitenet.net WEBDAV_PASSWORD=xxxxxxx git annex initremote box.com type=webdav url=https://dav.box.com/dav/git-annex chunksize=75mb keyid=joey@kitenet.net
+	# WEBDAV_USERNAME=joey@kitenet.net WEBDAV_PASSWORD=xxxxxxx git annex initremote box.com type=webdav url=https://dav.box.com/dav/git-annex chunk=10mb keyid=joey@kitenet.net
diff --git a/doc/tips/using_Amazon_S3.mdwn b/doc/tips/using_Amazon_S3.mdwn
index 0c68c7387..ede3f952f 100644
--- a/doc/tips/using_Amazon_S3.mdwn
+++ b/doc/tips/using_Amazon_S3.mdwn
@@ -14,7 +14,7 @@ like "2512E3C7"

 Next, create the S3 remote, and describe it.

-	# git annex initremote cloud type=S3 keyid=2512E3C7
+	# git annex initremote cloud type=S3 chunk=1MiB keyid=2512E3C7
 	initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok
 	# git annex describe cloud "at Amazon's US datacenter"
 	describe cloud ok
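
The upload-resume rule in the chunks.mdwn hunk (check the 1st chunk, then the 2nd, and start uploading at the first missing one) can be sketched roughly as below. This is an illustrative Python sketch, not git-annex's Haskell code: `check_present` and `store_chunk` are hypothetical callbacks standing in for a special remote's operations, and the chunk keys are opaque placeholder strings.

```python
from typing import Callable, List


def first_missing_chunk(chunk_keys: List[str],
                        check_present: Callable[[str], bool]) -> int:
    """Return the index of the first chunk the remote lacks.

    Chunks are probed in order; probing stops at the first miss.
    Returns len(chunk_keys) when every chunk is already present.
    """
    for index, key in enumerate(chunk_keys):
        if not check_present(key):
            return index
    return len(chunk_keys)


def resume_upload(chunk_keys: List[str],
                  check_present: Callable[[str], bool],
                  store_chunk: Callable[[str], None]) -> int:
    """Upload only the chunks from the first missing one onward.

    Returns the index at which uploading started.
    """
    start = first_missing_chunk(chunk_keys, check_present)
    for key in chunk_keys[start:]:
        store_chunk(key)
    return start
```

For a fresh upload the first probe already misses, so the cost is the single extra presence check the design text mentions. As that text also notes, resuming this way is only safe when a key cannot have two different contents in different repos.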