author Joey Hess <joey@kitenet.net> 2014-07-23 17:55:28 -0400
committer Joey Hess <joey@kitenet.net> 2014-07-23 17:55:28 -0400
commit f9ad0ce0524fc842850e93cb253df432ce829ed7
tree f52d99d3d517fa3a57dc8f544dd85b7cbe39a2aa /doc
parent 4a759a06c90ae69a07f7ec3ea22c20100844c512
minor
Diffstat (limited to 'doc')
-rw-r--r-- doc/design/assistant/chunks.mdwn 22
1 files changed, 12 insertions, 10 deletions
diff --git a/doc/design/assistant/chunks.mdwn b/doc/design/assistant/chunks.mdwn
index 53dbf20f4..49cffac81 100644
--- a/doc/design/assistant/chunks.mdwn
+++ b/doc/design/assistant/chunks.mdwn
@@ -21,10 +21,7 @@ could lead to data loss. For example, suppose A is 10 mb, and B is 20 mb,
and the upload speed is the same. If B starts first, when A will overwrite
the file it is uploading for the 1st chunk. Then A uploads the second
chunk, and once A is done, B finishes the 1st chunk and uploads its second.
-We now have 1(from A), 2(from B).
-
-This needs to be supported for back-compat, so keep the chunksize= setting
-to enable that mode, and add a new setting for the new mode.
+We now have [chunk 1(from A), chunk 2(from B)].
# new requirements
@@ -42,6 +39,10 @@ on in the webapp when configuring an existing remote).
Two concurrent uploaders of the same object to a remote should be safe,
even if they're using different chunk sizes.
+The old chunk method needs to be supported for back-compat, so
+keep the chunksize= setting to enable that mode, and add a new setting
+for the new mode.
+
# obscuring file sizes
To hide from a remote any information about the sizes of files could be
@@ -72,7 +73,7 @@ And, obviously, if someone stores 10 tb of data in a remote, they probably
have around 10 tb of files, so it's probably not a collection of recipes..
Given its inneficiencies and lack of fully obscuring file sizes,
-padding may not be worth adding.
+padding may not be worth adding, but is considered in the designs below.
# design 1
@@ -153,15 +154,15 @@ could lead to data loss. (Same as in design 2.)
# design 4
+Use key SHA256-s10000-c1--xxxxxxx for the first chunk of 1 megabyte.
+
Instead of storing the chunk count in the special remote, store it in
the git-annex branch.
-So, use key SHA256-s10000-c1--xxxxxxx for the first chunk of 1 megabyte.
-
-And look at git-annex:aaa/bbb/SHA256-s12345--xxxxxxx.log.cnk to get the
+Look at git-annex:aaa/bbb/SHA256-s12345--xxxxxxx.log.cnk to get the
chunk count and size. File format would be:
- ts uuid chunksize chunkcount
+ ts uuid chunksize chunkcount
Note that a given remote uuid might have multiple lines, if a key was
stored on it twice using different chunk sizes. Also note that even when
@@ -173,10 +174,11 @@ the files on the remote. It would also check if the non-chunked key is
present.
When dropping a key from the remote, drop all logged chunk sizes.
+(Also drop any non-chunked key.)
+
As long as the location log and the new log are committed atomically,
this guarantees that no orphaned chunks end up on a remote
(except any that might be left by interrupted uploads).
-(Also drop any non-chunked key.)
This has the best security of the designs so far, because the special
remote doesn't know anything about chunk sizes. It uses a little more