author Joey Hess <joey@kitenet.net> 2014-07-23 22:38:14 -0400
committer Joey Hess <joey@kitenet.net> 2014-07-23 22:38:14 -0400
commit b61f1a2d86e10116177915cc94ae895c9318400e
tree 8f2167a4051b8d65894c2303f3eec03f85055a64 /doc/design
parent 5d695075fc0b0fd0e38ead098ee4cd9a87db0b5a
chunk then encrypt
Diffstat (limited to 'doc/design')
-rw-r--r-- doc/design/assistant/chunks.mdwn 30
1 files changed, 23 insertions, 7 deletions
diff --git a/doc/design/assistant/chunks.mdwn b/doc/design/assistant/chunks.mdwn
index 224c719f8..6523a207f 100644
--- a/doc/design/assistant/chunks.mdwn
+++ b/doc/design/assistant/chunks.mdwn
@@ -55,13 +55,6 @@ another goal of chunking. At least two things are needed for this:
so that a remote sees only encrypted files with uniform sizes
and cannot make guesses about the kinds of data being stored.
-Note that encrypting the whole file and then chunking and padding it is not
-good because the remote can probably examine files and tell when a gpg
-stream has been cut into peices, even without the key (have not verified
-this, but it seems likely; certianly gpg magic numbers can identify gpg
-encrypted files so a file that's encrypted but lacks the magic is not the
-first chunk..).
-
Note that padding cannot completely hide all information from an attacker
who is logging puts or gets. An attacker could, for example, look at the
times of puts, and guess at when git-annex has moved on to
@@ -184,3 +177,26 @@ This has the best security of the designs so far, because the special
remote doesn't know anything about chunk sizes. It uses a little more
data in the git-annex branch, although with care (using the same timestamp
as the location log), it can compress pretty well.
+
+## chunk then encrypt
+
+Rather than encrypting the whole object 1st and then chunking, chunk and
+then encrypt.
+
+Reasons:
+
+1. If 2 repos are uploading the same key to a remote concurrently,
+ this allows some chunks to come from one and some from another,
+ and be reassembled without problems.
+
+2. Prevents an attacker from re-assembling the chunked file using details
+ of the gpg output. Which would expose file size if padding is being used
+ to obscure it.
+
+Note that this means that the chunks won't exactly match the configured
+chunk size. gpg does compression, which might make them a
+lot smaller. Or gpg overhead could make them slightly larger. So `hasKey`
+cannot check exact file sizes.
+
+If padding is enabled, gpg compression should be disabled, to not leak
+clues about how well the files compress and so what kind of file it is.
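
Reviewer's illustration (not part of the commit): a minimal sketch of the chunk-then-encrypt ordering described in the added section. It is not git-annex's implementation, which is written in Haskell and uses gpg. Everything here is hypothetical: `split_chunks`, `toy_encrypt`, `toy_decrypt` and the 16-byte `OVERHEAD` are illustrative names and values, zlib stands in for gpg's compression, and the SHA-256 keystream is a placeholder cipher that is not secure. The sketch is meant to show two things from the text above: chunks encrypted independently by two uploaders can be mixed and still reassembled, and stored chunk sizes only track the configured chunk size when compression is off.

```python
import hashlib
import os
import zlib

OVERHEAD = 16  # hypothetical per-chunk overhead (a random nonce); real gpg overhead differs


def split_chunks(data, chunk_size):
    """Cut the plaintext into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def _keystream(key, nonce, length):
    """Toy keystream: SHA-256 over key, nonce and a counter. NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key, plaintext, compress=True):
    """Optionally compress (standing in for gpg compression), then XOR with the keystream."""
    body = zlib.compress(plaintext) if compress else plaintext
    nonce = os.urandom(OVERHEAD)
    stream = _keystream(key, nonce, len(body))
    return nonce + bytes(a ^ b for a, b in zip(body, stream))


def toy_decrypt(key, blob, compress=True):
    """Inverse of toy_encrypt; `compress` must match the value used to encrypt."""
    nonce, body = blob[:OVERHEAD], blob[OVERHEAD:]
    stream = _keystream(key, nonce, len(body))
    plain = bytes(a ^ b for a, b in zip(body, stream))
    return zlib.decompress(plain) if compress else plain


key = b"shared secret"
data = b"highly compressible " * 200  # 4000 bytes of very repetitive input
chunk_size = 1024
chunks = split_chunks(data, chunk_size)

# Two repos encrypt the same chunks independently (each picks its own nonces).
repo_a = [toy_encrypt(key, c) for c in chunks]
repo_b = [toy_encrypt(key, c) for c in chunks]

# The remote ends up with some chunks from one repo and some from the other.
mixed = [a if i % 2 == 0 else b for i, (a, b) in enumerate(zip(repo_a, repo_b))]
restored = b"".join(toy_decrypt(key, blob) for blob in mixed)

# Stored sizes with compression on (content dependent) and off (plaintext size + OVERHEAD).
compressed_sizes = [len(blob) for blob in mixed]
raw_sizes = [len(toy_encrypt(key, c, compress=False)) for c in chunks]
```

In this model, `compress=False` stores every full chunk as exactly `chunk_size + OVERHEAD` bytes, which is the kind of uniformity padding relies on. With compression on, the stored size depends on the content, so a presence check such as `hasKey` cannot compare stored sizes against the configured chunk size.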