Merge branch 'master' of ssh://git-annex.branchable.com

author: Joey Hess <joey@kitenet.net> 2013-08-21 15:44:08 -0400
committer: Joey Hess <joey@kitenet.net> 2013-08-21 15:44:08 -0400
commit: eff764be08164d27307fa1161dfbedc97657afb7 (patch)
tree: d26954c315ff7919bb158045cace5e36546a8d50
parent: 00b125dff83117138bb32c4146c032dd70160ab6 (diff)
parent: 3bab9ee055251d27f0a8e9b68c30b0c4136c20ea (diff)
1 files changed, 39 insertions, 0 deletions
diff --git a/doc/todo/wishlist:_perform_fsck_remotely.mdwn b/doc/todo/wishlist:_perform_fsck_remotely.mdwn
new file mode 100644
index 000000000..f2187d912
--- /dev/null
+++ b/doc/todo/wishlist:_perform_fsck_remotely.mdwn
@@ -0,0 +1,39 @@
+Currently, when `fsck`'ing a remote, files are first downloaded to a temporary 
+file locally, decrypted if needed, and finally digested; the temporary file is
+then either thrown away, or quarantined, depending on the value of that digest.
+
+Whereas this approach works with any kind of remote, in the particular case 
+where the user is granted execution rights on the digest command, one could
+avoid cluttering the network and digest the file remotely. I propose the
+addition of a per-remote git option `annex-remote-fsck` to switch between the
+two behaviors.
+
+
+There is an issue with encrypted special remotes, though. As hinted at
+[[here|tips/beware_of_SSD_wear_when_doing_fsck_on_large_special_remotes/#comment-70055f166f7eeca976021d24a736b471]],
+since the digest of a ciphertext can't be deduced from that of a plaintext in
+general, one would need, before sending an encrypted file to such a remote, to
+digest it and store that digest somewhere (together with the ciphertext's size and
+perhaps other meta-information).
+
+The usual directory structure (`.../.../{backend}-s{size}--{digest}.log`) seems
+perfectly suitable to store this information. Lines there would look like
+`{timestamp}s {numcopy} {UUID} {remote digest}`. Of course, it implies that
+remote digest commands are trustworthy (are doing the right thing), and that
+the digest outputs are not tampered with by others who have access to the git repo.
+But that's outside the current threat model, I guess.
+
+Actually, since git-annex always includes an MDC in the ciphertexts, we could do
+something clever and even avoid running a digest algorithm. According to the
+[[OpenPGP standard|https://tools.ietf.org/html/rfc4880#section-5.14]] the MDC
+is essentially a SHA-1 hash of the plaintext. I'm still investigating if it's
+even possible, but in theory it would be enough (with non-chained ciphers at
+least) to download a few bytes from the encrypted remote, decrypt those bytes
+to retrieve the hash, and compare that hash with the known value. Of course
+there is a downside here, namely that tampering anywhere but in the MDC
+packet would not be detected by `fsck` (but gpg will warn when decrypting the
+file).
+
+
+My 2 cents :-) Is there something I missed? I suppose there was a reason to
+perform `fsck` locally in the first place...
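
The log line format proposed in the note (`{timestamp}s {numcopy} {UUID} {remote digest}`) could be checked roughly as sketched below. This is only an illustration under stated assumptions, not git-annex code: the helper names (`parse_entry`, `latest_recorded_digest`, `remote_fsck_ok`, `digest_ciphertext`) are hypothetical, SHA-256 is an arbitrary choice for the ciphertext digest, and `numcopy` is kept as an opaque string because the note does not define it further.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RemoteDigestEntry:
    timestamp: float    # seconds, parsed from the "{timestamp}s" field
    numcopy: str        # kept opaque; its meaning is not specified in the note
    uuid: str           # UUID of the remote holding the ciphertext
    remote_digest: str  # digest of the ciphertext as stored on that remote


def digest_ciphertext(ciphertext: bytes) -> str:
    """Digest to record before upload (SHA-256 is an assumption)."""
    return hashlib.sha256(ciphertext).hexdigest()


def parse_entry(line: str) -> RemoteDigestEntry:
    """Parse one '{timestamp}s {numcopy} {UUID} {remote digest}' line."""
    fields = line.split()
    if len(fields) != 4 or not fields[0].endswith("s"):
        raise ValueError(f"malformed log line: {line!r}")
    ts, numcopy, uuid, digest = fields
    return RemoteDigestEntry(float(ts[:-1]), numcopy, uuid, digest)


def latest_recorded_digest(lines: Iterable[str], uuid: str) -> Optional[str]:
    """Return the most recently recorded digest for the given remote, if any."""
    entries = [parse_entry(l) for l in lines if l.strip()]
    matching = [e for e in entries if e.uuid == uuid]
    if not matching:
        return None
    return max(matching, key=lambda e: e.timestamp).remote_digest


def remote_fsck_ok(lines: Iterable[str], uuid: str, reported: str) -> bool:
    """Compare the digest reported by the remote with the recorded one."""
    recorded = latest_recorded_digest(lines, uuid)
    return recorded is not None and recorded == reported
```

With such a record, only the digest string would have to cross the network; as the note says, this still relies on the remote digest command being trustworthy.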