diff options
-rw-r--r-- | doc/todo/wishlist:_perform_fsck_remotely.mdwn | 39 |
1 files changed, 39 insertions, 0 deletions
diff --git a/doc/todo/wishlist:_perform_fsck_remotely.mdwn b/doc/todo/wishlist:_perform_fsck_remotely.mdwn new file mode 100644 index 000000000..f2187d912 --- /dev/null +++ b/doc/todo/wishlist:_perform_fsck_remotely.mdwn @@ -0,0 +1,39 @@ +Currently, when `fsck`'ing a remote, files are first downloaded to a temporary +file locally, decrypted if needed, and finally digested; the temporary file is +then either thrown away, or quarantined, depending on the value of that digest. + +Whereas this approach works with any kind of remote, in the particular case +where the user is granted execution rights on the digest command, one could +avoid cluttering the network and digest the file remotely. I propose the +addition of a per-remote git option `annex-remote-fsck` to switch between the +two behaviors. + + +There is an issue with encrypted specialremotes, though. As hinted at +[[here|tips/beware_of_SSD_wear_when_doing_fsck_on_large_special_remotes/#comment-70055f166f7eeca976021d24a736b471]], +since the digest of a ciphertext can't be deduced from that of a plaintext in +general one would needs, before sending an encrypted file to such a remote, to +digest it and store that digest somewhere (together with the cipher's size and +perhaps other meta-information). + +The usual directory structure (`.../.../{backend}-s{size}--{digest}.log`) seems +perfectly suitable to store these informations. Lines there would look like +`{timestamp}s {numcopy} {UUID} {remote digest}`. Of course, it implies that +remote digest commands are trustworthy (are doing the right thing), and that +the digest output are not tampered by others who have access to the git repo. +But that's outside the current threat model, I guess. + +Actually, since git-annex always includes a MDC in the ciphertexts, we could do +something clever and even avoid running a digest algorithm. According to the +[[OpenPGP standard|https://tools.ietf.org/html/rfc4880#section-5.14]] the MDC +is essentially a SHA-1 hash of the plaintext. I'm still investigating if it's +even possible, but in theory it would be enough (with non-chained ciphers at +least) to download a few bytes from the encrypted remote, decrypt those bytes +to retrieve the hash, and compare that hash with the known value. Of course +there is a downside here, namely that files tampered anywhere but on the MDC +packets would not be detected by `fsck` (but gpg will warn when decrypting the +file). + + +My 2 cents :-) Is there something I missed? I suppose there was a reason to +perform `fsck` locally at the first place... |