diff options
-rw-r--r-- | doc/bugs/Strange_case_of_data_loss__44___possibly_linked_to_git-annex_with_encrypted_rsync_remote.mdwn | 55 |
1 files changed, 55 insertions, 0 deletions
diff --git a/doc/bugs/Strange_case_of_data_loss__44___possibly_linked_to_git-annex_with_encrypted_rsync_remote.mdwn b/doc/bugs/Strange_case_of_data_loss__44___possibly_linked_to_git-annex_with_encrypted_rsync_remote.mdwn new file mode 100644 index 000000000..be0060c58 --- /dev/null +++ b/doc/bugs/Strange_case_of_data_loss__44___possibly_linked_to_git-annex_with_encrypted_rsync_remote.mdwn @@ -0,0 +1,55 @@ +This is not really a proper bug report, but I thought I should post this here +in case someone can find any sane, non-supernatural reason for a strange case +of data loss I have experienced with git-annex. + +Some time ago I cloned a bunch of git-annex repos from an external drive (let's +call it disk1) to a new computer (computer3). On one of my repos git-annex +marked a bunch of files corrupt and moved them to .git/annex/bad. Oops, I +thought, I must have a failing disk. Luckily I had offsite backups -- no less +than two other external hard disks (disk2-3), each having a full copy of the +repo in question. However, **both of these** had the same, corrupt files. The +files have the correct size, but are filled with zeroes. Other files in the +repo are fine, and so are other repos. + +I have been trying to wrap my head around this but I can't think of any reason +how this could occur. However the files have gotten corrupted in the first +place, the corruption should have been picked up when copying the content to +the external drives disk2 and disk3, right? I have to rule out NSA/MIB/aliens +from messing with me because these files are not that valuable or sensitive. + +The files in question were added to git-annex back in 2012, so the trail is +cold on this one. Naturally, I have no idea on how to reproduce this, nor can I +reliably say that git-annex is to blame. I can gather some hints though. The +files were all added on the same commit in 2012, but not all files from that +commit are corrupted. The corrupted files have consecutive file names. The +files were never modified since (except for the corruption), and the content +*may* have been copied via an encrypted rsync transfer repository. I have +always used git-annex on Arch Linux and in indirect more. The files used the +SHA-1 backend. + +All these files have a similar tracking log that looks something like this +(uuids replaced with symbolic names): + + 1356690700.542152s 1 computer1 <- first added + 1356691074.253815s 1 disk1 <- copied to disk1 + 1356719321.145126s 1 rsync <- copied to rsync repo + 1358070999.435676s 1 rsync <- copied to rsync repo (again?) + 1362166895.310332s 1 disk2 <- copied to disk2 + 1362906850.555869s 1 computer2 (dead) <- copied to another computer + 1364926664.362195s 0 computer1 <- dropped from computer1 as enough copies in disks + 1374412057.409496s 0 computer2 (dead) <- dropped from computer2, now dead + 1445691595.764108s 1 disk3 <- copied to disk3 + 1445770764.165792s 0 rsync <- dropped from rsync repo to save space + 1482077052.217353646s 0 disk1 <- first noticed as corrupted on disk1 + 1482741278.318274404s 0 disk3 <- WTF, also corrupted on disk3 + 1482926246.268440532s 0 disk2 <- double-WTF, also corrupted on disk2 + +The only thing that strikes odd to me is the double entry with the rsync +remote. The non-corrupted files from the same commit do not seem to have such a +double entry. + +So my main question is, has there ever been a bug in git-annex that could have +caused this behavior? Or is there any other realistic explanation for this? In +case this is an existing bug, is there any other evidence I can gather? +Needless to say, the lesson here is to run `git annex fsck` regularly even if +you have offsite backups... |