doc/bugs/Strange_case_of_data_loss__44___possibly_linked_to_git-annex_with_encrypted_rsync_remote.mdwn

This is not really a proper bug report, but I thought I should post it here
in case someone can find a sane, non-supernatural explanation for a strange
case of data loss I have experienced with git-annex.

Some time ago I cloned a bunch of git-annex repos from an external drive (let's
call it disk1) to a new computer (computer3). In one of the repos, git-annex
marked a bunch of files as corrupt and moved them to .git/annex/bad. Oops, I
thought, I must have a failing disk. Luckily I had offsite backups: no fewer
than two other external hard disks (disk2 and disk3), each holding a full copy
of the repo in question. However, **both of these** had the same files
corrupted. The files have the correct size, but are filled with zeroes. Other
files in the repo are fine, and so are my other repos.
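
One way to double-check the symptom on each disk independently of git-annex is to read an object file directly: recompute its SHA-1, compare it and the file size with the values encoded in the key, and test whether every byte is zero. The sketch below is a minimal illustration, not part of git-annex. It assumes keys of the form `SHA1-s<size>--<hex>` (the SHA-1 backend with a size field), and the helper names are hypothetical.

```python
import hashlib
import os
import sys

CHUNK = 1 << 20  # read 1 MiB at a time so large files do not fill memory


def is_all_zero(path):
    """Return True if every byte of the file is 0x00 (an empty file counts as all zero)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                return True
            if chunk.count(0) != len(chunk):
                return False


def parse_sha1_key(key):
    """Split a key assumed to look like 'SHA1-s<size>--<hex>' into (size, hexdigest)."""
    fields, sep, digest = key.partition("--")
    parts = fields.split("-")
    sizes = [p[1:] for p in parts[1:] if p.startswith("s")]
    if parts[0] != "SHA1" or not sep or len(sizes) != 1:
        raise ValueError("not a SHA1 key with a size field: " + key)
    return int(sizes[0]), digest.lower()


def check_against_key(path, key):
    """Compare a file's size and SHA-1 with the values recorded in its key."""
    size, digest = parse_sha1_key(key)
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return {
        "size_ok": os.path.getsize(path) == size,
        "hash_ok": h.hexdigest() == digest,
        "all_zero": is_all_zero(path),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_object.py <file> <key>
    print(check_against_key(sys.argv[1], sys.argv[2]))
```

On a zero-filled copy of a file whose real content is not all zeroes, this should report `size_ok` as True, `hash_ok` as False and `all_zero` as True, which matches what is described above.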

I have been trying to wrap my head around this, but I can't think of any way
this could happen. However the files got corrupted in the first place, the
corruption should have been picked up when the content was copied to the
external drives disk2 and disk3, right? I can rule out the NSA/MIB/aliens
messing with me, because these files are not that valuable or sensitive.

The files in question were added to git-annex back in 2012, so the trail has
gone cold. Naturally, I have no idea how to reproduce this, nor can I reliably
say that git-annex is to blame. I can offer some hints, though. The files were
all added in the same commit in 2012, but not all files from that commit are
corrupted. The corrupted files have consecutive file names. The files have
never been modified since (apart from the corruption), and the content *may*
have been copied via an encrypted rsync transfer repository. I have always used
git-annex on Arch Linux, in indirect mode. The files use the SHA-1 backend.

All of these files have a similar location tracking log that looks something
like this (UUIDs replaced with symbolic names):

    1356690700.542152s 1 computer1 <- first added
    1356691074.253815s 1 disk1 <- copied to disk1
    1356719321.145126s 1 rsync <- copied to rsync repo
    1358070999.435676s 1 rsync <- copied to rsync repo (again?)
    1362166895.310332s 1 disk2 <- copied to disk2
    1362906850.555869s 1 computer2 (dead) <- copied to another computer
    1364926664.362195s 0 computer1 <- dropped from computer1, enough copies on the disks
    1374412057.409496s 0 computer2 (dead) <- dropped from computer2, now dead
    1445691595.764108s 1 disk3 <- copied to disk3
    1445770764.165792s 0 rsync <- dropped from rsync repo to save space
    1482077052.217353646s 0 disk1 <- first noticed as corrupted on disk1
    1482741278.318274404s 0 disk3 <- WTF, also corrupted on disk3
    1482926246.268440532s 0 disk2 <- double-WTF, also corrupted on disk2
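
Replaying the log makes the timeline easier to follow. The following sketch converts each epoch timestamp to a UTC date and tracks which repositories the log records as holding the content, treating the newest line for each repository as authoritative. The sample data is the log above with the annotations removed; the symbolic names stand in for real UUIDs, and the function names are hypothetical.

```python
from datetime import datetime, timezone

# Location log from the report, with the "<- ..." annotations and "(dead)" marker removed.
# The symbolic names stand in for the real repository UUIDs.
LOG = """\
1356690700.542152s 1 computer1
1356691074.253815s 1 disk1
1356719321.145126s 1 rsync
1358070999.435676s 1 rsync
1362166895.310332s 1 disk2
1362906850.555869s 1 computer2
1364926664.362195s 0 computer1
1374412057.409496s 0 computer2
1445691595.764108s 1 disk3
1445770764.165792s 0 rsync
1482077052.217353646s 0 disk1
1482741278.318274404s 0 disk3
1482926246.268440532s 0 disk2
"""


def parse_line(line):
    """Split '<seconds>s <status> <uuid>' into (timestamp, present?, uuid)."""
    stamp, status, uuid = line.split()[:3]
    return float(stamp.rstrip("s")), status == "1", uuid


def replay(log_text):
    """Yield (utc_datetime, repos recorded as holding the content) after each entry."""
    present = {}
    entries = sorted(parse_line(l) for l in log_text.splitlines() if l.strip())
    for stamp, is_present, uuid in entries:
        present[uuid] = is_present  # the newest entry for a repository wins
        holders = sorted(u for u, p in present.items() if p)
        yield datetime.fromtimestamp(stamp, tz=timezone.utc), holders


for when, holders in replay(LOG):
    print(when.strftime("%Y-%m-%d %H:%M UTC"), len(holders), ", ".join(holders))
```

Replayed this way, the log shows that from the rsync drop in October 2015 until the first corruption was noticed in December 2016, disk1, disk2 and disk3 were the only repositories recorded as holding the content.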

The only thing that strikes me as odd is the double entry for the rsync
remote. The non-corrupted files from the same commit do not seem to have such a
double entry.

So my main question is: has there ever been a bug in git-annex that could have
caused this behavior? Or is there any other realistic explanation for it? If
this turns out to be an existing bug, is there any other evidence I can gather?
Needless to say, the lesson here is to run `git annex fsck` regularly, even if
you have offsite backups...