author: Joey Hess <joeyh@joeyh.name> 2015-04-01 17:53:25 -0400
committer: Joey Hess <joeyh@joeyh.name> 2015-04-01 17:53:25 -0400
commit: 145588df5ada2b1538789fc5b63eefe6aab912bb
tree: 8bfd82aa6536f7af689f20a47f2ec6f1031938c0 /doc/devblog
parent: 3dda636033123f6e1d9fa45a1971b9daf6ebcf54
devblog
Diffstat (limited to 'doc/devblog')
-rw-r--r-- doc/devblog/day_270__distributed_fsck.mdwn 25
1 files changed, 25 insertions, 0 deletions
diff --git a/doc/devblog/day_270__distributed_fsck.mdwn b/doc/devblog/day_270__distributed_fsck.mdwn
new file mode 100644
index 000000000..76227442d
--- /dev/null
+++ b/doc/devblog/day_270__distributed_fsck.mdwn
@@ -0,0 +1,25 @@
+Added two options to `git annex fsck` that allow for a form of distributed
+fsck. This is useful in situations where repositories cannot be trusted to
+continue to exist, and cannot be checked directly, but you'd still like to
+keep track of their status. [[design/iabackup]] is one use case for this.
+
+By running a periodic fsck with the `--distributed` option,
+the repositories can verify that they still exist and that the
+information about their contents is still accurate. This is done by
+doing an extra update of the location log each time a file is verified by
+fsck to still be in the repository.
+
+The other option looks like `--expire="30d somerepo:60d"`. It checks that
+each specified repository has recorded a distributed fsck within the specified
+time period. If not, the repository is dropped from the location tracking
+log. Of course, the repository can always update that later if it's really still around.
+
+Distributed fsck is not the default because those extra location log updates
+increase the size of the git-annex branch. I did one thing to keep the size
+increase small: An identical line is logged for each key, including the
+timestamp, so git's delta compression will work as well as possible. But,
+there's still commit and tree update overhead.
+
+It probably doesn't make sense to run distributed fscks too often, for that and
+other reasons. If the git-annex branch does get too large, there's always
+`git annex forget` ...
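
Note on the `--expire` check described in the post: the following is a minimal Python sketch of the decision it describes, not git-annex's actual (Haskell) implementation. It assumes a spec such as `"30d somerepo:60d"` means "30 days by default, 60 days for `somerepo`", handles only the `d` unit that appears in the post, and treats a repository with no recorded distributed fsck as expired. The function names and the `last_fsck` mapping are hypothetical.

```python
import re
from datetime import datetime, timedelta


def parse_duration(text):
    """Parse a duration like "30d". Only days are handled (assumption)."""
    m = re.fullmatch(r"(\d+)d", text)
    if m is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(days=int(m.group(1)))


def parse_expire(spec):
    """Split a spec like "30d somerepo:60d" into (default, per_repo)."""
    default = None
    per_repo = {}
    for word in spec.split():
        if ":" in word:
            # "somerepo:60d" sets a limit for one named repository.
            repo, _, duration = word.rpartition(":")
            per_repo[repo] = parse_duration(duration)
        else:
            # A bare duration is the default for every other repository.
            default = parse_duration(word)
    return default, per_repo


def expired_repos(spec, last_fsck, now):
    """Return the repositories whose last distributed fsck is too old.

    last_fsck maps a repository name to the datetime of its most recent
    recorded distributed fsck, or None if it never recorded one.
    """
    default, per_repo = parse_expire(spec)
    expired = []
    for repo, when in sorted(last_fsck.items()):
        limit = per_repo.get(repo, default)
        if limit is None:
            continue  # no limit applies to this repository
        if when is None or now - when > limit:
            expired.append(repo)
    return expired
```

For example, with `now` set to 2015-04-01 and both `somerepo` and `other` last checked 45 days earlier, `expired_repos("30d somerepo:60d", ...)` would report only `other`, since `somerepo` is allowed 60 days.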