summaryrefslogtreecommitdiff
path: root/doc/todo/cheaper_global_fsck.mdwn
blob: 70d1380a2dd4678b4e6e978928e90ba1011e2ffa (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
Global fsck updates all location log entries for a repo. This wastes disk
space.

I realized now that it can be implemented w/o such waste. Probably cheaply
enough to be the default!

What we need is a new log file, call it fscktimes.log.
This records the time of the last fsck of each repo.

`git annex fsck --expire` no longer needs to look at the location log at
all. It can just check the repo's fscktimes.log entry. If the entry is
recent enough, we know that the repo has fscked recently, and its location
log is good, and nothing needs to be done. Otherwise, we know that the repo
has stopped fscking, and we simply expire *all* its location logs.

Note that fscktime.log is only used by fsck; it does not impact git-annex
generally or make it slower. And, it's very low overhead to update the one
file. Repos could do a fsck --fast on a daily basis and not grow the
git-annex branch much. Maybe on an hourly basis even.

(BTW, there is some overlap with the fsck.log file that is currently used to
hold the timestamp of the last local fsck. May be able to eliminate that
file too.)

----

It might be worth making the fsck.log record --fast and full fscks
separately so we know the last of each for each repo. This would let
--expire require periodic full fscks and more frequent fast fscks.

----

Hmm, --expire updates all the location logs when it thinks a repo has gone
missing. Why not just mark it dead? Again, this would save a lot of space!
It would complicate recovery if a repo had been offline and came back; it
would need to mark itself as not dead any longer.

> [[done]] --[[Joey]]