doc/devblog/day_336__pid_locks.mdwn


Been working today on getting git-annex to fall back from nice POSIX fcntl
locks to pid locks when the former are not supported. There will be an
`annex.pidlock` setting to control this. Mostly useful, I think, for networked
file systems like NFS and Lustre. While these *do* support POSIX locks, I
guess it can sometimes be hard to get some big server configured
appropriately, especially when you don't admin it and just want to use
git-annex there.
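
To make the two kinds of lock concrete, here is a rough Python sketch (not git-annex's actual Haskell code). The `take_lock` function, its `use_pid_lock` flag (standing in for the `annex.pidlock` setting), the `.pid` suffix and the "hostname pid" file contents are all assumptions made up for illustration.

```python
import fcntl
import os
import socket


def take_lock(path, use_pid_lock=False):
    """Take a lock at `path`; returns ("fcntl", fd) or ("pid", pid_file_path).

    `use_pid_lock` is a hypothetical stand-in for a setting like annex.pidlock.
    """
    if use_pid_lock:
        pid_path = path + ".pid"  # hypothetical naming
        # O_EXCL makes creation atomic: whoever creates the file holds the lock.
        # Raises FileExistsError if somebody else already holds it.
        fd = os.open(pid_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        with os.fdopen(fd, "w") as f:
            # Record who holds the lock, so others can try to judge staleness.
            f.write(f"{socket.gethostname()} {os.getpid()}\n")
        return ("pid", pid_path)

    # Ordinary POSIX lock: the kernel drops it when the process exits,
    # so it can never be left stale.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return ("fcntl", fd)
```

The important difference is in what happens on a crash: the fcntl lock vanishes with the process, while the pid file stays behind until someone removes it.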

Of course, the fun part about pid locks is that it can be pretty hard to
tell if one is stale or not. Especially when using a networked filesystem,
because then the pid in question can be running on a different computer.
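
A small Python sketch of why this is hard, assuming (hypothetically) that the pid lock file contains a "hostname pid" line. The function name `pid_lock_status` and the three result strings are made up for this example.

```python
import os
import socket


def pid_lock_status(contents, local_host=None):
    """Classify a pid lock as "live", "stale", or "unknown".

    `contents` is the text of the lock file, assumed to be "hostname pid".
    """
    host, pid_text = contents.split()
    pid = int(pid_text)
    if host != (local_host or socket.gethostname()):
        # The pid belongs to another machine; nothing local can check it.
        return "unknown"
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return "stale"
    except PermissionError:
        return "live"  # process exists but belongs to another user
    # Caveat: pids get reused, so "live" may be an unrelated process.
    return "live"
```

Even the local case is only a guess because of pid reuse, and the remote case cannot be answered at all from here.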

Even if you do figure out that a pid lock is stale, how do you then
take over a stale pid lock, without racing with another process that
also wants to take it over? This was the truly tricky question of the
day.

I have a possibly slightly novel approach to solve that:
Put a more modern lock file someplace else (eg, /dev/shm)
and use that lock file to lock the pid lock file. Then you can tell
quickly, locally, whether a local pid lock file is stale, and take it over
safely. Of course, if the pid lock is not held by a local process, this still
has to fall back to the inevitable retry-and-timeout-and-fail.
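
Here is a minimal Python sketch of that side-lock idea, under several assumptions of mine: the side lock is a `flock`ed file named after a hash of the pid lock's path, the pid lock file holds "hostname pid", and the helper names (`side_lock_path`, `try_take_pid_lock`, `drop_pid_lock`) are invented for illustration. It is not how git-annex implements it.

```python
import fcntl
import hashlib
import os
import socket


def side_lock_path(pid_lock_path, side_dir="/dev/shm"):
    """Map a pid lock file to a local side lock file (naming is hypothetical)."""
    digest = hashlib.sha256(os.path.abspath(pid_lock_path).encode()).hexdigest()
    return os.path.join(side_dir, f"pidlock-{digest}.lck")


def try_take_pid_lock(pid_lock_path, side_dir="/dev/shm"):
    """Return the side lock fd on success, or None if the caller should retry."""
    side_fd = os.open(side_lock_path(pid_lock_path, side_dir),
                      os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(side_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(side_fd)
        return None  # a live local process holds the pid lock

    # From here on we hold the side lock, so no other local process can
    # race with us while we inspect or take over the pid lock file.
    try:
        fd = os.open(pid_lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        with open(pid_lock_path) as f:
            fields = f.read().split()
        if not fields or fields[0] != socket.gethostname():
            os.close(side_fd)
            return None  # held (or being written) elsewhere: retry, then time out
        # A local holder would still have the side lock, so this one is stale.
        fd = os.open(pid_lock_path, os.O_WRONLY | os.O_TRUNC)
    with os.fdopen(fd, "w") as f:
        f.write(f"{socket.gethostname()} {os.getpid()}\n")
    return side_fd


def drop_pid_lock(pid_lock_path, side_fd):
    """Release in this order, so the pid file is gone before others can get in."""
    os.unlink(pid_lock_path)
    os.close(side_fd)
```

A caller would loop on `try_take_pid_lock` with a sleep and an overall timeout, which is the retry-and-timeout-and-fail part for locks held on other machines. Since the kernel releases the side lock when its holder dies, a leftover local pid file with a free side lock can be treated as stale without guessing about pids.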

I hope the result will work pretty well, although git-annex will not
support as fine-grained concurrency when using pid locks. Will find out
tomorrow when I run today's code! ;)