After a few days otherwise engaged, back to work today.

My focus was on adding the committing thread mentioned in [[day_4__speed]].
I got rather further than expected!

First, I implemented a really dumb thread that woke up once per second,
checked if any changes had been made, and committed them. Of course, this
rather sucked. In the middle of a large operation like untarring a tarball,
or `rm -r` of a large directory tree, it made lots of commits and made
things slow and ugly. This was not unexpected.
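
For comparison, here is roughly what that naive approach looks like. This is a minimal sketch, not the assistant's actual code. It assumes the watcher thread appends changed paths to a shared `IORef`. The names `pollOnce` and `dumbCommitter`, and the `commitChanges` action standing in for the real `git commit`, are all hypothetical.

[[!format haskell """
import Control.Concurrent (threadDelay)
import Control.Monad (forever, unless, void)
import Data.IORef

-- Hypothetical: the watcher thread appends changed paths to `pending`,
-- and `commitChanges` stands in for whatever actually runs git commit.

-- One polling step: atomically take every pending change and commit
-- them if there were any. Returns how many changes were committed.
pollOnce :: IORef [FilePath] -> ([FilePath] -> IO ()) -> IO Int
pollOnce pending commitChanges = do
  changes <- atomicModifyIORef' pending (\cs -> ([], cs))
  unless (null changes) (commitChanges changes)
  return (length changes)

-- The naive committer: wake up once per second, whether or not
-- anything has changed, and commit whatever has piled up.
dumbCommitter :: IORef [FilePath] -> ([FilePath] -> IO ()) -> IO ()
dumbCommitter pending commitChanges = forever $ do
  threadDelay 1000000  -- one second, in microseconds
  void (pollOnce pending commitChanges)
"""]]

Nothing here knows *when* the changes happened, so a long `rm -r` gets chopped into roughly one commit per second.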

So next, I added some smarts to it. First, I wanted it to stop waking up
every second when there was nothing to do, and instead block waiting for a
change to occur. Second, I wanted it to know when past changes had happened,
so it could detect batch-mode scenarios and avoid committing too
frequently.

I played around with combinations of various Haskell thread communication
tools to get that information to the committer thread: `MVar`, `Chan`,
`QSem`, `QSemN`. Eventually, I realized all I needed was a simple channel
through which the timestamps of changes could be sent. However, `Chan`
wasn't quite suitable, and I had to add a dependency on 
[Software Transactional Memory](http://en.wikipedia.org/wiki/Software_Transactional_Memory),
and use a `TChan`. Now I'm cooking with gas!
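
Here is a minimal sketch of that channel, under stated assumptions. It assumes the watcher thread timestamps each change with `getCurrentTime` and writes it to a `TChan UTCTime`. The function names `recordChange` and `waitForChanges` are hypothetical. The committer blocks in `readTChan` until something arrives, then uses `tryReadTChan` to drain anything else already queued, all within one STM transaction.

[[!format haskell """
import Control.Concurrent.STM
import Data.Time.Clock (UTCTime, getCurrentTime)

-- Hypothetical sketch, not git-annex's actual code.
-- The watcher thread calls this each time it notices a change.
recordChange :: TChan UTCTime -> IO ()
recordChange chan = do
  now <- getCurrentTime
  atomically (writeTChan chan now)

-- The committer blocks (without polling) until at least one change
-- time arrives, then drains whatever else is already queued.
waitForChanges :: TChan UTCTime -> IO [UTCTime]
waitForChanges chan = atomically $ do
  first <- readTChan chan  -- retries, i.e. blocks, while the channel is empty
  rest <- drain
  return (first : rest)
  where
    -- Non-blocking: stop as soon as the channel is empty.
    drain = do
      next <- tryReadTChan chan
      case next of
        Nothing -> return []
        Just t -> (t :) <$> drain
"""]]

Because `readTChan` retries the transaction while the channel is empty, the committer thread sleeps until the first change arrives instead of waking up every second. It also gets the full list of change times it needs for batch detection.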

With that data channel available to the committer thread, it quickly got
some very nice smart behavior. Playing around with it, I find that it commits
*instantly* when I make the kind of one-off change I'd want the
git-annex assistant to sync out right away, and that its batch job detection
works pretty well too.

There's surely room for improvement, and I made this part of the code
an entirely pure function, so it's really easy to change the strategy.
This part of the committer thread is so nice and clean that here's the
current code, for your viewing pleasure:

[[!format haskell """
import Data.Time.Clock (UTCTime, diffUTCTime)

{- Decide if now is a good time to make a commit.
 - Note that the list of change times has an undefined order.
 -
 - Current strategy: If there have been 10 or more changes within the past
 - second, a batch activity is taking place, so wait for later.
 -}
shouldCommit :: UTCTime -> [UTCTime] -> Bool
shouldCommit now changetimes
       | len == 0 = False
       | len > 4096 = True -- avoid bloating queue too much
       | length (filter thisSecond changetimes) < 10 = True
       | otherwise = False -- batch activity
       where
               len = length changetimes
               thisSecond t = now `diffUTCTime` t <= 1
"""]]
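
Since the strategy is pure, the committer loop can simply take it as a parameter. The following is a rough sketch of how such a loop could be wired up, not the assistant's actual thread. The names `committerStep` and `committerThread`, the `commitChanges` action, and the half-second back-off are all assumptions made for illustration. Each step blocks for a change, adds it to any change times still held back, and asks the strategy whether to commit now.

[[!format haskell """
import Control.Concurrent (threadDelay)
import Control.Concurrent.STM
import Data.Time.Clock (UTCTime, getCurrentTime)

-- Hypothetical sketch of a committer loop around a pure strategy such
-- as shouldCommit; `commitChanges` stands in for the real git commit.
committerStep
  :: (UTCTime -> [UTCTime] -> Bool)  -- pure strategy, e.g. shouldCommit
  -> IO ()                           -- hypothetical: make the commit
  -> TChan UTCTime                   -- change times from the watcher
  -> [UTCTime]                       -- change times not yet committed
  -> IO [UTCTime]                    -- change times still pending afterwards
committerStep strategy commitChanges chan pending = do
  new <- if null pending
    then atomically ((:) <$> readTChan chan <*> drain)  -- block for a change
    else atomically drain                               -- pick up any more
  let changetimes = pending ++ new
  now <- getCurrentTime
  if strategy now changetimes
    then commitChanges >> return []
    else do
      threadDelay 500000  -- assumed back-off during batch activity: 0.5s
      return changetimes
  where
    -- Non-blocking drain of whatever is already queued.
    drain = do
      next <- tryReadTChan chan
      case next of
        Nothing -> return []
        Just t -> (t :) <$> drain

-- Run steps forever, carrying held-back change times between them.
committerThread :: (UTCTime -> [UTCTime] -> Bool) -> IO () -> TChan UTCTime -> IO ()
committerThread strategy commitChanges chan = go []
  where
    go pending = committerStep strategy commitChanges chan pending >>= go
"""]]

Passing `shouldCommit` as the `strategy` argument would give the behavior described above. Trying out a different strategy only means passing a different pure function.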

Still some polishing to do to eliminate minor inefficiencies and deal
with more races, but this part of the git-annex assistant is now very usable,
and will be going out to my beta testers soon!