author Joey Hess <joey@kitenet.net> 2012-06-10 16:33:42 -0400
committer Joey Hess <joey@kitenet.net> 2012-06-10 16:33:42 -0400
commit a0e29b214f9cb0d3772f9d97d152e2dd9e40adf5
tree 911750bfada290bc26154bacd24435332191fe3b
parent 3bb58afd594c7208a34749202a858644498acb6f
blog for the day
-rw-r--r-- doc/design/assistant/blog/day_5__committing.mdwn 57
1 files changed, 57 insertions, 0 deletions
diff --git a/doc/design/assistant/blog/day_5__committing.mdwn b/doc/design/assistant/blog/day_5__committing.mdwn
new file mode 100644
index 000000000..3840138c6
--- /dev/null
+++ b/doc/design/assistant/blog/day_5__committing.mdwn
@@ -0,0 +1,57 @@
+After a few days otherwise engaged, back to work today.
+
+My focus was on adding the committing thread mentioned in [[day_4__speed]].
+I got rather further than expected!
+
+First, I implemented a really dumb thread, that woke up once per second,
+checked if any changes had been made, and committed them. Of course, this
+rather sucked. In the middle of a large operation like untarring a tarball,
+or `rm -r` of a large directory tree, it made lots of commits and made
+things slow and ugly. This was not unexpected.
+
+So next, I added some smarts to it. First, I wanted to stop it waking up
+every second when there was nothing to do, and instead have it block until
+a change occurs. Second, I wanted it to know when past changes happened,
+so it could detect batch mode scenarios, and avoid committing too
+frequently.
+
+I played around with combinations of various Haskell thread communication
+tools to get that information to the committer thread: `MVar`, `Chan`,
+`QSem`, `QSemN`. Eventually, I realized all I needed was a simple channel
+through which the timestamps of changes could be sent. However, `Chan`
+wasn't quite suitable, and I had to add a dependency on
+[Software Transactional Memory](http://en.wikipedia.org/wiki/Software_Transactional_Memory),
+and use a `TChan`. Now I'm cooking with gas!
+
+With that data channel available to the committer thread, it quickly got
+some very nice smart behavior. Playing around with it, I find it commits
+*instantly* when I'm making some random change that I'd want the
+git-annex assistant to sync out instantly; and that its batch job detection
+works pretty well too.
+
+There's surely room for improvement, and I made this part of the code
+be an entirely pure function, so it's really easy to change the strategy.
+This part of the committer thread is so nice and clean, that here's the
+current code, for your viewing pleasure:
+
+[[format haskell """
+{- Decide if now is a good time to make a commit.
+ - Note that the list of change times has an undefined order.
+ -
+ - Current strategy: If there have been 10 changes within the past second,
+ - a batch activity is taking place, so wait for later.
+ -}
+shouldCommit :: UTCTime -> [UTCTime] -> Bool
+shouldCommit now changetimes
+ | len == 0 = False
+ | len > 4096 = True -- avoid bloating queue too much
+ | length (filter thisSecond changetimes) < 10 = True
+ | otherwise = False -- batch activity
+ where
+ len = length changetimes
+ thisSecond t = now `diffUTCTime` t <= 1
+"""]]
+
+Still some polishing to do to eliminate minor inefficiencies and deal
+with more races, but this part of the git-annex assistant is now very usable,
+and will be going out to my beta testers soon!
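
The hand-off described in the post, where change timestamps are sent through a `TChan` to a committer thread that blocks until something arrives, can be sketched roughly as follows. This is an illustration only and not the assistant's actual code: `getChanges` and the `main` driver are hypothetical names, and the sketch assumes the `stm` and `time` packages are available.

```haskell
import Control.Concurrent.STM
import Data.Time.Clock

-- Hypothetical helper (not from the commit): block until at least one
-- change time is queued, then drain everything else that is pending.
getChanges :: TChan UTCTime -> IO [UTCTime]
getChanges chan = atomically $ do
    first <- readTChan chan -- retries, i.e. blocks, while the channel is empty
    rest <- drain
    return (first : rest)
  where
    -- Collect remaining items without blocking.
    drain = do
        next <- tryReadTChan chan
        case next of
            Nothing -> return []
            Just t -> (t :) <$> drain

main :: IO ()
main = do
    chan <- newTChanIO
    now <- getCurrentTime
    -- Stand-in for the watcher thread recording three changes.
    atomically $ mapM_ (writeTChan chan) [now, now, now]
    changes <- getChanges chan
    print (length changes) -- prints 3
```

A list gathered this way has the shape that `shouldCommit` takes as its second argument, so a committer loop could pass it along together with the current time. Whether the real committer drains the channel in one transaction like this is an assumption.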