author | Joey Hess <joey@kitenet.net> | 2012-06-13 19:30:13 -0400 |
---|---|---|
committer | Joey Hess <joey@kitenet.net> | 2012-06-13 19:30:13 -0400 |
commit | b1a4d558360cfa9b650363f897f86dcf162c42ee (patch) | |
tree | 6de3e6df7114ae72389a296094c48e5f8e4ce2e6 | |
parent | 8919c2e4da5e17b8127d738ded733a1a01996194 (diff) | |
parent | a40dc2d390e5b2ba09614477737845aad6b6bb1c (diff) |
Merge branch 'master' into watch
-rw-r--r-- | doc/design/assistant/blog/day_8__speed.mdwn | 67 |
-rw-r--r-- | doc/design/assistant/inotify.mdwn | 2 |
-rw-r--r-- | doc/design/assistant/progressbars/comment_1_3ea263b1f334e8e38e14f00a96202988._comment | 8 |
3 files changed, 77 insertions, 0 deletions
diff --git a/doc/design/assistant/blog/day_8__speed.mdwn b/doc/design/assistant/blog/day_8__speed.mdwn
new file mode 100644
index 000000000..56b1e9c07
--- /dev/null
+++ b/doc/design/assistant/blog/day_8__speed.mdwn
@@ -0,0 +1,67 @@
+Since last post, I've worked on speeding up `git annex watch`'s startup time
+in a large repository.
+
+The problem was that its initial scan was naively staging every symlink in
+the repository, even though most of them are, presumably, staged correctly
+already. This was done in case the user copied or moved some symlinks
+around while `git annex watch` was not running -- we want to notice and
+commit such changes at startup.
+
+Since I already had the `stat` info for the symlink, it can look at the
+`ctime` to see if the symlink was made recently, and only stage it if so.
+This sped up startup in my big repo from longer than I cared to wait (10+
+minutes, or half an hour while profiling) to a minute or so. Of course,
+inotify events are already serviced during startup, so making it scan
+quickly is really only important so people don't think it's a resource hog.
+First impressions are important. :)
+
+But what does "made recently" mean exactly? Well, my answer is possibly
+overengineered, but most of it is really groundwork for things I'll need
+later anyway. I added a new data structure for tracking the status of the
+daemon, which is periodically written to disk by another thread (thread #6!)
+to `.git/annex/daemon.status` Currently it looks like this; I anticipate
+adding lots more info as I move into the [[syncing]] stage:
+
+    lastRunning:1339610482.47928s
+    scanComplete:True
+
+So, only symlinks made after the daemon was last running need to be
+expensively staged on startup. Although, as RichiH pointed out,
+this fails if the clock is changed. But I have been planning to have a
+cleanup thread anyway, that will handle this, and other
+potential problems, so I think that's ok.
+
+Stracing its startup scan, it's fairly tight now. There are some repeated
+`getcwd` syscalls that could be optimised out for a minor speedup.
+
+----
+
+Added the sanity check thread. Thread #7! It currently only does one sanity
+check per day, but the sanity check is a fairly lightweight job,
+so I may make it run more frequently. OTOH, it may never ever find a
+problem, so once per day seems a good compromise.
+
+Currently it's only checking that all files in the tree are properly staged
+in git. I might make it `git annex fsck` later, but fscking the whole tree
+once per day is a bit much. Perhaps it should only fsck a few files per
+day? TBD
+
+Currently any problems found in the sanity check are just fixed and logged.
+It would be good to do something about getting problems that might indicate
+bugs fed back to me, in a privacy-respecting way. TBD
+
+----
+
+I also refactored the code, which was getting far too large to all be in
+one module.
+
+I have been thinking about renaming `git annex watch` to `git annex assistant`,
+but I think I'll leave the command name as-is. Some users might
+want a simple watcher and stager, without the assistant's other features
+like syncing and the webapp. So the next stage of the
+[[roadmap|design/assistant]] will be a different command that also runs
+`watch`.
+
+At this point, I feel I'm done with the first phase of [[inotify]].
+It has a couple known bugs, but it's ready for brave beta testers to try.
+I trust it enough to be running it on my live data.
diff --git a/doc/design/assistant/inotify.mdwn b/doc/design/assistant/inotify.mdwn
index 8f0aebcb1..c2a25673e 100644
--- a/doc/design/assistant/inotify.mdwn
+++ b/doc/design/assistant/inotify.mdwn
@@ -34,6 +34,8 @@ There is a `watch` branch in git that adds the command.
 (or merged, etc), it will be converted into an annexed file.
 See [[blog/day_7__bugfixes]]
 
+* When you `git annex unlock` a file, it will immediately be re-locked.
+
 ## todo
 
 - Support OSes other than Linux; it only uses inotify currently.
diff --git a/doc/design/assistant/progressbars/comment_1_3ea263b1f334e8e38e14f00a96202988._comment b/doc/design/assistant/progressbars/comment_1_3ea263b1f334e8e38e14f00a96202988._comment
new file mode 100644
index 000000000..4a011f61b
--- /dev/null
+++ b/doc/design/assistant/progressbars/comment_1_3ea263b1f334e8e38e14f00a96202988._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="http://abhidg.myopenid.com/"
+ ip="129.67.132.87"
+ subject="librsync"
+ date="2012-06-13T02:14:29Z"
+ content="""
+There's librsync which might support reporting the progress through its API, but it seems to be in beta.
+"""]]