6 files changed, 120 insertions, 8 deletions
@@ -1,5 +1,5 @@
 PREFIX=/usr
-IGNORE=-ignore-package monads-fd
+IGNORE=-ignore-package monads-fd -ignore-package monads-tf
 BASEFLAGS=-Wall $(IGNORE) -outputdir tmp -IUtility -DWITH_S3
 GHCFLAGS=-O2 $(BASEFLAGS)
diff --git a/doc/design/assistant/blog/day_5__committing.mdwn b/doc/design/assistant/blog/day_5__committing.mdwn
new file mode 100644
index 000000000..7d6b52199
--- /dev/null
+++ b/doc/design/assistant/blog/day_5__committing.mdwn
@@ -0,0 +1,57 @@
+After a few days otherwise engaged, back to work today.
+
+My focus was on adding the committing thread mentioned in [[day_4__speed]].
+I got rather further than expected!
+
+First, I implemented a really dumb thread, that woke up once per second,
+checked if any changes had been made, and committed them. Of course, this
+rather sucked. In the middle of a large operation like untarring a tarball,
+or `rm -r` of a large directory tree, it made lots of commits and made
+things slow and ugly. This was not unexpected.
+
+So next, I added some smarts to it. First, I wanted to stop it waking up
+every second when there was nothing to do, and instead blocking wait on a
+change occuring. Secondly, I wanted it to know when past changes happened,
+so it could detect batch mode scenarios, and avoid committing too
+frequently.
+
+I played around with combinations of various Haskell thread communications
+tools to get that information to the committer thread: `MVar`, `Chan`,
+`QSem`, `QSemN`. Eventually, I realized all I needed was a simple channel
+through which the timestamps of changes could be sent. However, `Chan`
+wasn't quite suitable, and I had to add a dependency on
+[Software Transactional Memory](http://en.wikipedia.org/wiki/Software_Transactional_Memory),
+and use a `TChan`. Now I'm cooking with gas!
+
+With that data channel available to the committer thread, it quickly got
+some very nice smart behavior. Playing around with it, I find it commits
+*instantly* when I'm making some random change that I'd want the
+git-annex assistant to sync out instantly; and that its batch job detection
+works pretty well too.
+
+There's surely room for improvement, and I made this part of the code
+be an entirely pure function, so it's really easy to change the strategy.
+This part of the committer thread is so nice and clean, that here's the
+current code, for your viewing pleasure:
+
+[[!format haskell """
+{- Decide if now is a good time to make a commit.
+ - Note that the list of change times has an undefined order.
+ -
+ - Current strategy: If there have been 10 commits within the past second,
+ - a batch activity is taking place, so wait for later.
+ -}
+shouldCommit :: UTCTime -> [UTCTime] -> Bool
+shouldCommit now changetimes
+    | len == 0 = False
+    | len > 4096 = True -- avoid bloating queue too much
+    | length (filter thisSecond changetimes) < 10 = True
+    | otherwise = False -- batch activity
+    where
+        len = length changetimes
+        thisSecond t = now `diffUTCTime` t <= 1
+"""]]
+
+Still some polishing to do to eliminate minor innefficiencies and deal
+with more races, but this part of the git-annex assistant is now very usable,
+and will be going out to my beta testers soon!
diff --git a/doc/design/assistant/cloud/comment_1_4997778abc171999499487b71b31c9ba._comment b/doc/design/assistant/cloud/comment_1_4997778abc171999499487b71b31c9ba._comment
new file mode 100644
index 000000000..1a01afaa3
--- /dev/null
+++ b/doc/design/assistant/cloud/comment_1_4997778abc171999499487b71b31c9ba._comment
@@ -0,0 +1,16 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawkq0-zRhubO6kR9f85-5kALszIzxIokTUw"
+ nickname="James"
+ subject="Cloud Service Limitations"
+ date="2012-06-11T02:15:04Z"
+ content="""
+Hey Joey!
+
+I'm not very tech savvy, but here is my question.
+I think for all cloud service providers, there is an upload limitation on how big one file may be.
+For example, I can't upload a file bigger than 100 MB on box.net.
+Does this affect git-annex at all? Will git-annex automatically split the file depending on the cloud provider or will I have to create small RAR archives of one large file to upload them?
+
+Thanks!
+James
+"""]]
diff --git a/doc/design/assistant/cloud/comment_2_08da8bc74a4845e354dca99184cffd70._comment b/doc/design/assistant/cloud/comment_2_08da8bc74a4845e354dca99184cffd70._comment
new file mode 100644
index 000000000..a9b377ea5
--- /dev/null
+++ b/doc/design/assistant/cloud/comment_2_08da8bc74a4845e354dca99184cffd70._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="4.153.8.126"
+ subject="re: cloud"
+ date="2012-06-11T04:48:08Z"
+ content="""
+Yes, git-annex has to split files for certian providers. I already added support for this as part of my first pass at supporting box.com, see [[tips/using_box.com_as_a_special_remote]].
+"""]]
diff --git a/doc/design/assistant/inotify.mdwn b/doc/design/assistant/inotify.mdwn
index 7cdde33ac..9fe6938c4 100644
--- a/doc/design/assistant/inotify.mdwn
+++ b/doc/design/assistant/inotify.mdwn
@@ -23,10 +23,11 @@ really useful, it needs to:
   is exceeded. This can be tuned by root, so help the user fix it.
   **done**
 - periodically auto-commit staged changes (avoid autocommitting when
-  lots of changes are coming in)
-- tunable delays before adding new files, etc
-- coleasce related add/rm events for speed and less disk IO
+  lots of changes are coming in) **done**
+- coleasce related add/rm events for speed and less disk IO **done**
 - don't annex `.gitignore` and `.gitattributes` files **done**
+- run as a daemon **done**
+- tunable delays before adding new files, etc
 - configurable option to only annex files meeting certian size or
   filename criteria
 - option to check files not meeting annex criteria into git directly
@@ -107,7 +108,3 @@ Many races need to be dealt with by this code. Here are some of them.
   Not a problem; The removal event removes the old file from the index, and the add event adds the new one.
-
-* At startup, `git add --update` is run, to notice deleted files.
-  Then inotify starts up. Files deleted in between won't have their
-  removals staged.
diff --git a/doc/forum/autobuilders_for_git-annex_to_aid_development.mdwn b/doc/forum/autobuilders_for_git-annex_to_aid_development.mdwn
new file mode 100644
index 000000000..8cd370937
--- /dev/null
+++ b/doc/forum/autobuilders_for_git-annex_to_aid_development.mdwn
@@ -0,0 +1,34 @@
+This is a continuation of the conversation from [[the comments|design/assistant/#comment-77e54e7ebfbd944c370173014b535c91]] section in the design of git-assistant. In summary, I've setup an auto builder which should help [[Joey]] have an easier time developing on git-annex on non-linux/debian platforms. This builder is currently running on OSX 10.7 with the 64bit version of Haskell Platform.
+
+The builder output can be found at <http://www.sgenomics.org/~jtang/gitbuilder-git-annex-x00-x86_64-apple-darwin10.8.0/>, the CGI on this site does not work as my OSX workstation is pushing the output from another location.
+
+The builder currently tries to build all branches except
+
+* debian-stable
+* pristine-tar
+* setup
+
+It also does not build any of the tags as well, Joey had suggested to ignore the bpo named tags, but for now it's easier for me to not build any tags. To continue on this discussion, if anyone wants to setup a gitbuilder instance, here is the build.sh script that I am using.
+
+<pre>
+#!/bin/bash -x
+
+# Macports
+export PATH=/opt/local/bin:$PATH
+
+# Haskell userland
+export PATH=$PATH:$HOME/.cabal/bin
+
+# Macports gnu
+export PATH=/opt/local/libexec/gnubin:$PATH
+
+make || exit 3
+
+make -q test
+if [ "$?" = 1 ]; then
+    # run "make test", but give it a time limit in case a test gets stuck
+    ../maxtime 1800 make test || exit 4
+fi
+</pre>
+
+It's also using the branches-local script for sorting and prioritising the branches to build, this branches-local script can be found at the [autobuild-ceph](https://github.com/ceph/autobuild-ceph/blob/master/branches-local) repository. If there are other people interested in setting up their own instances of gitbuilder for git-annex, please let me know and I will setup an aggregator page to collect status of the builds. The builder runs and updates the webpage every 30mins.
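
Note on the test step in build.sh: with GNU make, `make -q test` runs nothing and exits with status 1 when the `test` target is out of date (status 2 signals an error, such as a missing rule), so the time-limited `make test` only runs when there is a test target to run. For anyone setting up a builder without the `../maxtime` helper, the time limit could perhaps be approximated as below. This is only a sketch: `run_limited` is a hypothetical name, and it assumes GNU coreutils `timeout` is on the PATH, which is not something the original script uses.

```shell
#!/bin/bash
# Sketch only: run_limited is a hypothetical stand-in for the ../maxtime
# helper called in build.sh. It assumes GNU coreutils `timeout` is on PATH.

# run_limited SECONDS COMMAND [ARGS...]
# Runs COMMAND and sends it SIGTERM if it is still running after SECONDS.
# Returns COMMAND's own exit status, or 124 when the time limit was hit.
run_limited() {
    local limit="$1"
    shift
    timeout "$limit" "$@"
}

# Usage mirroring build.sh (shown as comments, not executed here):
#   make -q test                  # exit status 1: "test" is out of date
#   if [ "$?" = 1 ]; then
#       run_limited 1800 make test || exit 4
#   fi
```

Because a timed-out run returns 124 rather than the test suite's own status, a stuck test would still fall through to the `exit 4` branch of the original script.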