| Field | Value | Date |
|---|---|---|
| author | Joey Hess <joey@kitenet.net> | 2012-06-11 12:13:07 -0400 |
| committer | Joey Hess <joey@kitenet.net> | 2012-06-11 12:13:07 -0400 |
| commit | a5a3cd55ac2bab656824e48d29ead8382c583b01 | |
| tree | 1d5503b35e7fdb27bcd2e1e3f8f1020686882125 /doc/design | |
| parent | 433ff41496b073c71e465af8b38b2ecafe27d8dd | |
| parent | 5642e189b76070b43a8e24f9f49d36b950f83c8d | |
```
Merge branch 'master' into watch

Conflicts:
	debian/changelog
```
Diffstat (limited to 'doc/design')
5 files changed, 97 insertions, 7 deletions
```diff
diff --git a/doc/design/assistant/blog/day_5__committing.mdwn b/doc/design/assistant/blog/day_5__committing.mdwn
new file mode 100644
index 000000000..7d6b52199
--- /dev/null
+++ b/doc/design/assistant/blog/day_5__committing.mdwn
@@ -0,0 +1,57 @@
+After a few days otherwise engaged, back to work today.
+
+My focus was on adding the committing thread mentioned in [[day_4__speed]].
+I got rather further than expected!
+
+First, I implemented a really dumb thread, that woke up once per second,
+checked if any changes had been made, and committed them. Of course, this
+rather sucked. In the middle of a large operation like untarring a tarball,
+or `rm -r` of a large directory tree, it made lots of commits and made
+things slow and ugly. This was not unexpected.
+
+So next, I added some smarts to it. First, I wanted to stop it waking up
+every second when there was nothing to do, and instead blocking wait on a
+change occuring. Secondly, I wanted it to know when past changes happened,
+so it could detect batch mode scenarios, and avoid committing too
+frequently.
+
+I played around with combinations of various Haskell thread communications
+tools to get that information to the committer thread: `MVar`, `Chan`,
+`QSem`, `QSemN`. Eventually, I realized all I needed was a simple channel
+through which the timestamps of changes could be sent. However, `Chan`
+wasn't quite suitable, and I had to add a dependency on
+[Software Transactional Memory](http://en.wikipedia.org/wiki/Software_Transactional_Memory),
+and use a `TChan`. Now I'm cooking with gas!
+
+With that data channel available to the committer thread, it quickly got
+some very nice smart behavior. Playing around with it, I find it commits
+*instantly* when I'm making some random change that I'd want the
+git-annex assistant to sync out instantly; and that its batch job detection
+works pretty well too.
+
+There's surely room for improvement, and I made this part of the code
+be an entirely pure function, so it's really easy to change the strategy.
+This part of the committer thread is so nice and clean, that here's the
+current code, for your viewing pleasure:
+
+[[!format haskell """
+{- Decide if now is a good time to make a commit.
+ - Note that the list of change times has an undefined order.
+ -
+ - Current strategy: If there have been 10 commits within the past second,
+ - a batch activity is taking place, so wait for later.
+ -}
+shouldCommit :: UTCTime -> [UTCTime] -> Bool
+shouldCommit now changetimes
+    | len == 0 = False
+    | len > 4096 = True -- avoid bloating queue too much
+    | length (filter thisSecond changetimes) < 10 = True
+    | otherwise = False -- batch activity
+    where
+        len = length changetimes
+        thisSecond t = now `diffUTCTime` t <= 1
+"""]]
+
+Still some polishing to do to eliminate minor innefficiencies and deal
+with more races, but this part of the git-annex assistant is now very usable,
+and will be going out to my beta testers soon!
diff --git a/doc/design/assistant/cloud/comment_1_4997778abc171999499487b71b31c9ba._comment b/doc/design/assistant/cloud/comment_1_4997778abc171999499487b71b31c9ba._comment
new file mode 100644
index 000000000..1a01afaa3
--- /dev/null
+++ b/doc/design/assistant/cloud/comment_1_4997778abc171999499487b71b31c9ba._comment
@@ -0,0 +1,16 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawkq0-zRhubO6kR9f85-5kALszIzxIokTUw"
+ nickname="James"
+ subject="Cloud Service Limitations"
+ date="2012-06-11T02:15:04Z"
+ content="""
+Hey Joey!
+
+I'm not very tech savvy, but here is my question.
+I think for all cloud service providers, there is an upload limitation on how big one file may be.
+For example, I can't upload a file bigger than 100 MB on box.net.
+Does this affect git-annex at all? Will git-annex automatically split the file depending on the cloud provider or will I have to create small RAR archives of one large file to upload them?
+
+Thanks!
+James
+"""]]
diff --git a/doc/design/assistant/cloud/comment_2_08da8bc74a4845e354dca99184cffd70._comment b/doc/design/assistant/cloud/comment_2_08da8bc74a4845e354dca99184cffd70._comment
new file mode 100644
index 000000000..a9b377ea5
--- /dev/null
+++ b/doc/design/assistant/cloud/comment_2_08da8bc74a4845e354dca99184cffd70._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="4.153.8.126"
+ subject="re: cloud"
+ date="2012-06-11T04:48:08Z"
+ content="""
+Yes, git-annex has to split files for certian providers. I already added support for this as part of my first pass at supporting box.com, see [[tips/using_box.com_as_a_special_remote]].
+"""]]
diff --git a/doc/design/assistant/inotify.mdwn b/doc/design/assistant/inotify.mdwn
index 7cdde33ac..9fe6938c4 100644
--- a/doc/design/assistant/inotify.mdwn
+++ b/doc/design/assistant/inotify.mdwn
@@ -23,10 +23,11 @@ really useful, it needs to:
   is exceeded. This can be tuned by root, so help the user fix it.
   **done**
 - periodically auto-commit staged changes (avoid autocommitting when
-  lots of changes are coming in)
-- tunable delays before adding new files, etc
-- coleasce related add/rm events for speed and less disk IO
+  lots of changes are coming in) **done**
+- coleasce related add/rm events for speed and less disk IO **done**
 - don't annex `.gitignore` and `.gitattributes` files **done**
+- run as a daemon **done**
+- tunable delays before adding new files, etc
 - configurable option to only annex files meeting certian size or
   filename criteria
 - option to check files not meeting annex criteria into git directly
@@ -107,7 +108,3 @@ Many races need to be dealt with by this code. Here are some of them.
 
   Not a problem; The removal event removes the old file from the index,
   and the add event adds the new one.
-
-* At startup, `git add --update` is run, to notice deleted files.
-  Then inotify starts up. Files deleted in between won't have their
-  removals staged.
diff --git a/doc/design/assistant/windows.mdwn b/doc/design/assistant/windows.mdwn
index da669ad82..26ff2c1c6 100644
--- a/doc/design/assistant/windows.mdwn
+++ b/doc/design/assistant/windows.mdwn
@@ -22,3 +22,15 @@ Or I could try to use Cygwin.
 ## Deeper system integration
 
 [NTFS Reparse Points](http://msdn.microsoft.com/en-us/library/aa365503%28v=VS.85%29.aspx) allow a program to define how the OS will interpret a file or directory in arbitrary ways. This requires writing a file system filter.
+
+## Developement environment
+
+Someone wrote in to say:
+
+> For Windows Development you can easily qualify
+> for Bizspark - http://www.microsoft.com/bizspark/
+>
+> This will get you 100% free Windows OS licenses and
+> Dev tools, plus a free Azure account for cloud testing.
+> (You can also now deploy Linux VMs to Azure as well)
+> No money required at all.
```
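The `shouldCommit` function in the blog post diff above only needs imports from the `time` package to compile. Below is a minimal self-contained copy of its logic for experimenting in GHCi. The imports, the helper `changesAgo` and the fixed timestamp `exampleNow` are hypothetical additions for illustration, not git-annex code. Note that the function counts *changes* (not commits) seen within the last second.

```haskell
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (NominalDiffTime, UTCTime (..), addUTCTime, diffUTCTime)

-- Same decision logic as the shouldCommit in the diff above:
-- commit unless 10 or more changes landed within the last second,
-- but always commit once the queue grows past 4096 entries.
shouldCommit :: UTCTime -> [UTCTime] -> Bool
shouldCommit now changetimes
    | len == 0 = False
    | len > 4096 = True -- avoid bloating queue too much
    | length (filter thisSecond changetimes) < 10 = True
    | otherwise = False -- batch activity
  where
    len = length changetimes
    thisSecond t = now `diffUTCTime` t <= 1

-- Hypothetical helper: change times that happened the given numbers
-- of seconds before 'now'.
changesAgo :: UTCTime -> [NominalDiffTime] -> [UTCTime]
changesAgo now = map (\s -> addUTCTime (negate s) now)

-- Hypothetical fixed timestamp (noon UTC on the day of this commit).
exampleNow :: UTCTime
exampleNow = UTCTime (fromGregorian 2012 6 11) 43200
```

For example, `shouldCommit exampleNow (changesAgo exampleNow (replicate 20 0.1))` is `False` (twenty changes in the last second looks like a batch job), while the same twenty changes made five seconds ago, `changesAgo exampleNow (replicate 20 5)`, give `True`. Because the decision is a pure function of the current time and the change times, it can be tested like this without any threads.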
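The post describes replacing a thread that polled every second with a committer that blocks on a `TChan` of change timestamps. Here is a minimal sketch of that shape using the `stm` package. The names `ChangeChan`, `recordChange`, `getChanges`, `commitStep` and `committerLoop` are hypothetical, and the one-second rate limit and "put the changes back if now is not a good time" policy are assumptions about how such a loop could work, not git-annex's actual code. One thing STM makes easy here is blocking for the first change and draining the rest in a single atomic transaction.

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever)
import Data.Time.Clock (UTCTime, getCurrentTime)

-- Hypothetical name: a channel carrying the time of each detected change.
type ChangeChan = TChan UTCTime

-- The watcher thread would call this for every change it notices.
recordChange :: ChangeChan -> IO ()
recordChange chan = do
    now <- getCurrentTime
    atomically $ writeTChan chan now

-- Block until at least one change is queued, then drain the rest.
-- readTChan retries the transaction while the channel is empty, so the
-- committer sleeps instead of polling.
getChanges :: ChangeChan -> IO [UTCTime]
getChanges chan = atomically $ do
    first <- readTChan chan
    rest <- drain
    return (first : rest)
  where
    drain = do
        next <- tryReadTChan chan
        case next of
            Nothing -> return []
            Just t -> (t :) <$> drain

-- One round: wait for changes, then either commit or put them back.
-- The decision function is a parameter (e.g. a shouldCommit like the one
-- in the diff). Requeued times come back in reverse order, which is fine
-- for a decision that treats the list as unordered.
commitStep :: (UTCTime -> [UTCTime] -> Bool) -> IO () -> ChangeChan -> IO ()
commitStep decide commit chan = do
    changes <- getChanges chan
    now <- getCurrentTime
    if decide now changes
        then commit
        else atomically $ mapM_ (unGetTChan chan) changes

-- The committer thread: at most one round per second.
committerLoop :: (UTCTime -> [UTCTime] -> Bool) -> IO () -> ChangeChan -> IO ()
committerLoop decide commit chan = forever $ do
    threadDelay 1000000 -- one second, in microseconds
    commitStep decide commit chan
```

Passing the decision in as a function keeps the threading code separate from the pure commit strategy, which matches the post's point that the strategy is easy to change. Requeued changes age over successive rounds, so during a burst a `shouldCommit`-style decision eventually sees that the last second was quiet and commits.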
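The reply in the cloud comments says git-annex has to split files for providers that cap upload size (the question cites 100 MB on box.net). This page does not describe how that splitting works, so the following is only a generic fixed-size chunking sketch over lazy `ByteString`s. The names `chunksOf`, `chunkCount` and `reassemble` are hypothetical, and this is not git-annex's chunking code or its on-remote format.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Int (Int64)

-- Split content into pieces of at most n bytes; the last may be shorter.
chunksOf :: Int64 -> L.ByteString -> [L.ByteString]
chunksOf n content
    | n <= 0 = error "chunksOf: chunk size must be positive"
    | L.null content = []
    | otherwise =
        let (piece, rest) = L.splitAt n content
        in piece : chunksOf n rest

-- Number of chunks needed for a file of the given size
-- (ceiling division; assumes a positive chunk size n).
chunkCount :: Int64 -> Int64 -> Int64
chunkCount n size = (size + n - 1) `div` n

-- Reassembly is concatenation of the chunks in their original order.
reassemble :: [L.ByteString] -> L.ByteString
reassemble = L.concat
```

With a 100 MB limit, a 250 MB file needs `chunkCount (100 * 1024 * 1024) (250 * 1024 * 1024)` = 3 uploads. Using lazy `ByteString`s means each chunk can be streamed from disk rather than loading the whole file into memory at once.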