Diffstat (limited to 'doc')
13 files changed, 216 insertions, 16 deletions
diff --git a/doc/bugs/Issue_on_OSX_with_some_system_limits/comment_2_b14e697c211843163285aaa8de5bf4c6._comment b/doc/bugs/Issue_on_OSX_with_some_system_limits/comment_2_b14e697c211843163285aaa8de5bf4c6._comment
new file mode 100644
index 000000000..17dcf7634
--- /dev/null
+++ b/doc/bugs/Issue_on_OSX_with_some_system_limits/comment_2_b14e697c211843163285aaa8de5bf4c6._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus"
+ nickname="Jimmy"
+ subject="comment 2"
+ date="2012-06-29T12:02:48Z"
+ content="""
+Doing,
+
+	sudo sysctl -w kern.maxfilesperproc=400000
+
+Somewhat works for me, git-annex watch at least starts up and takes a while to scan the directory, but it's not ideal. Also, creating files seems to work okay, when I remove a file the changes don't seem to get pushed across my other repos, running a sync on the remote repo fixes things.
+"""]]
diff --git a/doc/bugs/watcher_commits_unlocked_files.mdwn b/doc/bugs/watcher_commits_unlocked_files.mdwn
new file mode 100644
index 000000000..ef64921f1
--- /dev/null
+++ b/doc/bugs/watcher_commits_unlocked_files.mdwn
@@ -0,0 +1,28 @@
+When having "git annex watch" running, unlocking files causes the watcher
+to immediately lock/commit them.
+
+----
+
+Possible approaches:
+
+* The watcher could detect unlocked files by checking if newly added files
+  are a typechange of a file already in git. But this would add git overhead
+  to every file add.
+* `git annex unlock` could add some type of flag file, which the assistant
+  could check. This would work fine, for users who want to use `git annex
+  unlock` with the assistant. That's probably not simple enough for most
+  users, though.
+* There could be a UI in the assistant to pick a file and unlock it.
+  The assistant would have its own list of files it knows are unlocked.
+  But I'm trying to avoid mandatory UI to use the assistant.
+* Perhaps instead, have a directory, like "edit". The assistant could notice
+  when files move into this special directory, and automatically unlock them.
+  Then when they're moved out, automatically commit them.
+* Alternatively, files that are moved out of the repository entirely could be
+  automatically unlocked, and then when they're moved back in, it would
+  automatically do the right thing. This may be worth implementing in
+  combination with the "edit" directory, as different use cases would work
+  better with one or the other. However, I don't currently get inotify
+  events when files are moved out of the repository (well, I do, but it
+  just says "file moved", with no forwarding address, so I don't know
+  how to find the file to unlock it).
diff --git a/doc/bugs/watcher_commits_unlocked_files/comment_1_f70e1912fde0eee59e208307df06b503._comment b/doc/bugs/watcher_commits_unlocked_files/comment_1_f70e1912fde0eee59e208307df06b503._comment
new file mode 100644
index 000000000..a06b8fe82
--- /dev/null
+++ b/doc/bugs/watcher_commits_unlocked_files/comment_1_f70e1912fde0eee59e208307df06b503._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus"
+ nickname="Jimmy"
+ subject="comment 1"
+ date="2012-06-28T13:39:18Z"
+ content="""
+That is a known problem/bug which is listed at [[design/assistant/inotify]]
+"""]]
diff --git a/doc/design/assistant/blog/day_20__data_transfer_design.mdwn b/doc/design/assistant/blog/day_20__data_transfer_design.mdwn
new file mode 100644
index 000000000..4f47ae63c
--- /dev/null
+++ b/doc/design/assistant/blog/day_20__data_transfer_design.mdwn
@@ -0,0 +1,22 @@
+Today is a planning day. I have only a few days left before I'm off to
+Nicaragua for [DebConf](http://debconf12.debconf.org/), where I'll only
+have smaller chunks of time without interruptions. So it's important to get
+some well-defined smallish chunks designed that I can work on later. See
+bulleted action items below (now moved to [[syncing]]). Each
+should be around 1-2 hours unless it turns out to be 8 hours... :)
+
+First, worked on writing down a design, and some data types, for data transfer
+tracking (see [[syncing]] page). Found that writing down these simple data
+types before I started slinging code has clarified things a lot for me.
+
+Most importantly, I realized that I will need to modify `git-annex-shell`
+to record on disk what transfers it's doing, so the assistant can get that
+information and use it to both avoid redundant transfers (potentially a big
+problem!), and later to allow the user to control them using the web app.
+
+While eventually the user will be able to use the web app to prioritize
+transfers, stop and start, throttle, etc, it's important to get the default
+behavior right. So I'm thinking about things like how to prioritize uploads
+vs downloads, when it's appropriate to have multiple downloads running at
+once, etc.
+
diff --git a/doc/design/assistant/blog/day_21__transfer_tracking.mdwn b/doc/design/assistant/blog/day_21__transfer_tracking.mdwn
new file mode 100644
index 000000000..79c0b6438
--- /dev/null
+++ b/doc/design/assistant/blog/day_21__transfer_tracking.mdwn
@@ -0,0 +1,28 @@
+Worked today on two action items from my last blog post:
+
+* on-disk transfers in progress information files (read/write/enumerate)
+* locking for the files, so redundant transfer races can be detected,
+  and failed transfers noticed
+
+That's all done, and used by the `get`, `copy`, and `move` subcommands.
+
+Also, I made `git-annex status` use that information to display any
+file transfers that are currently in progress:
+
+	joey@gnu:~/lib/sound/misc>git annex status
+	[...]
+	transfers in progress:
+		downloading Vic-303.mp3 from leech
+
+(Webapp, here we come!)
+
+However... Files being sent or received by `git-annex-shell` don't yet
+have this transfer info recorded. The problem is that to do so,
+`git-annex-shell` will need to be run with a `--remote=` parameter. But
+old versions will of course fail when run with such an unknown parameter.
+
+This is a problem I last faced in December 2011 when adding the `--uuid=`
+parameter. That time I punted and required the remote `git-annex-shell` be
+updated to a new enough version to accept it. But as git-annex gets more widely
+used and packaged, that's becoming less of an option. I need to find a real
+solution to this problem.
diff --git a/doc/design/assistant/inotify.mdwn b/doc/design/assistant/inotify.mdwn
index 47b8c84a3..7b600090a 100644
--- a/doc/design/assistant/inotify.mdwn
+++ b/doc/design/assistant/inotify.mdwn
@@ -8,13 +8,15 @@ available!
 
 * If a file is checked into git as a normal file and gets modified
   (or merged, etc), it will be converted into an annexed file.
-  See [[blog/day_7__bugfixes]]
+  See [[blog/day_7__bugfixes]].
 
 * When you `git annex unlock` a file, it will immediately be re-locked.
+  See [[bugs/watcher_commits_unlocked_files]].
 
 * Kqueue has to open every directory it watches, so too many directories
   will run it out of the max number of open files (typically 1024), and fail.
   I may need to fork off multiple watcher processes to handle this.
+  See [[bugs/Issue_on_OSX_with_some_system_limits]].
 
 ## beyond Linux
 
@@ -42,6 +44,8 @@ I'd also like to support OSX and if possible the BSDs.
   * [man page](http://www.freebsd.org/cgi/man.cgi?query=kqueue&apropos=0&sektion=0&format=html)
   * <https://github.com/gorakhargosh/watchdog/blob/master/src/watchdog/observers/kqueue.py> (good example program)
 
+  *kqueue is now supported*
+
 * hfsevents ([haskell bindings](http://hackage.haskell.org/package/hfsevents))
   is OSX specific.
 
@@ -71,9 +75,6 @@ I'd also like to support OSX and if possible the BSDs.
 - honor .gitignore, not adding files it excludes (difficult, probably
   needs my own .gitignore parser to avoid excessive running of git commands
   to check for ignored files)
-- Possibly, when a directory is moved out of the annex location,
-  unannex its contents. (Does inotify tell us where the directory moved
-  to so we can access it?)
 
 ## the races
 
@@ -125,6 +126,17 @@ Many races need to be dealt with by this code. Here are some of them.
   Not a problem; The removal event removes the old file from the index, and
   the add event adds the new one.
 
+* Symlink appears, but is then deleted before it can be processed.
+
+  Leads to an ugly message, otherwise no problem:
+
+	./me: readSymbolicLink: does not exist (No such file or directory)
+
+  Here `me` is a file that was in a conflicted merge, which got
+  removed as part of the resolution. This is probably coming from the watcher
+  thread, which sees the newly added symlink (created by the git merge),
+  but finds it deleted (by the conflict resolver) by the time it processes it.
+
 ## done
 
 - on startup, add any files that have appeared since last run **done**
diff --git a/doc/design/assistant/progressbars.mdwn b/doc/design/assistant/progressbars.mdwn
index 2ade05aa5..ee7384274 100644
--- a/doc/design/assistant/progressbars.mdwn
+++ b/doc/design/assistant/progressbars.mdwn
@@ -9,6 +9,6 @@ To get this info for downloads, git-annex can watch the file as it arrives
 and use its size.
 
 TODO: What about uploads? Will i have to parse rsync's progresss output?
-Feed it via a named pipe? Ugh.
+Feed it via a named pipe? Ugh. Check into librsync.
 
 This is one of those potentially hidden but time consuming problems.
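The progressbars page notes that, for downloads, git-annex can watch the file as it arrives and use its size. A minimal Haskell sketch of that idea, assuming the expected size is known up front and that the partially downloaded content lives at a single temp file path. `progressPercent` and `watchDownload` are hypothetical names, not git-annex functions, and the demo path `progress-demo.tmp` is made up. It relies on `getFileSize` from the `directory` package (available since directory 1.2.7).

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (unless)
import System.Directory (doesFileExist, getFileSize, removeFile)

-- | Percentage of the expected size that has arrived so far, clamped to 0..100.
-- Assumption: the expected size is known before the download starts.
progressPercent :: Integer -> Integer -> Int
progressPercent expected received
  | expected <= 0 = 100
  | otherwise = fromIntegral (min 100 (max 0 (received * 100 `div` expected)))

-- | Poll a (hypothetical) temp file, reporting progress after each check.
-- The poll interval is in microseconds. Stops once the file is complete.
watchDownload :: FilePath -> Integer -> Int -> (Int -> IO ()) -> IO ()
watchDownload tmpfile expected interval report = loop
  where
    loop = do
      exists <- doesFileExist tmpfile
      received <- if exists then getFileSize tmpfile else return 0
      let pct = progressPercent expected received
      report pct
      unless (pct >= 100) $ do
        threadDelay interval
        loop

main :: IO ()
main = do
  -- Simulate an already finished download by writing the whole file first.
  let tmp = "progress-demo.tmp"
  writeFile tmp (replicate 2048 'x')
  watchDownload tmp 2048 100000 (\p -> putStrLn (show p ++ "%"))
  removeFile tmp
```

As written, `watchDownload` keeps polling until the file reaches the expected size, so a stalled or failed download would keep it looping; a real watcher would also need to stop when the transfer itself ends. Uploads are not covered, which matches the open TODO on that page.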
diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 50e6fb4f1..5476b56f1 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -1,6 +1,37 @@
 Once files are added (or removed or moved), need to send those changes to
 all the other git clones, at both the git level and the key/value level.
 
+## action items
+
+* on-disk transfers in progress information files (read/write/enumerate)
+  **done**
+* locking for the files, so redundant transfer races can be detected,
+  and failed transfers noticed **done**
+* transfer info for git-annex-shell (problem: how to add a switch
+  with the necessary info w/o breaking backwards compatibility?)
+* update files as transfers proceed. See [[progressbars]]
+  (updating for downloads is easy; for uploads is hard)
+* add Transfer queue TChan
+* enqueue Transfers (Uploads) as new files are added to the annex by
+  Watcher.
+* enqueue Transfers (Downloads) as new dangling symlinks are noticed by
+  Watcher.
+* add TransferInfo Map to DaemonStatus for tracking transfers in progress.
+* Poll transfer in progress info files for changes (use inotify again!
+  wow! hammer, meet nail..), and update the TransferInfo Map
+* Write basic Transfer handling thread. Multiple such threads need to be
+  able to be run at once. Each will need its own independent copy of the
+  Annex state monad.
+* Write transfer control thread, which decides when to launch transfers.
+* At startup, and possibly periodically, look for files we have that
+  location tracking indicates remotes do not, and enqueue Uploads for
+  them. Also, enqueue Downloads for any files we're missing.
+* Find a way to probe available outgoing bandwidth, to throttle so
+  we don't bufferbloat the network to death.
+* git-annex needs a simple speed control knob, which can be plumbed
+  through to, at least, rsync. A good job for an hour in an
+  airport somewhere.
+
 ## git syncing
 
 1. Can use `git annex sync`, which already handles bidirectional syncing.
@@ -45,6 +76,46 @@ and with appropriate rate limiting and control facilities.
 
 This probably will need lots of refinements to get working well.
 
+### first pass: flood syncing
+
+Before mapping the network, the best we can do is flood all files out to every
+reachable remote. This is worth doing first, since it's the simplest way to
+get the basic functionality of the assistant to work. And we'll need this
+anyway.
+
+### transfer tracking
+
+* Upload added to queue by the watcher thread when it adds content.
+* Download added to queue by the watcher thread when it sees new symlinks
+  that lack content.
+* Transfer threads started/stopped as necessary to move data.
+  (May sometimes want multiple threads downloading, or uploading, or even both.)
+
+	type TransferQueue = TChan [Transfer]
+	-- add (M.Map Transfer TransferInfo) to DaemonStatus
+
+	startTransfer :: Transfer -> Annex TransferID
+
+	stopTransfer :: TransferID -> IO ()
+
+The assistant needs to find out when `git-annex-shell` is receiving or
+sending (triggered by another remote), so it can add data for those too.
+This is important to avoid uploading content to a remote that is already
+downloading it from us, or vice versa, as well as to in future let the web
+app manage transfers as user desires.
+
+For files being received, it can see the temp file, but other than lsof
+there's no good way to find the pid (and I'd rather not kill blindly).
+
+For files being sent, there's no filesystem indication. So git-annex-shell
+(and other git-annex transfer processes) should write a status file to disk.
+
+Can use file locking on these status files to claim upload/download rights,
+which will avoid races.
+
+This status file can also be updated periodically to show amount of transfer
+complete (necessary for tracking uploads).
+
 ## other considerations
 
 This assumes the network is connected. It's often not, so the
diff --git a/doc/download.mdwn b/doc/download.mdwn
index f0f17e141..242de13c3 100644
--- a/doc/download.mdwn
+++ b/doc/download.mdwn
@@ -18,6 +18,7 @@ others need some manual work. See [[install]] for details.
 
 The git repository has some branches:
 
+* `assistant` contains the new change-tracking daemon
 * `ghc7.0` supports versions of ghc older than 7.4, which
   had a major change to filename encoding.
 * `old-monad-control` is for systems that don't have a newer monad-control
@@ -25,6 +26,7 @@ The git repository has some branches:
 * `no-ifelse` avoids using the IFelse library
   (merge it into master if you need it)
 * `no-bloom` avoids using bloom filters. (merge it into master if you need it)
+* `no-s3` avoids using the S3 library (merge it into master if you need it)
 * `debian-stable` contains the latest backport of git-annex to Debian
   stable.
 * `tweak-fetch` adds support for the git tweak-fetch hook, which has
diff --git a/doc/install/OSX/comment_12_60d13f2c8e008af1041bea565a392c83._comment b/doc/install/OSX/comment_12_60d13f2c8e008af1041bea565a392c83._comment
new file mode 100644
index 000000000..e2e85aaa9
--- /dev/null
+++ b/doc/install/OSX/comment_12_60d13f2c8e008af1041bea565a392c83._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawnHrjHxJAm39x8DR4bnbazQO6H0nMNuY9c"
+ nickname="Damien"
+ subject="sha256 alternative"
+ date="2012-06-30T14:34:11Z"
+ content="""
+in reply to comment 6: On my Mac (10.7.4) there's `/usr/bin/shasum -a 256 <file>` command that will produce the same output as `sha256sum <file>`.
+"""]]
diff --git a/doc/install/OSX/comment_13_a6f48c87c2d6eabe379d6e10a6cac453._comment b/doc/install/OSX/comment_13_a6f48c87c2d6eabe379d6e10a6cac453._comment
new file mode 100644
index 000000000..e5ce62b13
--- /dev/null
+++ b/doc/install/OSX/comment_13_a6f48c87c2d6eabe379d6e10a6cac453._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawnHrjHxJAm39x8DR4bnbazQO6H0nMNuY9c"
+ nickname="Damien"
+ subject="gnu commands"
+ date="2012-07-01T17:03:57Z"
+ content="""
+…and another approach to the same problem: apparently git-annex also relies on the GNU coreutils (for instance, when doing `git annex get .`, `cp` complains about `illegal option -- -`). I do have the GNU coreutils installed with Homebrew, but they are all prefixed with `g`. So maybe you should try `gsha256sum` and `gcp` before `sha256sum` and `cp`, that seems like a more general solution.
+"""]]
diff --git a/doc/news/version_3.20120605.mdwn b/doc/news/version_3.20120605.mdwn
deleted file mode 100644
index ed0a09177..000000000
--- a/doc/news/version_3.20120605.mdwn
+++ /dev/null
@@ -1,11 +0,0 @@
-git-annex 3.20120605 released with [[!toggle text="these changes"]]
-[[!toggleable text="""
-   * sync: Show a nicer message if a user tries to sync to a special remote.
-   * lock: Reset unlocked file to index, rather than to branch head.
-   * import: New subcommand, pulls files from a directory outside the annex
-     and adds them.
-   * Fix display of warning message when encountering a file that uses an
-     unsupported backend.
-   * Require that the SHA256 backend can be used when building, since it's the
-     default.
-   * Preserve parent environment when running hooks of the hook special remote."""]]
\ No newline at end of file
diff --git a/doc/news/version_3.20120629.mdwn b/doc/news/version_3.20120629.mdwn
new file mode 100644
index 000000000..e6b98ae99
--- /dev/null
+++ b/doc/news/version_3.20120629.mdwn
@@ -0,0 +1,12 @@
+git-annex 3.20120629 released with [[!toggle text="these changes"]]
+[[!toggleable text="""
+   * cabal: Only try to use inotify on Linux.
+   * Version build dependency on STM, and allow building without it,
+     which disables the watch command.
+   * Avoid ugly failure mode when moving content from a local repository
+     that is not available.
+   * Got rid of the last place that did utf8 decoding.
+   * Accept arbitrarily encoded repository filepaths etc when reading
+     git config output. This fixes support for remotes with unusual characters
+     in their names.
+   * sync: Automatically resolves merge conflicts."""]]
\ No newline at end of file
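The transfer tracking section of syncing.mdwn proposes a `TransferQueue` TChan filled by the watcher, a map of transfers in progress kept in DaemonStatus, and locking of per-transfer status files so redundant transfers can be detected. A minimal sketch of only the in-process part, using the `stm` and `containers` packages. Assumptions: the queue carries single `Transfer` values rather than the `TChan [Transfer]` in the design; `Transfer`, `TransferInfo`, `claim`, `release` and `drainClaimable` are hypothetical; and a `TVar` map stands in for the DaemonStatus map. It does not replace the cross-process file locks.

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as M

data Direction = Upload | Download deriving (Eq, Ord, Show)

-- | Hypothetical transfer descriptor: direction, remote name, and key.
data Transfer = Transfer
  { transferDirection :: Direction
  , transferRemote :: String
  , transferKey :: String
  } deriving (Eq, Ord, Show)

-- | Hypothetical progress info; a real version would carry more fields.
newtype TransferInfo = TransferInfo { bytesComplete :: Integer }
  deriving (Show)

type TransferQueue = TChan Transfer
type InProgress = TVar (M.Map Transfer TransferInfo)

-- | Watcher side: enqueue an upload or download.
enqueue :: TransferQueue -> Transfer -> IO ()
enqueue q t = atomically (writeTChan q t)

-- | Claim a transfer. Returns False if the same transfer is already
-- recorded as in progress, so the redundant copy is skipped.
claim :: InProgress -> Transfer -> STM Bool
claim inprogress t = do
  m <- readTVar inprogress
  if M.member t m
    then return False
    else do
      writeTVar inprogress (M.insert t (TransferInfo 0) m)
      return True

-- | Drop a claim once the transfer finishes or fails.
release :: InProgress -> Transfer -> STM ()
release inprogress t = modifyTVar' inprogress (M.delete t)

-- | Take everything currently queued, claiming each item in the same
-- transaction that dequeues it; returns the transfers that would start.
drainClaimable :: TransferQueue -> InProgress -> IO [Transfer]
drainClaimable q inprogress = go []
  where
    go acc = do
      next <- atomically $ do
        mt <- tryReadTChan q
        case mt of
          Nothing -> return Nothing
          Just t -> do
            ok <- claim inprogress t
            return (Just (t, ok))
      case next of
        Nothing -> return (reverse acc)
        Just (t, True) -> go (t : acc)
        Just (_, False) -> go acc

main :: IO ()
main = do
  q <- newTChanIO
  inprogress <- newTVarIO M.empty
  let up = Transfer Upload "leech" "key1"
      down = Transfer Download "leech" "key2"
  -- The same upload is enqueued twice; only one copy gets claimed.
  mapM_ (enqueue q) [up, up, down]
  started <- drainClaimable q inprogress
  mapM_ print started
```

Because `drainClaimable` dequeues and claims inside one `atomically` block, two worker threads draining the same queue cannot both claim the same transfer. Coordinating with a `git-annex-shell` running in another process would still need the on-disk status files and locks described in the design.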