author Joey Hess <joey@kitenet.net> 2012-07-01 21:00:43 -0400
committer Joey Hess <joey@kitenet.net> 2012-07-01 21:00:43 -0400
commit 7625319c2c18c1d75a4ba5e4c2819fb0a31641ed
tree c275d60dcf2493a03ed7ad77e7aeb747633fd2c0 /doc
parent 397117429c8824bad7e994454a1d9b8e6f4b3b96
parent 2d2bfe9809f8d8d5862bc12fbe40c2e25b2405a3
Merge branch 'master' into assistant
Diffstat (limited to 'doc')
-rw-r--r-- doc/bugs/Issue_on_OSX_with_some_system_limits/comment_2_b14e697c211843163285aaa8de5bf4c6._comment | 12
-rw-r--r-- doc/bugs/watcher_commits_unlocked_files.mdwn | 28
-rw-r--r-- doc/bugs/watcher_commits_unlocked_files/comment_1_f70e1912fde0eee59e208307df06b503._comment | 8
-rw-r--r-- doc/design/assistant/blog/day_20__data_transfer_design.mdwn | 22
-rw-r--r-- doc/design/assistant/blog/day_21__transfer_tracking.mdwn | 28
-rw-r--r-- doc/design/assistant/inotify.mdwn | 20
-rw-r--r-- doc/design/assistant/progressbars.mdwn | 2
-rw-r--r-- doc/design/assistant/syncing.mdwn | 71
-rw-r--r-- doc/download.mdwn | 2
-rw-r--r-- doc/install/OSX/comment_12_60d13f2c8e008af1041bea565a392c83._comment | 8
-rw-r--r-- doc/install/OSX/comment_13_a6f48c87c2d6eabe379d6e10a6cac453._comment | 8
-rw-r--r-- doc/news/version_3.20120605.mdwn | 11
-rw-r--r-- doc/news/version_3.20120629.mdwn | 12
13 files changed, 216 insertions, 16 deletions
diff --git a/doc/bugs/Issue_on_OSX_with_some_system_limits/comment_2_b14e697c211843163285aaa8de5bf4c6._comment b/doc/bugs/Issue_on_OSX_with_some_system_limits/comment_2_b14e697c211843163285aaa8de5bf4c6._comment
new file mode 100644
index 000000000..17dcf7634
--- /dev/null
+++ b/doc/bugs/Issue_on_OSX_with_some_system_limits/comment_2_b14e697c211843163285aaa8de5bf4c6._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus"
+ nickname="Jimmy"
+ subject="comment 2"
+ date="2012-06-29T12:02:48Z"
+ content="""
+Doing,
+
+ sudo sysctl -w kern.maxfilesperproc=400000
+
+Somewhat works for me, git-annex watch at least starts up and takes a while to scan the directory, but it's not ideal. Also, creating files seems to work okay, when I remove a file the changes don't seem to get pushed across my other repos, running a sync on the remote repo fixes things.
+"""]]
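
Reviewer note on the comment above: the watcher needs roughly one open file per watched directory under kqueue, so it can be useful to check the per-process open-file limit before starting. The following is a minimal sketch, not git-annex code. `fitsOpenFileLimit`, `canWatch`, and the `headroom` value are hypothetical names and numbers chosen for illustration, and it queries the process rlimit via the `unix` package rather than the `kern.maxfilesperproc` sysctl mentioned in the comment, which may cap that rlimit from above.

```haskell
import System.Posix.Resource
    ( Resource(ResourceOpenFiles)
    , ResourceLimit(..)
    , getResourceLimit
    , softLimit
    )

-- | Descriptors reserved for everything other than directory watches.
-- The number 64 is an arbitrary assumption for this sketch.
headroom :: Integer
headroom = 64

-- | Does watching this many directories (one descriptor each) fit under
-- the given limit while leaving the headroom free?
fitsOpenFileLimit :: Integer -> ResourceLimit -> Bool
fitsOpenFileLimit dirs (ResourceLimit n) = dirs + headroom <= n
fitsOpenFileLimit _ ResourceLimitInfinity = True
fitsOpenFileLimit _ ResourceLimitUnknown = False

-- | Check against the soft open-file limit of the current process.
canWatch :: Integer -> IO Bool
canWatch dirs = do
    limits <- getResourceLimit ResourceOpenFiles
    return (fitsOpenFileLimit dirs (softLimit limits))
```

A caller could count the directories in the repository first and print a warning instead of failing part-way through the startup scan when `canWatch` returns `False`.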
diff --git a/doc/bugs/watcher_commits_unlocked_files.mdwn b/doc/bugs/watcher_commits_unlocked_files.mdwn
new file mode 100644
index 000000000..ef64921f1
--- /dev/null
+++ b/doc/bugs/watcher_commits_unlocked_files.mdwn
@@ -0,0 +1,28 @@
+When having "git annex watch" running, unlocking files causes the watcher
+to immediately lock/commit them.
+
+----
+
+Possible approaches:
+
+* The watcher could detect unlocked files by checking if newly added files
+ are a typechange of a file already in git. But this would add git overhead
+ to every file add.
+* `git annex unlock` could add some type of flag file, which the assistant
+ could check. This would work fine, for users who want to use `git annex
+ unlock` with the assistant. That's probably not simple enough for most
+ users, though.
+* There could be a UI in the assistant to pick a file and unlock it.
+ The assistant would have its own list of files it knows are unlocked.
+ But I'm trying to avoid mandatory UI to use the assistant.
+* Perhaps instead, have a directory, like "edit". The assistant could notice
+ when files move into this special directory, and automatically unlock them.
+ Then when they're moved out, automatically commit them.
+* Alternatively, files that are moved out of the repository entirely could be
+ automatically unlocked, and then when they're moved back in, it would
+ automatically do the right thing. This may be worth implementing in
+ combination with the "edit" directory, as different use cases would work
+ better with one or the other. However, I don't currently get inotify
+ events when files are moved out of the repository (well, I do, but it
+ just says "file moved", with no forwarding address, so I don't know
+ how to find the file to unlock it).
diff --git a/doc/bugs/watcher_commits_unlocked_files/comment_1_f70e1912fde0eee59e208307df06b503._comment b/doc/bugs/watcher_commits_unlocked_files/comment_1_f70e1912fde0eee59e208307df06b503._comment
new file mode 100644
index 000000000..a06b8fe82
--- /dev/null
+++ b/doc/bugs/watcher_commits_unlocked_files/comment_1_f70e1912fde0eee59e208307df06b503._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus"
+ nickname="Jimmy"
+ subject="comment 1"
+ date="2012-06-28T13:39:18Z"
+ content="""
+That is a known problem/bug which is listed at [[design/assistant/inotify]]
+"""]]
diff --git a/doc/design/assistant/blog/day_20__data_transfer_design.mdwn b/doc/design/assistant/blog/day_20__data_transfer_design.mdwn
new file mode 100644
index 000000000..4f47ae63c
--- /dev/null
+++ b/doc/design/assistant/blog/day_20__data_transfer_design.mdwn
@@ -0,0 +1,22 @@
+Today is a planning day. I have only a few days left before I'm off to
+Nicaragua for [DebConf](http://debconf12.debconf.org/), where I'll only
+have smaller chunks of time without interruptions. So it's important to get
+some well-defined smallish chunks designed that I can work on later. See
+bulleted action items below (now moved to [[syncing]]). Each
+should be around 1-2 hours unless it turns out to be 8 hours... :)
+
+First, worked on writing down a design, and some data types, for data transfer
+tracking (see [[syncing]] page). Found that writing down these simple data
+types before I started slinging code has clarified things a lot for me.
+
+Most importantly, I realized that I will need to modify `git-annex-shell`
+to record on disk what transfers it's doing, so the assistant can get that
+information and use it to both avoid redundant transfers (potentially a big
+problem!), and later to allow the user to control them using the web app.
+
+While eventually the user will be able to use the web app to prioritize
+transfers, stop and start, throttle, etc, it's important to get the default
+behavior right. So I'm thinking about things like how to prioritize uploads
+vs downloads, when it's appropriate to have multiple downloads running at
+once, etc.
+
diff --git a/doc/design/assistant/blog/day_21__transfer_tracking.mdwn b/doc/design/assistant/blog/day_21__transfer_tracking.mdwn
new file mode 100644
index 000000000..79c0b6438
--- /dev/null
+++ b/doc/design/assistant/blog/day_21__transfer_tracking.mdwn
@@ -0,0 +1,28 @@
+Worked today on two action items from my last blog post:
+
+* on-disk transfers in progress information files (read/write/enumerate)
+* locking for the files, so redundant transfer races can be detected,
+ and failed transfers noticed
+
+That's all done, and used by the `get`, `copy`, and `move` subcommands.
+
+Also, I made `git-annex status` use that information to display any
+file transfers that are currently in progress:
+
+ joey@gnu:~/lib/sound/misc>git annex status
+ [...]
+ transfers in progress:
+ downloading Vic-303.mp3 from leech
+
+(Webapp, here we come!)
+
+However... Files being sent or received by `git-annex-shell` don't yet
+have this transfer info recorded. The problem is that to do so,
+`git-annex-shell` will need to be run with a `--remote=` parameter. But
+old versions will of course fail when run with such an unknown parameter.
+
+This is a problem I last faced in December 2011 when adding the `--uuid=`
+parameter. That time I punted and required the remote `git-annex-shell` be
+updated to a new enough version to accept it. But as git-annex gets more widely
+used and packaged, that's becoming less an option. I need to find a real
+solution to this problem.
diff --git a/doc/design/assistant/inotify.mdwn b/doc/design/assistant/inotify.mdwn
index 47b8c84a3..7b600090a 100644
--- a/doc/design/assistant/inotify.mdwn
+++ b/doc/design/assistant/inotify.mdwn
@@ -8,13 +8,15 @@ available!
* If a file is checked into git as a normal file and gets modified
(or merged, etc), it will be converted into an annexed file.
- See [[blog/day_7__bugfixes]]
+ See [[blog/day_7__bugfixes]].
* When you `git annex unlock` a file, it will immediately be re-locked.
+ See [[bugs/watcher_commits_unlocked_files]].
* Kqueue has to open every directory it watches, so too many directories
will run it out of the max number of open files (typically 1024), and fail.
I may need to fork off multiple watcher processes to handle this.
+ See [[bugs/Issue_on_OSX_with_some_system_limits]].
## beyond Linux
@@ -42,6 +44,8 @@ I'd also like to support OSX and if possible the BSDs.
* [man page](http://www.freebsd.org/cgi/man.cgi?query=kqueue&apropos=0&sektion=0&format=html)
* <https://github.com/gorakhargosh/watchdog/blob/master/src/watchdog/observers/kqueue.py> (good example program)
+ *kqueue is now supported*
+
* hfsevents ([haskell bindings](http://hackage.haskell.org/package/hfsevents))
is OSX specific.
@@ -71,9 +75,6 @@ I'd also like to support OSX and if possible the BSDs.
- honor .gitignore, not adding files it excludes (difficult, probably
needs my own .gitignore parser to avoid excessive running of git commands
to check for ignored files)
-- Possibly, when a directory is moved out of the annex location,
- unannex its contents. (Does inotify tell us where the directory moved
- to so we can access it?)
## the races
@@ -125,6 +126,17 @@ Many races need to be dealt with by this code. Here are some of them.
Not a problem; The removal event removes the old file from the index, and
the add event adds the new one.
+* Symlink appears, but is then deleted before it can be processed.
+
+ Leads to an ugly message, otherwise no problem:
+
+ ./me: readSymbolicLink: does not exist (No such file or directory)
+
+ Here `me` is a file that was in a conflicted merge, which got
+ removed as part of the resolution. This is probably coming from the watcher
+ thread, which sees the newly added symlink (created by the git merge),
+ but finds it deleted (by the conflict resolver) by the time it processes it.
+
## done
- on startup, add any files that have appeared since last run **done**
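
Reviewer note on the "symlink appears, but is then deleted" race added above: one way to avoid the ugly message is to treat a vanished symlink as a non-event. This is a minimal sketch under that assumption, not the fix used in git-annex. `readLinkIfPresent` is a hypothetical helper name, while `readSymbolicLink`, `tryJust`, and `isDoesNotExistError` are existing functions from `unix` and `base`.

```haskell
import Control.Exception (tryJust)
import Control.Monad (guard)
import System.IO.Error (isDoesNotExistError)
import System.Posix.Files (readSymbolicLink)

-- | Read a symlink's target, returning Nothing if the link no longer
-- exists by the time it is processed. Any other IO error (for example
-- a permission problem) still propagates to the caller.
readLinkIfPresent :: FilePath -> IO (Maybe FilePath)
readLinkIfPresent path = do
    result <- tryJust (guard . isDoesNotExistError) (readSymbolicLink path)
    return (either (const Nothing) Just result)
```

Only the does-not-exist case is swallowed here, so a watcher using this would still report unexpected failures.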
diff --git a/doc/design/assistant/progressbars.mdwn b/doc/design/assistant/progressbars.mdwn
index 2ade05aa5..ee7384274 100644
--- a/doc/design/assistant/progressbars.mdwn
+++ b/doc/design/assistant/progressbars.mdwn
@@ -9,6 +9,6 @@ To get this info for downloads, git-annex can watch the file as it arrives
and use its size.
TODO: What about uploads? Will i have to parse rsync's progresss output?
-Feed it via a named pipe? Ugh.
+Feed it via a named pipe? Ugh. Check into librsync.
This is one of those potentially hidden but time consuming problems.
diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 50e6fb4f1..5476b56f1 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -1,6 +1,37 @@
Once files are added (or removed or moved), need to send those changes to
all the other git clones, at both the git level and the key/value level.
+## action items
+
+* on-disk transfers in progress information files (read/write/enumerate)
+ **done**
+* locking for the files, so redundant transfer races can be detected,
+ and failed transfers noticed **done**
+* transfer info for git-annex-shell (problem: how to add a switch
+ with the necessary info w/o breaking backwards compatibility?)
+* update files as transfers proceed. See [[progressbars]]
+ (updating for downloads is easy; for uploads is hard)
+* add Transfer queue TChan
+* enqueue Transfers (Uploads) as new files are added to the annex by
+ Watcher.
+* enqueue Transfers (Downloads) as new dangling symlinks are noticed by
+ Watcher.
+* add TransferInfo Map to DaemonStatus for tracking transfers in progress.
+* Poll transfer in progress info files for changes (use inotify again!
+ wow! hammer, meet nail..), and update the TransferInfo Map
+* Write basic Transfer handling thread. Multiple such threads need to be
+ able to be run at once. Each will need its own independent copy of the
+ Annex state monad.
+* Write transfer control thread, which decides when to launch transfers.
+* At startup, and possibly periodically, look for files we have that
+ location tracking indicates remotes do not, and enqueue Uploads for
+ them. Also, enqueue Downloads for any files we're missing.
+* Find a way to probe available outgoing bandwidth, to throttle so
+ we don't bufferbloat the network to death.
+* git-annex needs a simple speed control knob, which can be plumbed
+ through to, at least, rsync. A good job for an hour in an
+ airport somewhere.
+
## git syncing
1. Can use `git annex sync`, which already handles bidirectional syncing.
@@ -45,6 +76,46 @@ and with appropriate rate limiting and control facilities.
This probably will need lots of refinements to get working well.
+### first pass: flood syncing
+
+Before mapping the network, the best we can do is flood all files out to every
+reachable remote. This is worth doing first, since it's the simplest way to
+get the basic functionality of the assistant to work. And we'll need this
+anyway.
+
+### transfer tracking
+
+* Upload added to queue by the watcher thread when it adds content.
+* Download added to queue by the watcher thread when it sees new symlinks
+ that lack content.
+* Transfer threads started/stopped as necessary to move data.
+ (May sometimes want multiple threads downloading, or uploading, or even both.)
+
+ type TransferQueue = TChan [Transfer]
+ -- add (M.Map Transfer TransferInfo) to DaemonStatus
+
+ startTransfer :: Transfer -> Annex TransferID
+
+ stopTransfer :: TransferID -> IO ()
+
+The assistant needs to find out when `git-annex-shell` is receiving or
+sending (triggered by another remote), so it can add data for those too.
+This is important to avoid uploading content to a remote that is already
+downloading it from us, or vice versa, as well as to in future let the web
+app manage transfers as user desires.
+
+For files being received, it can see the temp file, but other than lsof
+there's no good way to find the pid (and I'd rather not kill blindly).
+
+For files being sent, there's no filesystem indication. So git-annex-shell
+(and other git-annex transfer processes) should write a status file to disk.
+
+Can use file locking on these status files to claim upload/download rights,
+which will avoid races.
+
+This status file can also be updated periodically to show amount of transfer
+complete (necessary for tracking uploads).
+
## other considerations
This assumes the network is connected. It's often not, so the
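
Reviewer note on the transfer tracking design added above: the queue plus in-progress map could look roughly like the sketch below. Everything here is an assumption made for illustration. The `Transfer` and `TransferInfo` records are simplified stand-ins rather than git-annex's real types, the queue carries single transfers instead of the `TChan [Transfer]` in the notes, and an in-memory `TVar` map takes the place of the on-disk status files and file locking, so it only detects redundant transfers within one process.

```haskell
import Control.Concurrent.STM
import qualified Data.Map as M

-- Simplified stand-in types (hypothetical, not git-annex's definitions).
data Direction = Upload | Download
    deriving (Eq, Ord, Show)

data Transfer = Transfer
    { transferDirection :: Direction
    , transferRemote :: String  -- stand-in for a remote's UUID
    , transferKey :: String     -- stand-in for an annex key
    } deriving (Eq, Ord, Show)

newtype TransferInfo = TransferInfo { bytesComplete :: Integer }
    deriving (Eq, Show)

type TransferQueue = TChan Transfer
type TransferMap = TVar (M.Map Transfer TransferInfo)

-- | Watcher side: queue a transfer for some transfer thread to pick up.
enqueue :: TransferQueue -> Transfer -> STM ()
enqueue = writeTChan

-- | Transfer thread side: take the next queued transfer and claim it.
-- Blocks while the queue is empty. Returns Nothing when the dequeued
-- transfer is already in progress, so the redundant copy is dropped.
claimNext :: TransferQueue -> TransferMap -> STM (Maybe Transfer)
claimNext queue inProgressVar = do
    t <- readTChan queue
    inProgress <- readTVar inProgressVar
    if M.member t inProgress
        then return Nothing
        else do
            writeTVar inProgressVar (M.insert t (TransferInfo 0) inProgress)
            return (Just t)

-- | Called when a transfer finishes or fails.
release :: TransferMap -> Transfer -> STM ()
release inProgressVar t = modifyTVar' inProgressVar (M.delete t)
```

Doing the dequeue and the claim in one STM transaction means two transfer threads cannot both claim the same transfer, which is the in-process analogue of the lock on a status file.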
diff --git a/doc/download.mdwn b/doc/download.mdwn
index f0f17e141..242de13c3 100644
--- a/doc/download.mdwn
+++ b/doc/download.mdwn
@@ -18,6 +18,7 @@ others need some manual work. See [[install]] for details.
The git repository has some branches:
+* `assistant` contains the new change-tracking daemon
* `ghc7.0` supports versions of ghc older than 7.4, which
had a major change to filename encoding.
* `old-monad-control` is for systems that don't have a newer monad-control
@@ -25,6 +26,7 @@ The git repository has some branches:
* `no-ifelse` avoids using the IFelse library
(merge it into master if you need it)
* `no-bloom` avoids using bloom filters. (merge it into master if you need it)
+* `no-s3` avoids using the S3 library (merge it into master if you need it)
* `debian-stable` contains the latest backport of git-annex to Debian
stable.
* `tweak-fetch` adds support for the git tweak-fetch hook, which has
diff --git a/doc/install/OSX/comment_12_60d13f2c8e008af1041bea565a392c83._comment b/doc/install/OSX/comment_12_60d13f2c8e008af1041bea565a392c83._comment
new file mode 100644
index 000000000..e2e85aaa9
--- /dev/null
+++ b/doc/install/OSX/comment_12_60d13f2c8e008af1041bea565a392c83._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawnHrjHxJAm39x8DR4bnbazQO6H0nMNuY9c"
+ nickname="Damien"
+ subject="sha256 alternative"
+ date="2012-06-30T14:34:11Z"
+ content="""
+in reply to comment 6: On my Mac (10.7.4) there's `/usr/bin/shasum -a 256 <file>` command that will produce the same output as `sha256sum <file>`.
+"""]]
diff --git a/doc/install/OSX/comment_13_a6f48c87c2d6eabe379d6e10a6cac453._comment b/doc/install/OSX/comment_13_a6f48c87c2d6eabe379d6e10a6cac453._comment
new file mode 100644
index 000000000..e5ce62b13
--- /dev/null
+++ b/doc/install/OSX/comment_13_a6f48c87c2d6eabe379d6e10a6cac453._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawnHrjHxJAm39x8DR4bnbazQO6H0nMNuY9c"
+ nickname="Damien"
+ subject="gnu commands"
+ date="2012-07-01T17:03:57Z"
+ content="""
+…and another approach to the same problem: apparently git-annex also relies on the GNU coreutils (for instance, when doing `git annex get .`, `cp` complains about `illegal option -- -`). I do have the GNU coreutils installed with Homebrew, but they are all prefixed with `g`. So maybe you should try `gsha256sum` and `gcp` before `sha256sum` and `cp`, that seems like a more general solution.
+"""]]
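
Reviewer note on the two OSX comments above: the suggestion to try `gsha256sum` before `sha256sum`, with `shasum -a 256` as another option, amounts to picking the first candidate found in `PATH`. This is a minimal sketch of that lookup, not how git-annex selects its commands. `firstAvailable` and `sha256Candidates` are hypothetical names, and the candidate order simply follows the comments. `findExecutable` is from the `directory` package.

```haskell
import System.Directory (findExecutable)

-- | A command name together with the arguments it needs.
type Candidate = (String, [String])

-- | Candidates named in the comments above, in the suggested order.
sha256Candidates :: [Candidate]
sha256Candidates =
    [ ("gsha256sum", [])
    , ("sha256sum", [])
    , ("shasum", ["-a", "256"])
    ]

-- | Return the resolved path and arguments of the first candidate
-- that exists in PATH, or Nothing if none is installed.
firstAvailable :: [Candidate] -> IO (Maybe (FilePath, [String]))
firstAvailable [] = return Nothing
firstAvailable ((cmd, args) : rest) = do
    found <- findExecutable cmd
    case found of
        Just path -> return (Just (path, args))
        Nothing -> firstAvailable rest
```

The same lookup would apply to `gcp` versus `cp`, although finding a command says nothing about whether it accepts GNU-style long options.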
diff --git a/doc/news/version_3.20120605.mdwn b/doc/news/version_3.20120605.mdwn
deleted file mode 100644
index ed0a09177..000000000
--- a/doc/news/version_3.20120605.mdwn
+++ /dev/null
@@ -1,11 +0,0 @@
-git-annex 3.20120605 released with [[!toggle text="these changes"]]
-[[!toggleable text="""
- * sync: Show a nicer message if a user tries to sync to a special remote.
- * lock: Reset unlocked file to index, rather than to branch head.
- * import: New subcommand, pulls files from a directory outside the annex
- and adds them.
- * Fix display of warning message when encountering a file that uses an
- unsupported backend.
- * Require that the SHA256 backend can be used when building, since it's the
- default.
- * Preserve parent environment when running hooks of the hook special remote."""]] \ No newline at end of file
diff --git a/doc/news/version_3.20120629.mdwn b/doc/news/version_3.20120629.mdwn
new file mode 100644
index 000000000..e6b98ae99
--- /dev/null
+++ b/doc/news/version_3.20120629.mdwn
@@ -0,0 +1,12 @@
+git-annex 3.20120629 released with [[!toggle text="these changes"]]
+[[!toggleable text="""
+ * cabal: Only try to use inotify on Linux.
+ * Version build dependency on STM, and allow building without it,
+ which disables the watch command.
+ * Avoid ugly failure mode when moving content from a local repository
+ that is not available.
+ * Got rid of the last place that did utf8 decoding.
+ * Accept arbitrarily encoded repository filepaths etc when reading
+ git config output. This fixes support for remotes with unusual characters
+ in their names.
+ * sync: Automatically resolves merge conflicts."""]] \ No newline at end of file