author Joey Hess <joey@kitenet.net> 2012-06-17 15:45:35 -0400
committer Joey Hess <joey@kitenet.net> 2012-06-17 15:45:35 -0400
commit bf3339e5b7c26cd24acefdf7c33059433195e1f6
tree c78215210629133a77afd39a96f5a0b4f2414f85 /doc
parent c373f6e9546d615fb7c3f2c77a35136c9ccf654a
parent ec197feec062c59760a931aafb5d3087b921999a
Merge branch 'master' into watch
Diffstat (limited to 'doc')
-rw-r--r-- doc/design/assistant/blog/day_10__lsof.mdwn | 54
-rw-r--r-- doc/design/assistant/blog/day_10__lsof/comment_1_9b8c28c85c979f32e5c295b6a03c048e._comment | 9
-rw-r--r-- doc/design/assistant/inotify.mdwn | 113
-rw-r--r-- doc/forum/exporting_annexed_files/comment_1_e08e4c79588e17fb2f1cdf53d9fab7ea._comment | 16
-rw-r--r-- doc/forum/exporting_annexed_files/comment_2_15dc3024417b5b2ff3544a08beacab34._comment | 8
-rw-r--r-- doc/forum/exporting_annexed_files/comment_3_86f0e0f767a84a0f583e121d36cb7d48._comment | 8
6 files changed, 174 insertions, 34 deletions
diff --git a/doc/design/assistant/blog/day_10__lsof.mdwn b/doc/design/assistant/blog/day_10__lsof.mdwn
new file mode 100644
index 000000000..32b670571
--- /dev/null
+++ b/doc/design/assistant/blog/day_10__lsof.mdwn
@@ -0,0 +1,54 @@
+A rather frustrating and long day coding went like this:
+
+## 1-3 pm
+
+Wrote a single function, of which all any Haskell programmer needs to know
+is its type signature:
+
+ Lsof.queryDir :: FilePath -> IO [(FilePath, LsofOpenMode, ProcessInfo)]
+
+When I'm spending another hour or two taking a unix utility like lsof and
+parsing its output, which in this case is in a rather complicated
+machine-parsable output format, I often wish unix streams were strongly
+typed, which would avoid this bother.
+
+## 3-9 pm
+
+Six hours spent making it defer annexing files until the commit thread
+wakes up and is about to make a commit. Why did it take so horribly long?
+Well, there were a number of complications, and some really bad bugs
+involving races that were hard to reproduce reliably enough to deal with.
+
+In other words, I was lost in the weeds for a lot of those hours...
+
+At one point, something glorious happened, and it was always making exactly
+one commit for batch mode modifications of a lot of files (like untarring
+them). Unfortunately, I had to lose that gloriousness due to another
+potential race, which, while unlikely, would have made the program deadlock
+if it happened.
+
+So, it's back to making 2 or 3 commits per batch mode change. I also have a
+buglet that sometimes causes a second empty commit after a file is added.
+I know why (the inotify event for the symlink gets in late,
+after the commit); will try to improve commit frequency later.
+
+## 9-11 pm
+
+Put the capstone on the day's work, by calling lsof on a directory full
+of hardlinks to the files that are about to be annexed, to check if any
+are still open for write.
+
+This works great! Starting up `git annex watch` when processes have files
+open is no longer a problem, and even if you're evil enough to try having
+multiple processes open the same file, it will complain and not annex it
+until all the writers close it.
+
+(Well, someone really evil could turn the write bit back on after git annex
+clears it, and open the file again, but then really evil people can do
+that to files in `.git/annex/objects` too, and they'll get their just
+deserts when `git annex fsck` runs. So, that's ok..)
+
+----
+
+Anyway, will beat on it more tomorrow, and if all is well, this will finally
+go out to the beta testers.
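
Review note on the post above: it gives only the type of `Lsof.queryDir` and does not show how the lsof output is parsed. A minimal sketch of that kind of parsing follows, assuming lsof's `-F` field output restricted to the `p`, `c`, `f`, `a` and `n` fields, one field per line, with the name field arriving last in each file set. The definitions of `LsofOpenMode` and `ProcessInfo`, and the name `parseLsofFields`, are hypothetical stand-ins and not the code from this branch.

```haskell
module Main where

-- Hypothetical stand-ins: the post names these types but does not define them.
data LsofOpenMode = OpenReadOnly | OpenWriteOnly | OpenReadWrite | OpenUnknown
  deriving (Eq, Show)

data ProcessInfo = ProcessInfo { procPid :: Int, procCmd :: String }
  deriving (Eq, Show)

-- Parse newline-delimited lsof field output. Each line is a one-character
-- tag followed by its value:
--   p = pid (starts a process set)   c = command name
--   f = file descriptor (starts a file set)
--   a = access mode (r, w or u)      n = file name
-- Assumption: 'n' is the last field of each file set, so a result is
-- emitted whenever a name line is seen.
parseLsofFields :: String -> [(FilePath, LsofOpenMode, ProcessInfo)]
parseLsofFields = go (ProcessInfo 0 "") OpenUnknown . lines
  where
    go _ _ [] = []
    go pinfo mode (l : ls) = case l of
      ('p' : pid)  -> go (ProcessInfo (readPid pid) "") OpenUnknown ls
      ('c' : cmd)  -> go (pinfo { procCmd = cmd }) mode ls
      ('f' : _)    -> go pinfo OpenUnknown ls
      ('a' : m)    -> go pinfo (parseMode m) ls
      ('n' : name) -> (name, mode, pinfo) : go pinfo mode ls
      _            -> go pinfo mode ls

    -- Unparsable pids fall back to 0 rather than failing.
    readPid :: String -> Int
    readPid s = case reads s of
      [(n, "")] -> n
      _         -> 0

    parseMode :: String -> LsofOpenMode
    parseMode "r" = OpenReadOnly
    parseMode "w" = OpenWriteOnly
    parseMode "u" = OpenReadWrite
    parseMode _   = OpenUnknown

main :: IO ()
main = mapM_ print (parseLsofFields sample)
  where
    -- Hand-written sample input, not captured from a real lsof run.
    sample = unlines ["p1234", "ccat", "f3", "aw", "n/tmp/quarantine/foo"]
```

Keeping the parser pure means it can be exercised on canned strings; a real `queryDir` would additionally have to run lsof and feed its output to a function like this.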
diff --git a/doc/design/assistant/blog/day_10__lsof/comment_1_9b8c28c85c979f32e5c295b6a03c048e._comment b/doc/design/assistant/blog/day_10__lsof/comment_1_9b8c28c85c979f32e5c295b6a03c048e._comment
new file mode 100644
index 000000000..9d970da22
--- /dev/null
+++ b/doc/design/assistant/blog/day_10__lsof/comment_1_9b8c28c85c979f32e5c295b6a03c048e._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="http://dieter-be.myopenid.com/"
+ nickname="dieter"
+ subject="comment 1"
+ date="2012-06-16T09:14:26Z"
+ content="""
+maybe at some point, your tool could show \"warning, the following files are still open and are hence not being annexed\"
+to avoid any nasty surprises of a file not being annexed and the user not realizing it.
+"""]]
diff --git a/doc/design/assistant/inotify.mdwn b/doc/design/assistant/inotify.mdwn
index 02c30752d..9d3db9192 100644
--- a/doc/design/assistant/inotify.mdwn
+++ b/doc/design/assistant/inotify.mdwn
@@ -7,48 +7,56 @@ There is a `watch` branch in git that adds the command.
## known bugs
-* A process has a file open for write, another one closes it,
- and so it's added. Then the first process modifies it.
+* If a file is checked into git as a normal file and gets modified
+ (or merged, etc), it will be converted into an annexed file.
+ See [[blog/day_7__bugfixes]]
- Or, a process has a file open for write when `git annex watch` starts
- up, it will be added to the annex. If the process later continues
- writing, it will change content in the annex.
+* When you `git annex unlock` a file, it will immediately be re-locked.
- This changes content in the annex, and fsck will later catch
- the inconsistency.
+## beyond Linux
- Possible fixes:
+I'd also like to support OSX and if possible the BSDs.
- * Somehow track or detect if a file is open for write by any processes.
- `lsof` could be used, although it would be a little slow.
+* kqueue ([haskell bindings](http://hackage.haskell.org/package/kqueue))
+ is supported by FreeBSD, OSX, and other BSDs.
- Here's one way to avoid the slowdown: When a file is being added,
- set it read-only, and hard-link it into a quarantine directory,
- remembering both filenames.
- Then use the batch change mode code to detect batch adds and bundle
- them together.
- Just before committing, lsof the quarantine directory. Any files in
- it that are still open for write can just have their write bit turned
- back on and be deleted from quarantine, to be handled when their writer
- closes. Files that pass quarantine get added as usual. This avoids
- repeated lsof calls slowing down adds, but does add a constant factor
- overhead (0.25 seconds lsof call) before any add gets committed.
+ In kqueue, to watch for changes to a file, you have to have an open file
+ descriptor to the file. This wouldn't scale.
- * Or, when possible, making a copy on write copy before adding the file
- would avoid this.
- * Or, as a last resort, make an expensive copy of the file and add that.
- * Tracking file opens and closes with inotify could tell if any other
- processes have the file open. But there are problems.. It doesn't
- seem to differentiate between files opened for read and for write.
- And there would still be a race after the last close and before it's
- injected into the annex, where it could be opened for write again.
- Would need to detect that and undo the annex injection or something.
+ Apparently, a directory can be watched, and events are generated when
+ files are added/removed from it. You then have to scan to find which
+ files changed. [example](https://developer.apple.com/library/mac/#samplecode/FileNotification/Listings/Main_c.html#//apple_ref/doc/uid/DTS10003143-Main_c-DontLinkElementID_3)
-* If a file is checked into git as a normal file and gets modified
- (or merged, etc), it will be converted into an annexed file.
- See [[blog/day_7__bugfixes]]
+ Gamin does the best it can with just kqueue, supplemented by polling.
+ The source file `server/gam_kqueue.c` makes for interesting reading.
+ Using gamin to do the heavy lifting is one option.
+ ([haskell bindings](http://hackage.haskell.org/package/hlibfam) for FAM;
+ gamin shares the API)
-* When you `git annex unlock` a file, it will immediately be re-locked.
+ kqueue does not seem to provide a way to tell when a file gets closed,
+ only when it's initially created. Poses problems..
+
+ * [man page](http://www.freebsd.org/cgi/man.cgi?query=kqueue&apropos=0&sektion=0&format=html)
+ * <https://github.com/gorakhargosh/watchdog/blob/master/src/watchdog/observers/kqueue.py> (good example program)
+
+* hfsevents ([haskell bindings](http://hackage.haskell.org/package/hfsevents))
+ is OSX specific.
+
+ Originally it was only directory level, and you were only told a
+ directory had changed and not which file. Based on the haskell
+ binding's code, from OSX 10.7.0, file level events were added.
+
+ This will be harder for me to develop for, since I don't have access to
+ OSX machines..
+
+ hfsevents does not seem to provide a way to tell when a file gets closed,
+ only when it's initially created. Poses problems..
+
+ * <https://developer.apple.com/library/mac/#documentation/Darwin/Conceptual/FSEvents_ProgGuide/Introduction/Introduction.html>
+ * <http://pypi.python.org/pypi/MacFSEvents/0.2.8> (good example program)
+ * <https://github.com/gorakhargosh/watchdog/blob/master/src/watchdog/observers/fsevents.py> (good example program)
+
+* Windows has a Win32 ReadDirectoryChangesW, and perhaps other things.
## beyond Linux
@@ -171,3 +179,40 @@ Many races need to be dealt with by this code. Here are some of them.
- coalesce related add/rm events for speed and less disk IO **done**
- don't annex `.gitignore` and `.gitattributes` files **done**
- run as a daemon **done**
+- A process has a file open for write, another one closes it,
+ and so it's added. Then the first process modifies it.
+
+ Or, a process has a file open for write when `git annex watch` starts
+ up, it will be added to the annex. If the process later continues
+ writing, it will change content in the annex.
+
+ This changes content in the annex, and fsck will later catch
+ the inconsistency.
+
+ Possible fixes:
+
+ * Somehow track or detect if a file is open for write by any processes.
+ `lsof` could be used, although it would be a little slow.
+
+ Here's one way to avoid the slowdown: When a file is being added,
+ set it read-only, and hard-link it into a quarantine directory,
+ remembering both filenames.
+ Then use the batch change mode code to detect batch adds and bundle
+ them together.
+ Just before committing, lsof the quarantine directory. Any files in
+ it that are still open for write can just have their write bit turned
+ back on and be deleted from quarantine, to be handled when their writer
+ closes. Files that pass quarantine get added as usual. This avoids
+ repeated lsof calls slowing down adds, but does add a constant factor
+ overhead (0.25 seconds lsof call) before any add gets committed. **done**
+
+ * Or, when possible, making a copy on write copy before adding the file
+ would avoid this.
+ * Or, as a last resort, make an expensive copy of the file and add that.
+ * Tracking file opens and closes with inotify could tell if any other
+ processes have the file open. But there are problems.. It doesn't
+ seem to differentiate between files opened for read and for write.
+ And there would still be a race after the last close and before it's
+ injected into the annex, where it could be opened for write again.
+ Would need to detect that and undo the annex injection or something.
+
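
Review note on the quarantine item above: the decision made just before committing can be sketched as a pure function. It takes the files currently in quarantine and the open-file report for the quarantine directory, and splits the files into those safe to add and those to release until their writer closes. All names here (`Quarantined`, `OpenMode`, `checkQuarantine`) are hypothetical and not taken from this branch; the filesystem steps (clearing the write bit, hard-linking, running lsof) are deliberately left out.

```haskell
module Main where

import Data.List (partition)

-- Hypothetical types for illustration only.
data OpenMode = OpenReadOnly | OpenWriteOnly | OpenReadWrite
  deriving (Eq, Show)

-- A file being added: its path in the work tree and its hard link
-- inside the quarantine directory.
data Quarantined = Quarantined
  { origFile       :: FilePath
  , quarantineFile :: FilePath
  } deriving (Eq, Show)

-- Read-only opens do not block an add; any kind of write access does.
openForWrite :: OpenMode -> Bool
openForWrite OpenReadOnly = False
openForWrite _            = True

-- Given the open-file report for the quarantine directory, split the
-- quarantined files into (safe to add now, still open for write).
checkQuarantine :: [(FilePath, OpenMode)] -> [Quarantined] -> ([Quarantined], [Quarantined])
checkQuarantine open = partition safe
  where
    writers = [f | (f, m) <- open, openForWrite m]
    safe q  = quarantineFile q `notElem` writers

main :: IO ()
main = do
  let quarantined =
        [ Quarantined "a.txt" "quarantine/1"
        , Quarantined "b.txt" "quarantine/2"
        ]
      open = [("quarantine/2", OpenWriteOnly), ("quarantine/1", OpenReadOnly)]
      (ready, stillOpen) = checkQuarantine open quarantined
  putStrLn ("add now: " ++ show (map origFile ready))
  putStrLn ("retry later: " ++ show (map origFile stillOpen))
```

Files in the second list would have their write bit turned back on and be removed from quarantine, as the item describes; the list lookup is linear, which seems acceptable for a sketch but would want a set for very large batches.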
diff --git a/doc/forum/exporting_annexed_files/comment_1_e08e4c79588e17fb2f1cdf53d9fab7ea._comment b/doc/forum/exporting_annexed_files/comment_1_e08e4c79588e17fb2f1cdf53d9fab7ea._comment
new file mode 100644
index 000000000..69fc46245
--- /dev/null
+++ b/doc/forum/exporting_annexed_files/comment_1_e08e4c79588e17fb2f1cdf53d9fab7ea._comment
@@ -0,0 +1,16 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="4.154.6.135"
+ subject="comment 1"
+ date="2012-06-15T19:25:59Z"
+ content="""
+Sure, you can simply:
+
+ cp annexedfile ~
+
+Or just attach the file right from the git repository to an email, like any other file. Should work fine.
+
+If you wanted to copy a whole directory to export, you'd need to use the -L flag to make cp follow the symlinks and copy the real contents:
+
+ cp -r -L annexeddirectory /media/usbdrive/
+"""]]
diff --git a/doc/forum/exporting_annexed_files/comment_2_15dc3024417b5b2ff3544a08beacab34._comment b/doc/forum/exporting_annexed_files/comment_2_15dc3024417b5b2ff3544a08beacab34._comment
new file mode 100644
index 000000000..3621f9b89
--- /dev/null
+++ b/doc/forum/exporting_annexed_files/comment_2_15dc3024417b5b2ff3544a08beacab34._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="http://denis.laxalde.org/"
+ nickname="dlax"
+ subject="nautilus"
+ date="2012-06-15T19:57:31Z"
+ content="""
+Ah! I was fooled by nautilus which is not able to properly handle symlinks when copying. It copies links instead of target [[!gnomebug 623580]].
+"""]]
diff --git a/doc/forum/exporting_annexed_files/comment_3_86f0e0f767a84a0f583e121d36cb7d48._comment b/doc/forum/exporting_annexed_files/comment_3_86f0e0f767a84a0f583e121d36cb7d48._comment
new file mode 100644
index 000000000..db6f90d88
--- /dev/null
+++ b/doc/forum/exporting_annexed_files/comment_3_86f0e0f767a84a0f583e121d36cb7d48._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="4.154.6.135"
+ subject="comment 3"
+ date="2012-06-16T03:26:37Z"
+ content="""
+That nautilus behavior is a bad thing when trying to export files out, but it's a good thing when just moving files around inside your repository...
+"""]]