Merge branch 'master' into assistant

Conflicts: Makefile
author: Joey Hess <joey@kitenet.net> 2012-07-25 14:55:53 -0400
committer: Joey Hess <joey@kitenet.net> 2012-07-25 14:55:53 -0400
commit: 03979d4d54e7b0ce76fa296e57b9b5e1820ce7b1 (patch)
tree: 65c67542af9998f851f57d70cece212cf32da7e1 /doc/design
parent: 95c80b644046f6fabe445972de68be40285f1841 (diff)
parent: 1abc228008031fc48011f6cebf8f6e1f0438bf56 (diff)
5 files changed, 113 insertions, 8 deletions
diff --git a/doc/design/assistant/blog/day_40__dbus/comment_2_6799f2baf6a6ce14b1fa76a8402840c0._comment b/doc/design/assistant/blog/day_40__dbus/comment_2_6799f2baf6a6ce14b1fa76a8402840c0._comment
new file mode 100644
index 000000000..832be854a
--- /dev/null
+++ b/doc/design/assistant/blog/day_40__dbus/comment_2_6799f2baf6a6ce14b1fa76a8402840c0._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="hamish"
+ ip="203.0.139.24"
+ subject="dbus vs polling "
+ date="2012-07-22T07:13:37Z"
+ content="""
+I, too, am running dbus but like to mount my filesystems by hand. However, I'd imagine both that I am in a minority and that this minority would like the extra control, so perhaps even a \"re-read the mtab /now/\" command that can be run manually after something is manually mounted would suffice.
+
+Is it not possible to use inotify on the mtab?
+"""]]
diff --git a/doc/design/assistant/blog/day_40__dbus/comment_3_fa1d7444bdafcb990cacf2ace7ee6ef1._comment b/doc/design/assistant/blog/day_40__dbus/comment_3_fa1d7444bdafcb990cacf2ace7ee6ef1._comment
new file mode 100644
index 000000000..a372670b8
--- /dev/null
+++ b/doc/design/assistant/blog/day_40__dbus/comment_3_fa1d7444bdafcb990cacf2ace7ee6ef1._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="4.154.4.169"
+ subject="comment 3"
+ date="2012-07-22T16:03:52Z"
+ content="""
+How did I not think about using my favorite hammer on this problem too? But, no, it seems /proc/mounts cannot be watched with inotify, and of course the BSDs don't seem to have such a file at all.
+
+I think the dbus stuff is sorted out for manual users, see later blog entries.
+"""]]
diff --git a/doc/design/assistant/blog/day_42__the_answer.mdwn b/doc/design/assistant/blog/day_42__the_answer.mdwn
new file mode 100644
index 000000000..fd7c4ebb5
--- /dev/null
+++ b/doc/design/assistant/blog/day_42__the_answer.mdwn
@@ -0,0 +1,27 @@
+Made the MountWatcher update state for remotes located in a drive that
+gets mounted. This was tricky code. First I had to make remotes declare
+when they're located in a local directory. Then it has to rescan git
+configs of git remotes (because the git repo mounted at a mount point may
+change), and update all the state that a newly available remote can affect.
+
+And it works: I plug in a drive containing one of my git remotes, and the
+assistant automatically notices it and syncs the git repositories.
+
+---
+
+But, data isn't transferred yet. When a disconnected remote becomes
+connected, keys should be transferred in both directions to get back into
+sync.
+
+To that end, added Yet Another Thread; the TransferScanner thread
+will scan newly available remotes to find keys, and queue low priority
+transfers to get them fully in sync.
+
+(Later, this will probably also be used for network remotes that become
+available when moving between networks. I think network-manager sends
+dbus events it could use..)
+
+This new thread is missing a crucial piece: it doesn't yet have a way to
+find the keys that need to be transferred. Doing that efficiently (without
+scanning the whole git working copy) is Hard. I'm considering design
+possibilities.
diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 5f00cf606..cc23f786f 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -3,20 +3,59 @@ all the other git clones, at both the git level and the key/value level.
 
 ## immediate action items
 
-* At startup, and possibly periodically, look for files we have that
-  location tracking indicates remotes do not, and enqueue Uploads for
-  them. Also, enqueue Downloads for any files we're missing.
+* At startup, and possibly periodically, or when the network connection
+  changes, or some heuristic suggests that a remote was disconnected from
+  us for a while, queue remotes for processing by the TransferScanner,
+  to queue Transfers of files that it or we are missing.
 * After git sync, identify content that we don't have that is now available
-  on remotes, and transfer. But first, need to ensure that when a remote
+  on remotes, and transfer. (Needed when we have a uni-directional connection
+  to a remote, so it won't be uploading content to us.) 
+  But first, need to ensure that when a remote
   receives content, and updates its location log, it syncs that update
   out.
-* When MountWatcher detects a newly mounted drive, rescan git remotes
-  in order to get ones on the drive, and do a git sync and file transfers
-  to sync any repositories on it.
+
+## TransferScanner
+
+The TransferScanner thread needs to find keys that need to be Uploaded
+to a remote, or Downloaded from it.
+
+How to find the keys to transfer? I'd like to avoid potentially
+expensive traversals of the whole git working copy if I can.
+
+One way would be to do a git diff between the (unmerged) git-annex branches
+of the git repo, and its remote. Parse that for lines that add a key to
+either, and queue transfers. That should work fairly efficiently when the
+remote is a git repository. Indeed, git-annex already does such a diff
+when it's doing a union merge of data into the git-annex branch. It
+might even be possible to have the union merge and scan use the same
+git diff data.
+
+But that approach has several problems:
+
+1. The list of keys it would generate wouldn't have associated git
+   filenames, so the UI couldn't show the user what files were being
+   transferred.
+2. Worse, without filenames, any later features to exclude
+   files/directories from being transferred wouldn't work.
+3. Looking at a git diff of the git-annex branches would find keys
+   that were added to either side while the two repos were disconnected.
+   But if the two repos' keys were not fully in sync before they
+   disconnected (which is quite possible; transfers could be incomplete),
+   the diff would not show those older out of sync keys.
+
+The remote could also be a special remote. In this case, I have to either
+traverse the git working copy, or perhaps traverse the whole git-annex
+branch (which would have the same problems with filenames not being
+available).
+
+If a traversal is done, should check all remotes, not just
+one. Probably worth handling the case where a remote is connected
+while in the middle of such a scan, so part of the scan needs to be
+redone to check it.
 
 ## longer-term TODO
 
-* Test MountWatcher on Gnome (should work ok) and LXDE (dunno).
+* Test MountWatcher on LXDE.
 * git-annex needs a simple speed control knob, which can be plumbed
   through to, at least, rsync. A good job for an hour in an
   airport somewhere.
diff --git a/doc/design/assistant/syncing/comment_1_c70156174ff19b503978d623bd2df36f._comment b/doc/design/assistant/syncing/comment_1_c70156174ff19b503978d623bd2df36f._comment
new file mode 100644
index 000000000..019490e61
--- /dev/null
+++ b/doc/design/assistant/syncing/comment_1_c70156174ff19b503978d623bd2df36f._comment
@@ -0,0 +1,19 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawk4YX0PWICfWGRLuncCPufMPDctT7KAYJA"
+ nickname="betabrain"
+ subject="selective data syncing"
+ date="2012-07-24T15:27:08Z"
+ content="""
+How will the assistant know which files' data to distribute between the repos?
+
+I'm using git-annex and its numcopies attribute to maintain a redundant archive spread over different computers and usb drives. Not all drives should get a copy of everything, e.g. the usb drive I take to work should not automatically get a copy of family pictures.
+
+How about .gitattributes?
+
+* \* annex.auto-sync-data = false # don't automatically sync the data
+* archive/ annex.auto-push-repos = NAS # everything added to archive/ in any repo goes automatically to the NAS remote.
+* work/ annex.auto-synced-repos = LAPTOP WORKUSB # everything added to work/ in LAPTOP or WORKUSB gets synced to WORKUSB and LAPTOP
+* work/ annex.auto-push-repos = LAPTOP WORKUSB # stuff added to work/ anywhere gets synced to LAPTOP and WORKUSB
+* important/ annex.auto-sync-data = true # push data to all repos
+* webserver_logs/ annex.remote.WEBSERVER.auto-push-repos = S3 # only the assistant running in WEBSERVER pushes webserver_logs/ to S3 remote
+"""]]
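
Problem 3 in the TransferScanner notes of `syncing.mdwn` above (a diff only surfaces keys that changed while the repos were disconnected) can be illustrated with a small sketch. This is not git-annex code: the in-memory `LocationLog` mapping, the repository names `here` and `usb`, and the helpers `plan_transfers` and `keys_changed` are all hypothetical stand-ins for location tracking and for a diff of the git-annex branches.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for location tracking: each key maps to the set of
# repository names believed to hold its content. Not the real on-disk format.
LocationLog = Dict[str, Set[str]]


def plan_transfers(log: LocationLog, here: str, remote: str) -> Tuple[List[str], List[str]]:
    """Full scan: return (uploads, downloads) needed to sync here and remote."""
    uploads = sorted(k for k, repos in log.items()
                     if here in repos and remote not in repos)
    downloads = sorted(k for k, repos in log.items()
                       if remote in repos and here not in repos)
    return uploads, downloads


def keys_changed(before: LocationLog, after: LocationLog) -> Set[str]:
    """Keys whose location set differs between two snapshots.

    This models what a diff between the snapshots would surface.
    """
    return {k for k in set(before) | set(after)
            if before.get(k, set()) != after.get(k, set())}


# Snapshot at disconnect time: key A was never copied to the usb drive
# (an incomplete transfer), key D is already on both sides.
before: LocationLog = {"A": {"here"}, "D": {"here", "usb"}}

# While disconnected, B was added locally and C on the usb drive.
after: LocationLog = {"A": {"here"}, "B": {"here"},
                      "C": {"usb"}, "D": {"here", "usb"}}

diff_keys = keys_changed(before, after)             # {"B", "C"}: A is missed
uploads, downloads = plan_transfers(after, "here", "usb")
# uploads == ["A", "B"], downloads == ["C"]
```

In this toy model the diff-based view never sees key A, because its location did not change while the repos were apart, while the full scan does queue it for upload. The full scan pays for that by looking at every key, which is the cost the notes above are trying to avoid.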