Diffstat (limited to 'doc/design')
5 files changed, 113 insertions, 8 deletions
diff --git a/doc/design/assistant/blog/day_40__dbus/comment_2_6799f2baf6a6ce14b1fa76a8402840c0._comment b/doc/design/assistant/blog/day_40__dbus/comment_2_6799f2baf6a6ce14b1fa76a8402840c0._comment
new file mode 100644
index 000000000..832be854a
--- /dev/null
+++ b/doc/design/assistant/blog/day_40__dbus/comment_2_6799f2baf6a6ce14b1fa76a8402840c0._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="hamish"
+ ip="203.0.139.24"
+ subject="dbus vs polling "
+ date="2012-07-22T07:13:37Z"
+ content="""
+I, too, am running a dbus but like to hand mount my filesystems. However, I'd imagine that I am both a minority and that my minority could like the extra control, so perhaps even a \"re-read the mtab /now/\" command that can be manually run after something is manually mounted would suffice
+
+Is it not possible to use inotify on the mtab?
+"""]]
diff --git a/doc/design/assistant/blog/day_40__dbus/comment_3_fa1d7444bdafcb990cacf2ace7ee6ef1._comment b/doc/design/assistant/blog/day_40__dbus/comment_3_fa1d7444bdafcb990cacf2ace7ee6ef1._comment
new file mode 100644
index 000000000..a372670b8
--- /dev/null
+++ b/doc/design/assistant/blog/day_40__dbus/comment_3_fa1d7444bdafcb990cacf2ace7ee6ef1._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="4.154.4.169"
+ subject="comment 3"
+ date="2012-07-22T16:03:52Z"
+ content="""
+How did I not think about using my favorite hammer on this problem too? But, no, /proc/mounts cannot be watched with inotify it seems, and of course the BSDs don't seem to have a file at all.
+
+I think the dbus stuff is sorted out for manual users, see later blog entries.
+"""]]
diff --git a/doc/design/assistant/blog/day_42__the_answer.mdwn b/doc/design/assistant/blog/day_42__the_answer.mdwn
new file mode 100644
index 000000000..fd7c4ebb5
--- /dev/null
+++ b/doc/design/assistant/blog/day_42__the_answer.mdwn
@@ -0,0 +1,27 @@
+Made the MountWatcher update state for remotes located in a drive that
+gets mounted. This was tricky code. First I had to make remotes declare
+when they're located in a local directory. Then it has to rescan git
+configs of git remotes (because the git repo mounted at a mount point may
+change), and update all the state that a newly available remote can affect.
+
+And it works: I plug in a drive containing one of my git remotes, and the
+assistant automatically notices it and syncs the git repositories.
+
+---
+
+But, data isn't transferred yet. When a disconnected remote becomes
+connected, keys should be transferred in both directions to get back into
+sync.
+
+To that end, added Yet Another Thread; the TransferScanner thread
+will scan newly available remotes to find keys, and queue low priority
+transfers to get them fully in sync.
+
+(Later, this will probably also be used for network remotes that become
+available when moving between networks. I think network-manager sends
+dbus events it could use..)
+
+This new thread is missing a crucial piece, it doesn't yet have a way to
+find the keys that need to be transferred. Doing that efficiently (without
+scanning the whole git working copy) is Hard. I'm considering design
+possibilities..
diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 5f00cf606..cc23f786f 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -3,20 +3,59 @@ all the other git clones, at both the git level and the key/value level.

 ## immediate action items

-* At startup, and possibly periodically, look for files we have that
-  location tracking indicates remotes do not, and enqueue Uploads for
-  them. Also, enqueue Downloads for any files we're missing.
+* At startup, and possibly periodically, or when the network connection
+  changes, or some heuristic suggests that a remote was disconnected from
+  us for a while, queue remotes for processing by the TransferScanner,
+  to queue Transfers of files it or we're missing.
 * After git sync, identify content that we don't have that is now available
-  on remotes, and transfer. But first, need to ensure that when a remote
+  on remotes, and transfer. (Needed when we have a uni-directional connection
+  to a remote, so it won't be uploading content to us.)
+  But first, need to ensure that when a remote
   receives content, and updates its location log, it syncs that update out.
-* When MountWatcher detects a newly mounted drive, rescan git remotes
-  in order to get ones on the drive, and do a git sync and file transfers
-  to sync any repositories on it.
+
+## TransferScanner
+
+The TransferScanner thread needs to find keys that need to be Uploaded
+to a remote, or Downloaded from it.
+
+How to find the keys to transfer? I'd like to avoid potentially
+expensive traversals of the whole git working copy if I can.
+
+One way would be to do a git diff between the (unmerged) git-annex branches
+of the git repo, and its remote. Parse that for lines that add a key to
+either, and queue transfers. That should work fairly efficiently when the
+remote is a git repository. Indeed, git-annex already does such a diff
+when it's doing a union merge of data into the git-annex branch. It
+might even be possible to have the union merge and scan use the same
+git diff data.
+
+But that approach has several problems:
+
+1. The list of keys it would generate wouldn't have associated git
+   filenames, so the UI couldn't show the user what files were being
+   transferred.
+2. Worse, without filenames, any later features to exclude
+   files/directories from being transferred wouldn't work.
+3. Looking at a git diff of the git-annex branches would find keys
+   that were added to either side while the two repos were disconnected.
+   But if the two repos' keys were not fully in sync before they
+   disconnected (which is quite possible; transfers could be incomplete),
+   the diff would not show those older out of sync keys.
+
+The remote could also be a special remote. In this case, I have to either
+traverse the git working copy, or perhaps traverse the whole git-annex
+branch (which would have the same problems with filenames not being
+available).
+
+If a traversal is done, should check all remotes, not just
+one. Probably worth handling the case where a remote is connected
+while in the middle of such a scan, so part of the scan needs to be
+redone to check it.


 ## longer-term TODO

-* Test MountWatcher on Gnome (should work ok) and LXDE (dunno).
+* Test MountWatcher on LXDE.
 * git-annex needs a simple speed control knob, which can be plumbed
   through to, at least, rsync. A good job for an hour in an airport
   somewhere.
diff --git a/doc/design/assistant/syncing/comment_1_c70156174ff19b503978d623bd2df36f._comment b/doc/design/assistant/syncing/comment_1_c70156174ff19b503978d623bd2df36f._comment
new file mode 100644
index 000000000..019490e61
--- /dev/null
+++ b/doc/design/assistant/syncing/comment_1_c70156174ff19b503978d623bd2df36f._comment
@@ -0,0 +1,19 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawk4YX0PWICfWGRLuncCPufMPDctT7KAYJA"
+ nickname="betabrain"
+ subject="selective data syncing"
+ date="2012-07-24T15:27:08Z"
+ content="""
+How will the assistant know which files' data to distribute between the repos?
+
+I'm using git-annex and its numcopies attribute to maintain a redundant archive spread over different computers and usb drives. Not all drives should get a copy of everything, e.g. the usb drive I take to work should not automatically get a copy of family pictures.
+
+How about .gitattributes?
+
+* \* annex.auto-sync-data = false # don't automatically sync the data
+* archive/ annex.auto-push-repos = NAS # everything added to archive/ in any repo goes automatically to the NAS remote.
+* work/ annex.auto-synced-repos = LAPTOP WORKUSB # everything added to work/ in LAPTOP or WORKUSB gets synced to WORKUSB and LAPTOP
+* work/ annex.auto-push-repos = LAPTOP WORKUSB # stuff added to work/ anywhere gets synced to LAPTOP and WORKUSB
+* important/ annex.auto-sync-data = true # push data to all repos
+* webserver_logs/ annex.remote.WEBSERVER.auto-push-repos = S3 # only the assistant running in WEBSERVER pushes webserver_logs/ to S3 remote
+"""]]
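The TransferScanner tradeoff described in syncing.mdwn can be illustrated with a small model. This is a minimal Python sketch, not git-annex code (git-annex itself is written in Haskell), and everything in it is an assumption made for illustration: the location log is modeled as a plain dict from key to the set of repository names believed to hold that key (real git-annex records this per key in the git-annex branch and identifies repositories by UUID), and `Transfer`, `diff_scan`, and `scan_working_copy` are hypothetical names. `diff_scan` stands in for diffing the git-annex branches and yields bare keys; `scan_working_copy` walks a filename-to-key map once and checks every remote, so each queued transfer keeps its filename.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical model of the location log: key -> names of repositories
# believed to hold its content. Real git-annex keeps this in the git-annex
# branch, per key, and identifies repositories by UUID.
LocationLog = Dict[str, Set[str]]


@dataclass(frozen=True)
class Transfer:
    direction: str  # "Upload" or "Download"
    remote: str
    key: str
    filename: str  # kept so a UI (or later exclusion rules) can use it


def diff_scan(before: LocationLog, after: LocationLog) -> Set[str]:
    """Stand-in for diffing git-annex branches: keys whose location info changed.

    Returns bare keys with no filenames, and says nothing about keys that
    were already out of sync in `before`.
    """
    keys = set(before) | set(after)
    return {k for k in keys if before.get(k, set()) != after.get(k, set())}


def scan_working_copy(
    here: str,
    remotes: List[str],
    worktree: Dict[str, str],  # hypothetical: filename -> key
    log: LocationLog,
) -> List[Transfer]:
    """Walk the work tree once, checking every remote for every key."""
    transfers: List[Transfer] = []
    for filename, key in sorted(worktree.items()):
        holders = log.get(key, set())
        have_here = here in holders
        download_queued = False
        for remote in remotes:
            if have_here and remote not in holders:
                transfers.append(Transfer("Upload", remote, key, filename))
            elif not have_here and remote in holders and not download_queued:
                # One source is enough to get the content here.
                transfers.append(Transfer("Download", remote, key, filename))
                download_queued = True
    return transfers


if __name__ == "__main__":
    worktree = {"photos/a.jpg": "KEY-A", "docs/b.pdf": "KEY-B", "music/c.ogg": "KEY-C"}
    # When the drive was unplugged, KEY-C's upload to it had not finished.
    before = {"KEY-C": {"here"}}
    # While disconnected: KEY-A was added here, KEY-B on the drive.
    after = {"KEY-A": {"here"}, "KEY-B": {"usbdrive"}, "KEY-C": {"here"}}
    print(sorted(diff_scan(before, after)))  # ['KEY-A', 'KEY-B']
    for t in scan_working_copy("here", ["usbdrive"], worktree, after):
        print(t.direction, t.remote, t.key, t.filename)
```

With this example data, `diff_scan` reports only KEY-A and KEY-B: KEY-C, whose upload never completed before the drive was unplugged, is missed, which is problem 3 in the list. The traversal queues a Download of KEY-B and Uploads of KEY-C and KEY-A, each with its filename, and it checks every remote in a single pass. A real scanner would also have to avoid queuing the same key twice when it appears under several filenames.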