blog for the day and design update

author: Joey Hess <joey@kitenet.net> 2012-06-21 20:02:00 -0400
committer: Joey Hess <joey@kitenet.net> 2012-06-21 20:02:00 -0400
commit: f27da7a1cc095dcaf9ce0cc2170fe98d3b050336 (patch)
tree: 570804543c4b8f54cb76cec2313ebe8989b14a68
parent: 8a0d6d83f4e241f0cc18269e62e7289fec060e4e (diff)
2 files changed, 55 insertions, 3 deletions
diff --git a/doc/design/assistant/blog/day_14__thinking_about_syncing.mdwn b/doc/design/assistant/blog/day_14__thinking_about_syncing.mdwn
new file mode 100644
index 000000000..c4a700d13
--- /dev/null
+++ b/doc/design/assistant/blog/day_14__thinking_about_syncing.mdwn
@@ -0,0 +1,44 @@
+Pondering [[syncing]] today. I will be doing syncing of the git repository
+first, and working on syncing of file data later.
+
+The former seems straightforward enough, since we just want to push all
+changes everywhere. Indeed, git-annex already has a [[sync]] command
+that uses a smart technique to allow syncing between clones without a
+central bare repository. (Props to Joachim Breitner for that.)
+
+But it's not all easy. Syncing should happen as fast as possible, so
+changes show up without delay. Eventually it'll need to support syncing
+between nodes that cannot directly contact one another. Syncing also needs
+to deal with nodes coming and going. A USB drive being plugged in should be
+synced immediately, but the network can also come and go, so it should
+periodically retry nodes it failed to sync with. To
+start with, I'll be focusing on fast syncing between directly connected
+nodes, but I have to keep this wider problem space in mind.
+
+One problem with `git annex sync` is that it has to be run in both clones
+in order for changes to fully propagate. This is because git doesn't allow
+pushing changes into the checked-out branch of a non-bare repository, so
+instead it drops off a new branch in `.git/refs/remotes/$foo/synced/master`.
+Then, when it's run locally, it merges that new branch into `master`.
+
+So, how to trigger a clone to run `git annex sync` when syncing to it?
+Well, I just realized I have spent two weeks developing something that can
+be repurposed to do that! [[Inotify]] can watch for changes to
+`.git/refs/remotes`, and the instant a change is made, the local sync
+process can be started. This avoids needing to make another ssh connection
+to trigger the sync, so it is faster, and it allows the data to be transferred
+over protocols other than ssh, which may come in handy later.
+
+So, in summary, here's what will happen when a new file is created:
+
+1. inotify event causes the file to be added to the annex, and
+   immediately committed.
+2. new branch is pushed to remotes (probably in parallel)
+3. remotes notice new sync branch and merge it
+4. (data sync, TBD later)
+5. file is fully synced and available
+
+Steps 1, 2, and 3 should all be achievable in under a second.
+The time `git push` takes to make an ssh connection will be the main limit
+on speed. (Perhaps I should also reuse git-annex's existing ssh
+connection caching code?)
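The push / notice / merge flow summarized at the end of the blog post can be modelled in a few lines. This is a toy sketch, not git-annex code: a branch is reduced to a set of commit ids, merging is set union (so it can never conflict, unlike a real git merge), and the inotify watch is modelled as a callback that fires when a ref under `refs/remotes/` is written. The names `Clone`, `push_synced`, and `merge_on_change` are hypothetical.

```python
"""Toy in-memory model of the push / notice / merge flow; not git-annex code.

Assumptions: a branch is a frozenset of commit ids, "merging" is set union,
and the inotify watch on .git/refs/remotes/ is modelled as a callback.
"""
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List

History = FrozenSet[str]


@dataclass
class Clone:
    name: str
    refs: Dict[str, History] = field(
        default_factory=lambda: {"refs/heads/master": frozenset()})
    # Callbacks standing in for the inotify watch on .git/refs/remotes/.
    watchers: List[Callable[["Clone", str], None]] = field(default_factory=list)

    @property
    def master(self) -> History:
        return self.refs["refs/heads/master"]

    def commit(self, commit_id: str) -> None:
        # Step 1: a new file is added and immediately committed.
        self.refs["refs/heads/master"] = self.master | {commit_id}

    def write_ref(self, ref: str, history: History) -> None:
        self.refs[ref] = history
        if ref.startswith("refs/remotes/"):
            for watcher in self.watchers:
                watcher(self, ref)


def push_synced(src: Clone, dst: Clone) -> None:
    # Step 2: instead of pushing into dst's checked-out master, drop off a
    # branch named after the sender, like refs/remotes/$foo/synced/master.
    dst.write_ref(f"refs/remotes/{src.name}/synced/master", src.master)


def merge_on_change(clone: Clone, ref: str) -> None:
    # Step 3: the receiving clone notices the new sync branch and merges it.
    if ref.endswith("/synced/master"):
        clone.refs["refs/heads/master"] = clone.master | clone.refs[ref]


laptop, desktop = Clone("laptop"), Clone("desktop")
desktop.watchers.append(merge_on_change)
laptop.commit("add photo.jpg")
push_synced(laptop, desktop)  # desktop merges without running sync itself
print(sorted(desktop.master))  # ['add photo.jpg']
```

The model makes the post's point concrete: the sender only writes a ref on the receiver, and the receiver's own watcher performs the merge, so no second connection is needed to trigger it.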
diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 0813b8b70..56c9692e3 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -3,13 +3,21 @@ all the other git clones, at both the git level and the key/value level.
 
 ## git syncing
 
-1. At regular intervals, just run `git annex sync`, which already handles
-   bidirectional syncing.
+1. Can use `git annex sync`, which already handles bidirectional syncing.
+   When a change is committed, launch the part of `git annex sync` that pushes
+   out changes.
+1. Watch `.git/refs/remotes/` for changes (which would be pushed in from
+   another node via `git annex sync`), and run the part of `git annex sync`
+   that merges in received changes, and follow it by the part that pushes out
+   changes (sending them to any other remotes).
+   [The watching can be done with the existing inotify code! This avoids needing
+   any special mechanism to notify a remote that it's been synced to.]
 2. Use a git merge driver that adds both conflicting files,
    so conflicts never break a sync.
 3. Investigate the XMPP approach like dvcs-autosync does, or other ways of
    signaling a change out of band.
-4. Add a hook, so when there's a change to sync, a program can be run.
+4. Add a hook, so when there's a change to sync, a program can be run
+   and do its own signaling.
 
 ## data syncing
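Item 1 of the git syncing list depends on noticing writes under `.git/refs/remotes/`, and the design reuses the existing inotify code for that. Python's standard library has no inotify binding, so the sketch below swaps in polling: it snapshots the loose ref files and reports any `synced/master` ref that is new or has moved. The functions `snapshot_refs`, `changed_synced_refs`, and `watch` are hypothetical, and refs that have been packed into `.git/packed-refs` are not seen.

```python
"""Polling stand-in for the inotify watch on .git/refs/remotes/.

Assumption: the standard library has no inotify binding, so this sketch
takes periodic snapshots and diffs them. Only loose ref files are seen;
refs packed into .git/packed-refs are ignored.
"""
import os
import time
from typing import Callable, Dict, List


def snapshot_refs(git_dir: str) -> Dict[str, str]:
    """Map each loose ref under refs/remotes/ to the contents of its file."""
    root = os.path.join(git_dir, "refs", "remotes")
    refs: Dict[str, str] = {}
    # os.walk yields nothing (and raises nothing) if root does not exist.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, git_dir).replace(os.sep, "/")
            with open(path, encoding="ascii") as f:
                refs[rel] = f.read().strip()
    return refs


def changed_synced_refs(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return synced/master refs that are new or point at a different commit."""
    return sorted(
        ref for ref, sha in new.items()
        if ref.endswith("/synced/master") and old.get(ref) != sha
    )


def watch(git_dir: str, on_change: Callable[[List[str]], None],
          interval: float = 1.0, rounds: int = 10) -> None:
    """Poll `rounds` times, calling on_change with any updated sync branches."""
    seen = snapshot_refs(git_dir)
    for _ in range(rounds):
        time.sleep(interval)
        current = snapshot_refs(git_dir)
        changed = changed_synced_refs(seen, current)
        if changed:
            # This is where the merge, then the onward push, would be run.
            on_change(changed)
        seen = current
```

Polling trades latency (up to `interval` seconds) for portability; an inotify-based watcher, as the design proposes, reacts as soon as the ref file is written.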
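Item 2 of the git syncing list proposes resolving conflicts by adding both conflicting files, so a conflict never breaks a sync. A minimal sketch of that policy, assuming each side is identified by the annex key its symlink points to: identical keys need no resolution, otherwise both are kept under distinct names. The `.variant-<tag>` naming and the `keep_both` / `variant_name` helpers are hypothetical, and the sketch does not show how such a resolver would be registered with git.

```python
"""Toy "keep both sides" conflict policy; not git-annex code.

Assumptions: each side of a conflicted file is identified by the annex key
its symlink points to, and the ".variant-<tag>" naming is hypothetical.
"""
import hashlib
from typing import Dict


def variant_name(path: str, key: str) -> str:
    # Derive a short tag from the key itself (not from which side is "ours"),
    # so every clone picks the same names for the same conflict.
    tag = hashlib.sha256(key.encode("utf-8")).hexdigest()[:8]
    return f"{path}.variant-{tag}"


def keep_both(path: str, ours: str, theirs: str) -> Dict[str, str]:
    """Return {file name: annex key} for the merged tree; never fails."""
    if ours == theirs:
        return {path: ours}  # Both sides agree: nothing to resolve.
    a, b = variant_name(path, ours), variant_name(path, theirs)
    if a == b:
        # Short tags collided: fall back to the full keys so neither is lost.
        a, b = f"{path}.variant-{ours}", f"{path}.variant-{theirs}"
    return {a: ours, b: theirs}


print(keep_both("photo.jpg", "KEY-A", "KEY-A"))  # {'photo.jpg': 'KEY-A'}
```

Because the tag depends only on the key, two clones resolving the same conflict independently arrive at the same file names, so the next sync does not create a fresh conflict.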