doc/design/git-remote-daemon.mdwn


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195

# goals

* be configured like a regular git remote, with an unusual url
  or other configuration
* receive notifications when a remote has received new commits,
  and take some action
* optionally, do receive-pack and send-pack to a remote that
  is only accessible over an arbitrary network transport
  (like assistant does with XMPP)
* optionally, send/receive git-annex objects to remote
  over an arbitrary network transport

# difficulties

* authentication & configuration
* multiple nodes may be accessible over a single network transport,
  with it desirable to sync with any/all of them. For example, with
  XMPP, there can be multiple friends synced with. This means that
  one git remote can map to multiple remote nodes. Specific to git-annex,
  this means that a set of UUIDs known to be associated with the remote
  needs to be maintained, while currently each remote can only have one
  annex-uuid in .git/config.

# payoffs

* support [[assistant/telehash]]!
* Allow running against a normal ssh git remote. This would run
  git-annex-shell on the remote, watching for changes, and so be able to
  notify when a commit was pushed to the remote repo. This would let the
  assistant immediately notice and pull. So the assistant would be fully
  usable with a single ssh remote and no other configuration!
  **do this first**
* clean up existing XMPP support, make it not a special case, and not
  tightly tied to the assistant
* git-remote-daemon could be used independantly of git-annex,
  in any git repository.

# design

Let git-remote-daemon be the name. Or for git-annex,
`git annex remotedaemon`.

It runs in one of two ways:

1. Forked to background, using a named pipe for the control protocol.
2. With --foreground, the control protocol goes over stdio.

Either way, behavior is the same:

* Get a list of remotes to act on by looking at .git/config
* Automatically notices when a remote has changes to branches
  matching remote.$name.fetch, and pulls them down to the appropriate
  location.
* When the control protocol informs it about a new ref that's available,
  it offers the ref to any interested remotes.

# control protocol

This is an asynchronous protocol. Ie, either side can send any message
at any time, and the other side does not send a reply.

It is line based and intended to be low volume and not used for large data.

TODO: Expand with commands for sending/receiving git-annex objects, and
progress during transfer.

TODO: Will probably need to add something for whatever pairing is done by
the webapp.

## emitted messages

* `CONNECTED uri`

  Sent when a connection has been made with a remote.

* `DISCONNECTED uri`

  Sent when connection with a remote has been lost.

* `SYNCING uri`

  Indicates that a pull or a push with a remote is in progress.
  Always followed by DONESYNCING.

* `DONESYNCING uri 1|0`

  Indicates that syncing with a remote is done, and either succeeded
  (1) or failed (0).

* `WARNING uri string`

  A message to display to the user about a remote.

## consumed messages

* `PAUSE`

  The user has requested a pause.  
  git-remote-daemon should close connections and idle.

* `LOSTNET`

  The network connection has been lost.  
  git-remote-daemon should close connections and idle.

* `RESUME`

  Undoes PAUSE or DISCONNECTED.  
  Start back up network connections.

* `CHANGED ref ...`

  Indicates that a ref is new or has changed. These can be offered to peers,
  and peers that are interested in them can pull the content.

* `RELOAD`

  Indicates that configs have changed. Daemon should reload .git/config
  and/or restart.

  Possible config changes include adding a new remote, removing a remote,
  or setting `remote.<name>.annex-sync` to configure whether to sync with a
  particular remote.

* `STOP`

  Shut down git-remote-daemon

  (When using stdio, it also should shutdown when it reaches EOF on 
  stdin.)

# encryption & authentication

For simplicity, the network transports have to do their own end-to-end
encryption. Encryption is not part of this design.

(XMPP does not do end-to-end encryption, but might be supported
transitionally.)

Ditto for authentication that we're talking to who we indend to talk to.
Any public key data etc used for authenticion is part of the remote's
configuration (or hidden away in a secure chmodded file, if neccesary).
This design does not concern itself with authenticating the remote node,
it just takes the auth token and uses it.

For example, in telehash, each node has its own keypair, which is used
or authentication and encryption, and is all that's needed to route
messages to that node.

# network level protocol

How do peers communicate with one another over the network?

This seems to need to be network-layer dependant. Telehash will need
one design, and git-annex-shell on a central ssh server has a very different
(and much simpler) design.

## ssh

`git-annex-shell notifychanges` is run, and speaks a simple protocol
over stdio to inform when refs on the remote have changed.

No pushing is done for CHANGED, since git handles ssh natively.

TODO:

* For wicd, the NetWatcher does not detect network loss, only network gain.
  So PAUSE is only sent when a new network is detected, followed
  immediately by RESUME. This was already fixed for networkmanager.
* Remote system might not be available. Find a smart way to detect it,
  ideally w/o generating network traffic. One way might be to check
  if the ssh connection caching control socket exists, for example.
* Now that ssh connection caching is enabled for git push/pull in sync,
  there's the possibility that a stale ssh connection may linger when 
  changing network connections, and so attempts to use it will stall.  
  (This was already a potential issue with transfers, which already
  used the caching.)  

  One option is ssh's ServerAliveCountMax, which will make a dead
  ssh connection disconnect after approx 45 seconds, per ssh manual.
  It would need to be enabled by setting ServerAliveInterval=15.
  And this would add network traffic..

  Another option is to disable all cached connections when the network
  connection changes. This would handle *most* cases. The case
  not handled is eg, my dialup ppp box getting a new public IP address,
  which my laptop won't notice. **done**

## telehash 

TODO

## xmpp

Reuse [[assistant/xmpp]]