summaryrefslogtreecommitdiff
path: root/doc/design/external_special_remote_protocol.mdwn
blob: 19640fb7fe4b2ae007a0c5ab721f3e4be65aed1c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
See [[todo/support_for_writing_external_special_remotes]] for motivation.

This is a design for a protocol to be used to communicate between git-annex
and a program implementing an external special remote.

The external special remote program has a name like
`git-annex-remote-$bar`. When `git annex initremote foo type=$bar` is run,
git-annex finds the appropriate program in PATH.

The program is started by git-annex when it needs to access the special
remote, and may be left running for a long period of time. This allows
it to perform expensive setup tasks, etc. Note that git-annex may choose to
start multiple instances of the program (eg, when multiple git-annex
commands are run concurrently in a repository).

## protocol overview

Communication is via stdin and stdout. Therefore, the external special
remote must avoid doing any prompting, or outputting anything like eg,
progress to stdout. (Such stuff can be sent to stderr instead.)

The protocol is line based. Messages are sent in either direction, from
git-annex to the special remote, and from the special remote to git-annex.

In order to avoid confusing interactions, one or the other has control
at any given time, and is responsible for sending requests, while the other
only sends replies to the requests.

## example session

The special remote is responsible for sending the first message, indicating
the version of the protocol it is using.

	VERSION 0

Once it knows the version, git-annex will send a message telling the
special remote to start up.

	PREPARE

The special remote can now ask git-annex for its configuration, as needed,
and check that it's valid. git-annex responds with the configuration values

	GETCONFIG directory
	VALUE /media/usbdrive/repo
	GETCONFIG automount
	VALUE true

Once the special remote is satisfied with its configuration and is
ready to go, it tells git-annex.

	PREPARE-SUCCESS

Now git-annex will tell the special remote what to do. Let's suppose
it wants to store a key. 

	TRANSFER STORE somekey tmpfile

The special remote can continue sending messages to git-annex during this
transfer. It will typically send progress messages, indicating how many
bytes have been sent:

	PROGRESS STORE somekey 10240
	PROGRESS STORE somekey 20480

Once the key has been stored, the special remote tells git-annex the result:

	TRANSFER-SUCCESS STORE somekey

Once git-annex is done with the special remote, it will close its stdin.
The special remote program can then exit.

## git-annex request messages

These are the request messages git-annex may send to the special remote
program. None of these messages require an immediate reply. The special
remote can send any messages it likes while handling the requests.

Once the special remote has finished performing the request, it should
send one of the corresponding replies listed in the next section.

* `PREPARE`  
  Tells the special remote it's time to prepare itself to be used.
  Only run once, at startup, always immediately after the special remote
  sends VERSION.
* `INITREMOTE`  
  Request that the remote initialized itself. This is where any one-time
  setup tasks can be done, for example creating an Amazon S3 bucket.  
  (PREPARE is still sent before this.)  
  Note: This may be run repeatedly, as a remote is initialized in
  different repositories, or as the configuration of a remote is changed.
  So any one-time setup tasks should be done idempotently.
* `GETCOST`  
  Requests the remote return a use cost. Higher costs are more expensive.
  (See Config/Cost.hs for some standard costs.)
* `TRANSFER STORE|RETRIEVE Key File`  
  Requests the transfer of a key. For Send, the File is the file to upload;
  for Receive the File is where to store the download. Note that the File
  should not influence the filename used on the remote. The filename used
  should be derived from the Key.  
  Multiple transfers might be requested by git-annex, but it's fine for the 
  program to serialize them and only do one at a time.
* `CHECKPRESENT Key`  
  Requests the remote check if a key is present in it.
* `REMOVE Key`  
  Requests the remote remove a key's contents.

## special remote replies

These should be sent only in response to the git-annex request messages.
They do not have to be sent immediately after the request; the special
remote can send its own requests (listed in the next section below)
while it's handling a request.

* `PREPARE-SUCCESS`  
  Sent as a response to PREPARE once the special remote is ready for use.
* `TRANSFER-SUCCESS STORE|RETRIEVE Key`  
  Indicates the transfer completed successfully.
* `TRANSFER-FAILURE STORE|RETRIEVE Key ErrorMsg`  
  Indicates the transfer failed.
* `CHECKPRESENT-SUCCESS Key`  
  Indicates that a key has been positively verified to be present in the
  remote.
* `CHECKPRESENT-FAILURE Key`  
  Indicates that a key has been positively verified to not be present in the
  remote.
* `CHECKPRESENT-UNKNOWN Key ErrorMsg`  
  Indicates that it is not currently possible to verify if the key is
  present in the remote. (Perhaps the remote cannot be contacted.)
* `REMOVE-SUCCESS Key`  
  Indicates the key has been removed from the remote. May be returned if
  the remote didn't have the key at the point removal was requested.
* `REMOVE-FAILURE Key`  
  Indicates that the key was unable to be removed from the remote.
* `COST Int`  
  Indicates the cost of the remote.
* `COST-UNKNOWN`  
  Indicates the remote has no opinion of its cost.
* `INITREMOTE-SUCCESS Setting=Value ...`  
  Indicates the INITREMOTE succeeded and the remote is ready to use.
  The settings and values can optionally be returned. They will be added
  to the existing configuration of the remote (and may change existing
  values in it).
* `INITREMOTE-FAILURE ErrorMsg`  
  Indicates that INITREMOTE failed.

## special remote messages

These messages may be sent by the special remote at any time that it's
in control.

* `VERSION Int`  
  Supported protocol version. Current version is 0. Must be sent first
  thing at startup, as until it sees this git-annex does not know how to
  talk with the special remote program!
* `PROGRESS STORE|RETRIEVE Key Int`  
  Indicates the current progress of the transfer. May be repeated any
  number of times during the transfer process. This is highly recommended
  for STORE. (It is optional but good for RETRIEVE.)  
  (git-annex does not send a reply to this message.)
* `DIRHASH Key`  
  Gets a two level hash associated with a Key. Something like "abc/def".
  This is always the same for any given Key, so can be used for eg,
  creating hash directory structures to store Keys in.  
  (git-annex replies with VALUE followed by the value.)
* `GETCONFIG Setting`
  Gets one of the special remote's configuration settings.  
  (git-annex replies with VALUE followed by the value.)
* `SETSTATE Key Value`  
  git-annex can store state in the git-annex branch on a
  per-special-remote, per-key basis. This sets that state.  
  (git-annex replies with VALUE followed by the value stored.)
* `GETSTATE Key`  
  Gets any state previously stored for the key from the git-annex branch.  
  Note that some special remotes may be accessed from multiple
  repositories, and the state is only eventually consistently synced
  between them. If two repositories set different values in the state
  for a key, the one that sets it last wins.  
  (git-annex replies with VALUE followed by the value.)

## general messages

These messages can be sent at any time by either git-annex or the special
remote.

* `ERROR ErrorMsg`  
  Generic error. Can be sent at any time if things get messed up.
  When possible, use a more specific reply from the list above.  
  It would be a good idea to send this if git-annex sends a command
  you do not support. The program should exit after sending this, as
  git-annex will not talk to it any further. If the program receives
  an ERROR, it can try to recover, or exit with its own ERROR.

## Simple shell example

[[!format sh """
#!/bin/sh
set -e

echo VERSION 0

while read line; do
	set -- $line
	case "$1" in
		INITREMOTE)
			# XXX do anything necessary to create resources
			# used by the remote. Try to be idempotent.
			# Use GETCONFIG to get any needed configuration
			# settings.
			echo INITREMOTE-SUCCESS
		;;
		GETCOST)
			echo COST-UNKNOWN
		;;
		PREPARE)
			# XXX Use GETCONFIG to get configuration settings,
			# and do anything needed to start using the
			# special remote here.
			echo PREPARE-SUCCESS
		;;
		TRANSFER)
			key="$3"
			file="$4"
			case "$2" in
				STORE)
					# XXX upload file here
					# XXX when possible, send PROGRESS
					echo TRANSFER-SUCCESS STORE "$key"
				;;
				RETRIEVE)
					# XXX download file here
					echo TRANSFER-SUCCESS RETRIEVE "$key"
				;;
				
			esac
		;;
		CHECKPRESENT)
			key="$2"
			echo CHECKPRESENT-UNKNOWN "$key" "not implemented"
		;;
		REMOVE)
			key="$2"
			# XXX remove key here
			echo REMOVE-SUCCESS "$key"
		;;
		*)
			echo ERROR "unknown command received: $line"
			exit 1
		;;
	esac	
done

# XXX anything that needs to be done at shutdown can be done here
"""]]

## TODO

* Communicate when the network connection may have changed, so long-running
  remotes can reconnect.
* uuid discovery during initremote.
* Support for splitting files into chunks.