summaryrefslogtreecommitdiff
path: root/doc/design/external_special_remote_protocol.mdwn
blob: 8cb7ed3b65117df22e0b5b82dbdd6fe605f26d28 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
Communication between git-annex and a program implementing an external
special remote uses this protocol.

[[!toc]]

## starting the program

The external special remote program has a name like
`git-annex-remote-$bar`. When 
`git annex initremote foo type=external externaltype=$bar` is run,
git-annex finds the appropriate program in PATH.

The program is started by git-annex when it needs to access the special
remote, and may be left running for a long period of time. This allows
it to perform expensive setup tasks, etc. Note that git-annex may choose to
start multiple instances of the program (eg, when multiple git-annex
commands are run concurrently in a repository).

## protocol overview

Communication is via stdin and stdout. Therefore, the external special
remote must avoid doing any prompting, or outputting anything like eg,
progress to stdout. (Such stuff can be sent to stderr instead.)

The protocol is line based. Messages are sent in either direction, from
git-annex to the special remote, and from the special remote to git-annex.

In order to avoid confusing interactions, one or the other has control
at any given time, and is responsible for sending requests, while the other
only sends replies to the requests.

Each protocol line starts with a command, which is followed by the
command's parameters (a fixed number per command), each separated by a
single space. The last parameter may contain spaces. Parameters may be
empty, but the separating spaces are still required in that case.

## example session

The special remote is responsible for sending the first message, indicating
the version of the protocol it is using.

	VERSION 1

Once it knows the version, git-annex will generally 
send a message telling the special remote to start up.
(Or it might send a INITREMOTE, so don't hardcode this order.)

	PREPARE

The special remote can now ask git-annex for its configuration, as needed,
and check that it's valid. git-annex responds with the configuration values

	GETCONFIG directory
	VALUE /media/usbdrive/repo
	GETCONFIG automount
	VALUE true

Once the special remote is satisfied with its configuration and is
ready to go, it tells git-annex that it's done with the PREPARE step:

	PREPARE-SUCCESS

Now git-annex will make a request. Let's suppose it wants to store a key. 

	TRANSFER STORE somekey tmpfile

The special remote can then start reading the tmpfile and storing it.
While it's doing that, the special remote can send messages back to 
git-annex to indicate what it's doing, or ask for other information.
It will typically send progress messages, indicating how many
bytes have been sent:

	PROGRESS 10240
	PROGRESS 20480

Once the key has been stored, the special remote tells git-annex the result:

	TRANSFER-SUCCESS STORE somekey

Now git-annex will send its next request.

Once git-annex is done with the special remote, it will close its stdin.
The special remote program can then exit.

## git-annex request messages

These are messages git-annex sends to the special remote program.
None of these messages require an immediate reply. The special
remote can send any messages it likes while handling the requests.

Once the special remote has finished performing the request, it should
send one of the corresponding replies listed in the next section.

The following requests *must* all be supported by the special remote.

* `INITREMOTE`  
  Requests the remote to initialize itself. This is where any one-time
  setup tasks can be done, for example creating an Amazon S3 bucket.  
  Note: This may be run repeatedly over time, as a remote is initialized in
  different repositories, or as the configuration of a remote is changed.
  (Both `git annex initremote` and `git-annex enableremote` run this.)
  So any one-time setup tasks should be done idempotently.
* `PREPARE`  
  Tells the remote that it's time to prepare itself to be used.  
  Only INITREMOTE can come before this.
* `TRANSFER STORE|RETRIEVE Key File`  
  Requests the transfer of a key. For STORE, the File is the file to upload;
  for RETRIEVE the File is where to store the download.  
  Note that the File should not influence the filename used on the remote.
  The filename will not contain any whitespace.  
  Note that it's important that, while a Key is being stored, CHECKPRESENT
  not indicate it's present until all the data has been transferred.  
* `CHECKPRESENT Key`  
  Requests the remote to check if a key is present in it.
* `REMOVE Key`  
  Requests the remote to remove a key's contents.

The following requests can optionally be supported. If not handled,
replying with `UNSUPPORTED-REQUEST` is acceptable.

* `GETCOST`  
  Requests the remote to return a use cost. Higher costs are more expensive.
  (See Config/Cost.hs for some standard costs.)
* `GETAVAILABILITY`
  Requests the remote to send back an `AVAILABILITY` reply.
  If the remote replies with `UNSUPPORTED-REQUEST`, its availability
  is assumed to be global. So, only remotes that are only reachable
  locally need to worry about implementing this.
* `CLAIMURL Url`  
  Asks the remote if it wishes to claim responsibility for downloading
  an url. If so, the remote should send back an `CLAIMURL-SUCCESS` reply.
  If not, it can send `CLAIMURL-FAILURE`.
* `CHECKURL Url`  
  Asks the remote to check if the url's content can currently be downloaded
  (without downloading it). The remote replies with one of `CHECKURL-FAILURE`,
  `CHECKURL-CONTENTS`, or `CHECKURL-MULTI`.
* `WHEREIS Key`
  Asks the remote to provide additional information about ways to access
  the content of a key stored in it, such as eg, public urls.
  This will be displayed to the user by eg, `git annex whereis`. The remote
  replies with `WHEREIS-SUCCESS` or `WHEREIS-FAILURE`.  
  Note that users expect `git annex whereis` to run fast, without eg,
  network access.  
  This is not needed when `SETURIPRESENT` is used, since such uris are
  automatically displayed by `git annex whereis`.  

More optional requests may be added, without changing the protocol version,
so if an unknown request is seen, reply with `UNSUPPORTED-REQUEST`.

## special remote replies

These should be sent only in response to the git-annex request messages.
They do not have to be sent immediately after the request; the special
remote can send its own messages (listed in the next section below)
while it's handling a request.

* `PREPARE-SUCCESS`  
  Sent as a response to PREPARE once the special remote is ready for use.
* `PREPARE-FAILURE ErrorMsg`
  Sent as a response to PREPARE if the special remote cannot be used.
* `TRANSFER-SUCCESS STORE|RETRIEVE Key`  
  Indicates the transfer completed successfully.
* `TRANSFER-FAILURE STORE|RETRIEVE Key ErrorMsg`  
  Indicates the transfer failed.
* `CHECKPRESENT-SUCCESS Key`  
  Indicates that a key has been positively verified to be present in the
  remote.
* `CHECKPRESENT-FAILURE Key`  
  Indicates that a key has been positively verified to not be present in the
  remote.
* `CHECKPRESENT-UNKNOWN Key ErrorMsg`  
  Indicates that it is not currently possible to verify if the key is
  present in the remote. (Perhaps the remote cannot be contacted.)
* `REMOVE-SUCCESS Key`  
  Indicates the key has been removed from the remote. May be returned if
  the remote didn't have the key at the point removal was requested.
* `REMOVE-FAILURE Key ErrorMsg`  
  Indicates that the key was unable to be removed from the remote.
* `COST Int`  
  Indicates the cost of the remote.
* `AVAILABILITY GLOBAL|LOCAL`
  Indicates if the remote is globally or only locally available.
  (Ie stored in the cloud vs on a local disk.)
* `INITREMOTE-SUCCESS`  
  Indicates the INITREMOTE succeeded and the remote is ready to use.
* `INITREMOTE-FAILURE ErrorMsg`  
  Indicates that INITREMOTE failed.
* `CLAIMURL-SUCCESS`  
  Indicates that the CLAIMURL url will be handled by this remote.
* `CLAIMURL-FAILURE`  
  Indicates that the CLAIMURL url wil not be handled by this remote.
* `CHECKURL-CONTENTS Size|UNKNOWN Filename`  
  Indicates that the requested url has been verified to exist.  
  The Size is the size in bytes, or use "UNKNOWN" if the size could not be
  determined.  
  The Filename can be empty (in which case a default is used),
  or can specify a filename that is suggested to be used for this url.
* `CHECKURL-MULTI Url1 Size1|UNKNOWN Filename1 Url2 Size2|UNKNOWN Filename2 ...`  
  Indicates that the requested url has been verified to exist,
  and contains multiple files, which can each be accessed using
  their own url.  Each triplet of url, size, and filename should be listed,
  one after the other.
  Note that since a list is returned, neither the Url nor the Filename
  can contain spaces.
* `CHECKURL-FAILURE`  
  Indicates that the requested url could not be accessed.
* `WHEREIS-SUCCESS String`  
  Indicates a location of a key. Typically an url, the string can
  be anything that it makes sense to display to the user about content
  stored in the special remote.
* `WHEREIS-FAILURE`  
  Indicates that no location is known for a key.
* `UNSUPPORTED-REQUEST`  
  Indicates that the special remote does not know how to handle a request.

## special remote messages

These messages may be sent by the special remote at any time that it's
handling a request.

* `VERSION Int`  
  Supported protocol version. Current version is 1. Must be sent first
  thing at startup, as until it sees this git-annex does not know how to
  talk with the special remote program!  
  (git-annex does not send a reply to this message, but may give up if it
  doesn't support the necessary protocol version.)
* `PROGRESS Int`  
  Indicates the current progress of the transfer (in bytes). May be repeated
  any number of times during the transfer process, but it's wasteful to
  update the progress until at least another 1% of the file has been sent.
  This is highly recommended for STORE. (It is optional but good for RETRIEVE.)  
  (git-annex does not send a reply to this message.)
* `DIRHASH Key`  
  Gets a two level hash associated with a Key. Something like "aB/Cd".
  This is always the same for any given Key, so can be used for eg,
  creating hash directory structures to store Keys in. This is the same
  directory hash that git-annex uses inside `.git/annex/objects/`  
  (git-annex replies with VALUE followed by the value.)
* `DIRHASH-LOWER Key`  
  Gets a two level hash associated with a Key, using only lower-case.
  Something like "abc/def".
  This is always the same for any given Key, so can be used for eg,
  creating hash directory structures to store Keys in. This is the same
  directory hash that is used by eg, the directory special remote.  
  (git-annex replies with VALUE followed by the value.)  
  (First supported by git-annex version 6.20160511.)
* `SETCONFIG Setting Value`  
  Sets one of the special remote's configuration settings.  
  Normally this is sent during INITREMOTE, which allows these settings
  to be stored in the git-annex branch, so will be available if the same
  special remote is used elsewhere. (If sent after INITREMOTE, the changed
  configuration will only be available while the remote is running.)  
  (git-annex does not send a reply to this message.)
* `GETCONFIG Setting`  
  Gets one of the special remote's configuration settings, which can have
  been passed by the user when running `git annex initremote`, or
  can have been set by a previous SETCONFIG. Can be run at any time.  
  (git-annex replies with VALUE followed by the value. If the setting is
  not set, the value will be empty.)
* `SETCREDS Setting User Password`  
  When some form of user and password is needed to access a special remote,
  this can be used to securely store them for later use.
  (Like SETCONFIG, this is normally sent only during INITREMOTE.)  
  The Setting indicates which value in a remote's configuration can be
  used to store the creds.  
  Note that creds are normally only stored in the remote's configuration
  when it's surely safe to do so; when gpg encryption is used, in which
  case the creds will be encrypted using it. If creds are not stored in
  the configuration, they'll only be stored in a local file.  
  (embedcreds can be set to yes by the user or by SETCONFIG to force
   the creds to be stored in the remote's configuration).  
  (git-annex does not send a reply to this message.)
* `GETCREDS Setting`  
  Gets any creds that were previously stored in the remote's configuration
  or a file.
  (git-annex replies with "CREDS User Password". If no creds are found,
  User and Password are both empty.)
* `GETUUID`  
  Queries for the UUID of the special remote being used.  
  (git-annex replies with VALUE followed by the UUID.)
* `GETGITDIR`  
  Queries for the path to the git directory of the repository that
  is using the external special remote.
  (git-annex replies with VALUE followed by the path.)
* `SETWANTED PreferredContentExpression`  
  Can be used to set the preferred content of a repository. Normally
  this is not configured by a special remote, but it may make sense
  in some situations to hint at the kind of content that should be stored
  in the special remote. Note that if a unparsable expression is set,
  git-annex will ignore it.  
  (git-annex does not send a reply to this message.)
* `GETWANTED`  
  Gets the current preferred content setting of the repository.
  (git-annex replies with VALUE followed by the preferred content
  expression.)
* `SETSTATE Key Value`  
  Can be used to store some form of state for a Key. The state stored
  can be anything this remote needs to store, in any format.
  It is stored in the git-annex branch. Note that this means that if
  multiple repositories are using the same special remote, and store
  different state, whichever one stored the state last will win. Also,
  it's best to avoid storing much state, since this will bloat the
  git-annex branch. Most remotes will not need to store any state.  
  (git-annex does not send a reply to this message.)
* `GETSTATE Key`  
  Gets any state that has been stored for the key.  
  (git-annex replies with VALUE followed by the state.)
* `SETURLPRESENT Key Url`  
  Records an URL where the Key can be downloaded from.  
  Note that this does not make git-annex think that the url is present on
  the web special remote.  
  Keep in mind that this stores the url in the git-annex branch. This can
  result in bloat to the branch if the url is large and/or does not delta
  pack well with other information (such as the names of keys) already
  stored in the branch.  
  (git-annex does not send a reply to this message.)
* `SETURLMISSING Key Url`  
  Records that the key can no longer be downloaded from the specified
  URL.  
  (git-annex does not send a reply to this message.)
* `SETURIPRESENT Key Uri`  
  Records an URI where the Key can be downloaded from.  
  For example, "ipfs:ADDRESS" is used for the ipfs special remote;
  its CLAIMURL handler checks for such URIS and claims them.  
  (git-annex does not send a reply to this message.)
* `SETURIMISSING Key Uri`  
  Records that the key can no longer be downloaded from the specified
  URI.  
  (git-annex does not send a reply to this message.)
* `GETURLS Key Prefix`  
  Gets the recorded urls where a Key can be downloaded from.
  Only urls that start with the Prefix will be returned. The Prefix
  may be empty to get all urls.
  (git-annex replies one or more times with VALUE for each url.
  The final VALUE has an empty value, indicating the end of the url list.)
* `DEBUG message`
  Tells git-annex to display the message if --debug is enabled.
  (git-annex does not send a reply to this message.)

## general messages

These messages can be sent at any time by either git-annex or the special
remote.

* `ERROR ErrorMsg`  
  Generic error. Can be sent at any time if things get too messed up
  to continue. When possible, use a more specific reply from the list above.  
  The special remote program should exit after sending this, as
  git-annex will not talk to it any further. If the program receives
  an ERROR from git-annex, it can exit with its own ERROR.

## long running network connections

Since an external special remote is started only when git-annex needs to
access the remote, and then left running, it's ok to open a network
connection in the PREPARE stage, and continue to use that network
connection as requests are made.

If you're unable to open a network connection, or the connection closes,
perhaps because the network is down, it's ok to fail to perform any
requests. Or you can try to reconnect when a new request is made.

Note that the external special remote program may be left running for
quite a long time, especially when the git-annex assistant is using it.
The assistant will detect when the system connects to a network, and will
start a new process the next time it needs to use a remote.

## readonly mode

Some storage services allow downloading the content of a file using a
regular http connection, with no authentication. An external special remote
for such a storage service can support a readonly mode of operation.

It works like this:

* When a key's content is stored on the remote, use SETURLPRESENT to
  tell git-annex the public url from which it can be downloaded.
* When a key's content is removed from the remote, use SETURLMISSING.
* Document that this external special remote can be used in readonly mode.

  The user doesn't even need to install your external special remote
  program to use such a remote! All they need to do is run:
  `git annex enableremote $remotename readonly=true`

* The readonly=true parameter makes git-annex download content from the
  urls recorded earlier by SETURLPRESENT.

## TODO

* When storing encrypted files stream the file up/down the pipe, rather
  than using a temp file. Will probably involve less space and disk IO, 
  and makes the progress display better, since the encryption can happen
  concurrently with the transfer. Also, no need to use PROGRESS in this
  scenario, since git-annex can see how much data it has sent/received from
  the remote. However, \n and probably \0 need to be escaped somehow in the
  file data, which adds complication.
* uuid discovery during INITREMOTE.
* Hook into webapp. Needs a way to provide some kind of prompt to the user
  in the webapp, etc.

* When a new "special remote message" is added to this protocol, and a
  program wants to use it,  an old version of git-annex will reject the
  message as unknown, and fail to use the remote with a protocol error.

  The program can check `git-annex version`, but that's not very
  satisfactory. Version comparison can be hard and
  PATH might not point to the same git-annex that's running the program.

  One way to fix this would be to make git-annex reply to VERSION
  with a PROTOCOLKEYWORDS message listing all the keywords in the
  protocol that it knows.
  The program could then check if the new message it wants to send is on
  the list. PROTOCOLKEYWORDS would be ignored by any program that doesn't
  care/know about it; programs are required to send UNSUPPORTED-REQUEST.

  I worry that some special remote programs might expect to get only 
  PREPARE or INITREMOTE after VERSION, so this change would break them.
  I mean, they shouldn't.. But a quickly/badly written one might.
  Probably want to review all the linked external special remote programs
  before doing this.