summaryrefslogtreecommitdiff
path: root/doc/bugs/external_special_remote_protocol_broken_by_key_with_spaces/comment_1_c9095cf16c9b044a4eae6b9b445b2512._comment
blob: 611508b6bdbb93f13a93286eca408c689770231e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
[[!comment format=mdwn
 username="joey"
 subject="""comment 1"""
 date="2017-08-15T18:45:21Z"
 content="""
It would be kind of nice to not have to worry about keys containing spaces.
(Only space is allowed currently, not other whitespace.) But that ship has
long sailed and it would be very ugly to force changing repositories using
WORM keys with spaces to WORM keys without spaces. About as ugly as
entirely dropping WORM, which might not be a bad long-term goal.

Here's a few approaches for fixing this in the external special remote protocol:

1. Mangle keys containing spaces as they are sent via the protocol
   and de-mangle as they're received.
   Eg, convert ' ' to 's' and convert 's' to 'ss' etc.  
   (Perhaps pick a better letter than 's' -- '%' and ',' and '&' are
   already used for other escaping.)  
2. Make "VERSION 2" be the same as the current protocol, except Key
   (and other parameters) are quoted in some way.
3. Make "VERSION 2" be the same as the current protocol, except using NUL
   rather than space to separate parameters. Perhaps also to separate
   protocol commands from their parameters, but then "VERSION 2" would
   be an exception to the rule.

Of course the advantage of #1 is that it doesn't need any modifications to all
the external special remotes. Such modifications would take quite a while,
and probably wouldn't even start until a git-annex supporting "VERSION 2"
was widely available, ie years even to start.

Using NUL is conceptually the simplest implementation, but it may cause
issues for implementing external special remotes in some languages. Shell
scripts are probably not good at NULs, for example.

One problem with approach #1 is, what if a key containing a space in the
protocol happened to already work with the protocol parser in an external
special remote? This seems at least possible. And so content would already
be stored on the external special remote under the real key, not the
mangled key. When git-annex starts mangling the key, retreiving the content
from the special remote would then fail.

For that to happen, the external special remote would need a parser that
when it sees "TRANSFER STORE Key File", parses it with File as the last
word, so supporting multi-word Key. The documentation does currently say
that "The filename will not contain any whitespace." Actually, I tested it,
and with a WORM key with spaces, that documentation is generally wrong,
since the File is based on the Key, it also contains whitespace, most of
the time. One exception is when using direct mode, if the work tree file
has been renamed to not contain whitespace.

So, approach #1 seems to call for auditing of external special remotes
and/or a warning to their implementors about that issue. This would also
be an opportunity to correct the documentation about spaces in the
filename, and make sure that they all support that.

Another problem with key mangling is if an external special remote takes
the mangled key, and passes it to some other git-annex command, like `git annex
setpresentkey` or something. I seem to remember datalad doing this kind of
thing.
"""]]