doc/todo/sha1_collision_embedding_in_git-annex_keys.mdwn


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145

Some git-annex backends allow embedding enough data in the names of keys
that it could be used for a SHA1 collision attack. So, a signed git commit
could point to a tree with such a key in it, and the blob for the key could
have two versions with the same SHA1.

> All issues below are [[done]] --[[Joey]]

Users who want to use git-annex with signed commits to mitigate git's own
SHA1 insecurities would like at least a way to disable the insecure
git-annex backends:

* WORM can contain fairly arbitrary data in a key name
* URL too (also, of course, URLs download arbitrary data from the web,
  so a signed git commit pointing at URL keys doesn't have any security
  even w/o SHA1 collisions)
* SHA1 and MD5 backends are insecure because there can be colliding
  versions of the data they point to.

There could be a config setting, which would prevent git-annex from using
keys with such insecure backends. A user who checks git commit signatures
could enable the config setting when they initially clone their repository.
This should prevent any file contents using insecure backends from being
downloaded into the repository. (Even git-annex-shell recvkey would
refuse data using such a key, since it would fail parsing the key.)
The user would thus know that any file contents in their repository match
the files in signed git commits.

Enabling the config setting in a repository that already contains
file contents would be a mistake, because it might contain insecure keys.
And since git-annex would skip over such files, `git annex fsck` cannot
warn about such a mistake.

Perhaps, then, the config setting should be turned on by `git annex init`?
Or, we can document this gotcha.

> I've done some groundwork for this, but making git-annex not accept
> insecure keys into the repo at all requires changing file2key,
> which is a pure function that's used in eg, instances for serialization.
> 
> So, how to make it vary depending on git config? Can't. Alternative
> would be to add lots of checks everywhere a key is read from disk
> or network, which feels like it would be a hard security boundary to
> manage.
> 
> It doesn't really matter if content under an insecure key is in the
> repo, as long as there's not a signed commit referencing such a key.
> So, we could say, this is up to the user constucting a signed commit, to not
> put such keys in the commit.
> 
> Or, we could use the pre-commit hook, and when 
> the config setting disallows insecure keys, make it reject commits
> that contain them. But, if a past commit added a file using an insecure
> key, and the current commit does not touch it, should it be rejected?
> Rejecting it would then require a somewhat expensive look at the tree
> being committed.
> 
> The user might be merging a branch from someone else; there seems no
> git hook that can sanity check a fast-forward merge.
> 
> Perhaps leave it up to the person making signed commits to get it 
> right, and make git annex fsck warn about such keys? That seems
> reasonable. --[[Joey]]

> > Rather than preventing SHA1/URL/WORM Keys, could put checks in
> > Annex.Content.moveAnnex to prevent SHA1/URL/WORM objects reaching the
> > repository. That would make moveAnnex a security boundary, which is is
> > not currently. Would need to audid to check if anything else populates
> > .git/annex/objects.
> >
> > Annex.Transfer.runTransfer could also check for disallowed objects,
> > not as a security boundary, but to prevent accidental expensive
> > transfers that would fail at the moveAnnex stage.

> > As to how to enable this, it may make sense to use git-annex-config
> > but only read the value from the git-annex branch when initializing the
> > repository, and cache it in git-config.
> > 
> > This way, a repository can be created and configured not to allow
> > SHA1/URL/WORM, and all clones will inherit this configuration. 
> >
> > Users can also set it in git-config on a per repository basis.
> > 
> > If the git-annex-config setting is changed, existing clone's won't
> > change their behavior, although new ones will.  That's a mixed
> > blessing; it makes it harder to switch an existing repo to disallowing
> > SHA1/URL/WORM, but an accidental/malicious re-enabling won't affect
> > clones made while it was disabled. 
> > > This is done now.
> > 
> > Could a repository be configured to either always disallow
> > SHA1/URL/WORM, or always allow them, and then not let that be changed?
> > Maybe -- Look through all the history of the git-annex branch from the
> > earliest commit forward. The first value stored in 
> > git-annex/disableinsecurehashes (eg 0 or 1) is the value to use;
> > any later changes are ignored. 
> > That would be a little slow, but only needs to be done at init time.
> > It might be possible to fool this though. Create a new empty branch,
> > with an old date, make a commit enabling insecure hashes, and
> > merge it into git-annex branch HEAD. It now looks as if insecure hashes
> > were disabled earliest.

> > > Well, annex.securehashesonly is implemented now. It currently needs to be
> > > set in each clone that cares about it. --[[Joey]]

----

A few other potential problems:

* A symlink target like .git/annex/objects/XX/YY/SHA256--foo
  might be able to be manipulated to add collision data in the path.
  For example, .git/annex/objects/collisiondata/../XX/YY/SHA256--foo

  I think this is not a valid attack, because at least on linux,
  such a symlink won't be followed, unless the
  .git/annex/objects/collisiondata directory exists.

* `*E` backends could embed sha1 collision data in a long filename
  extension in a key.

  The recent SHA1 common-prefix attack could be used to exploit this;
  the result would be two keys that have the same SHA1.

  This can be fixed by limiting the length
  of an extension allowed in such a key to the longest such extension
  git-annex has ever supported (probably < 20 bytes or so), which would
  be less than the size of the data needed for current SHA1 collision
  attacks. Now done; git-annex refuses to use keys with super
  long extensions.

* It might be possible to embed colliding data in a specially constructed
  key name with an extra field in it, eg "SHA256-cXXXXXXXXXXXXXXX-...".
  Need to review the code and see if such extra fields are allowed.  

  Update: All fields are numeric, but could contain arbitrary data
  after the number. Could have been used in a common-prefix attack.
  This has been fixed; git-annex refuses to parse
  such fields, so it won't work with files that try to exploit this.

* A symlink target like .git/annex/objects/XX/YY/SHA256--foo
  might be able to be manipulated to add collision data in the path.
  For example, .git/annex/objects/collisiondata/../XX/YY/SHA256--foo

  I think this is not a valid attack, because at least on linux,
  such a symlink won't be followed, unless the
  .git/annex/objects/collisiondata directory exists.