Some git-annex backends allow embedding enough data in the names of keys
that it could be used for a SHA1 collision attack. So, a signed git commit
could point to a tree with such a key in it, and the blob for the key could
have two versions with the same SHA1.

Users who want to use git-annex with signed commits to mitigate git's own
SHA1 insecurities would like at least a way to disable the insecure
git-annex backends:

* WORM can contain fairly arbitrary data in a key name
* URL too (also, of course, URLs download arbitrary data from the web,
  so a signed git commit pointing at URL keys doesn't have any security
  even without SHA1 collisions)
* SHA1 and MD5 backends are insecure because there can be colliding
  versions of the data they point to.

A config setting to prevent git-annex from using insecure backends would be
useful.

(git-annex might suggest enabling that configuration if commit.gpgSign
is enabled)
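
As a rough illustration of what such a setting could check, here is a minimal Python sketch. It is not git-annex's implementation: the names `INSECURE_BACKENDS`, `backend_of`, `is_allowed` and the `secure_only` flag are hypothetical, and treating the `SHA1E` and `MD5E` variants as insecure alongside `SHA1` and `MD5` is an assumption. It relies only on the backend name being the part of a key before the first `-`.

```python
# Hypothetical sketch; none of these names come from git-annex itself.
# Backends named in the list above, plus their "E" variants (an assumption).
INSECURE_BACKENDS = {"WORM", "URL", "SHA1", "SHA1E", "MD5", "MD5E"}


def backend_of(key: str) -> str:
    """Return the backend name: everything before the first '-' in the key."""
    return key.split("-", 1)[0]


def is_allowed(key: str, secure_only: bool) -> bool:
    """Reject keys from insecure backends when the hypothetical setting is on."""
    if not secure_only:
        return True
    return backend_of(key) not in INSECURE_BACKENDS
```

With `secure_only` enabled, a key such as `WORM-s10-m1500000000--file` would be refused while a `SHA256E` key would still be accepted; with it disabled, every key passes.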

A few other potential problems:

* `*E` backends could embed SHA1 collision data in a long filename
  extension in a key.

  Impact is limited, because even if an attacker does this, the key also
  contains the checksum (eg SHA2) of the annexed data. The current SHA1
  attack is only a common-prefix attack; it does not allow creating two
  colliding keys that contain two different SHA2 checksums. That would need a
  chosen-prefix attack to be feasible. 
  
  It might be worth limiting the length of an extension allowed in such a
  key to the longest such extension git-annex has ever supported (probably
  < 20 bytes or so), which would be less than the size of the data needed
  for current SHA1 collision attacks. Presumably a chosen-prefix attack
  would need a similar amount of data.

  Update: Now done; git-annex refuses to use keys with overly long
  extensions.

* It might be possible to embed colliding data in a specially constructed
  key name with an extra field in it, eg "SHA256-cXXXXXXXXXXXXXXX-...".
  Need to review the code and see if such extra fields are allowed.  

  Update: All fields are numeric, but could contain arbitrary data
  after the number. This could have been used in a chosen-prefix attack
  (possibly; it would require the field to come after the key name data)
  or a preimage attack. This has been fixed; git-annex refuses to parse
  such fields, so it won't work with files that try to exploit this.
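
The two fixes described in the updates above can be sketched together as a strict key check. This is an illustrative Python sketch, not git-annex's actual parser. It assumes keys have the shape `BACKEND-field-field--NAME`, that each field is a single letter followed by digits, and that the extension is whatever follows the first `.` in the name; the 20-byte limit is taken from the rough estimate above, and `key_is_well_formed`, `FIELD_RE` and `MAX_EXTENSION_BYTES` are hypothetical names.

```python
import re

# Assumed limit, based on the "< 20 bytes or so" estimate in the text.
MAX_EXTENSION_BYTES = 20

# A field is one letter tag followed by ASCII digits only, with nothing after.
FIELD_RE = re.compile(r"[A-Za-z][0-9]+")


def key_is_well_formed(key: str) -> bool:
    """Return True if the key has strictly numeric fields and a short extension."""
    # Split at the first "--": backend and fields before it, key name after it.
    head, sep, name = key.partition("--")
    if not sep or not name:
        return False

    backend, *fields = head.split("-")
    if not backend:
        return False

    # Refuse fields that carry any data besides the tag and the number.
    if not all(FIELD_RE.fullmatch(field) for field in fields):
        return False

    # For "E" backends, bound the extension (everything after the first dot).
    if backend.endswith("E"):
        _, dot, extension = name.partition(".")
        if dot and len(extension.encode("utf-8")) > MAX_EXTENSION_BYTES:
            return False

    return True
```

Under these assumptions, `SHA256E-s1048576--0123abcd.tar.gz` passes, while a key with a field like `cXXXXXXXXXXXXXXX` or `s100junk`, or with a 64-byte extension, is rejected before any content is trusted.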