doc/design/assistant/telehash.mdwn


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143

[Telehash](http://telehash.org/) for secure P2P communication between
git-annex (assistant) repositories.

Or something similar like [Snow](http://www.trustiosity.com/snow/)
or [cjdns](https://github.com/cjdelisle/cjdns) or tor or i2p or [magic wormhole](http://magic-wormhole.io/).

## telehash implementation status

* node.js version seems almost complete
* C version currently lacks channel support and seems buggy (13 Jan 2014)
* No pure haskell implementation of telehash v2. There was one of
  telehash v1 (even that seems incomplete). I have pinged its author
  to see if he anticipates updating it.
* Rapid development, situation may change in a month or 2. (2014)  
  Not seeing many commits now (2015)
* Is it secure? A security review should be done by competant people
  (not Joey). See <https://github.com/telehash/telehash.org/issues/23>
* **Haskell version** 
  <https://github.com/alanz/htelehash/tree/v2/src/TeleHash>
  May support v2; v3 support seems not started yet, and not in active
  development at the moment, although there has been work on it a year ago.
* Not very convinced this is going to be usable anytime soon, so would like
  to find something that is. (2015)

## snow status

* Seems ready to use?
* NAT punching works per docs; relies on a DHT network for hole punching,
  and the reliabilty of that network is not known. I notice it has only 1
  pre-seeded peer in the source tree for the DHT, and that peer was not up
  when I tried it.
* Only provides network-layer transport, still need to implement some 
  file transfer protocol on top.

## cjdns status

* Has a network with "hundreds of active nodes"
* Is not pure P2P; there's a network that does routing
  of packets. This may be a good thing, or not.
* Seems to require manual configuration of a "friend"
  node that's already on the network, with address and password to connect
  to it, so if you can't find someone you know to connect to their node,
  you can't use it. Urk.

## tor status

* Awesome.
* Easy to install, use; very well known.
* There's been some [haskell packages developed recently](http://www.leonmergen.com/haskell/privacy/2015/05/30/on-anonymous-networking-in-haskell-announcing-tor-and-i2p-for-haskell.html) 
  to communicate with tor and set up onion addresses for a service.
  Could be used to make git-annex run as a hidden service.

## i2p status

## magic wormhole

* simple file transfer protocol with out of band shared secret
* handles NAT transversal
* easy to use
* doesn't require a running daemon
* can transfer arbitrary blobs (strings, directories, files)

## implementation basics

* Add a telehash.log that maps between uuid and telehash address.
  Or let's generalize it a bit; since things like snow work close enough
  to the same. Make it address.log and map between uuid and (networktype, address)
* On startup, assistant creates a new telehash keypair if not already
  present; stores this locally and generates a telehash address from it,
  stored in address.log.
  (Or, if using snow, uses dns to look up the encryption public key address
  of the local snow server, and stores that in address.log.)
* Use telehash for notifications of changes to the repository
* Do git push over telehash. (Pretty easy, may need rate limiting in
  situations involving relays.)
* Remove git push over XMPP (which has several problems including
  XMPP being an unreliable transport, requiring a separate XMPP account per
  repo, and XMPP not being end-to-end encrypted)

## address discovery

The address is a public key, so won't want to type that in. Need discovery.

* Easy way is any set of repos that are already connected can communicate
  them via address.log.
* Local pairing can be used for address discovery. Could be made
  to work without ssh (with content transfer over telehash discussed
  below).
* XMPP pairing can also be used for address discovery. (Note that
  MITM attacks are possible.) Is it worth keeping XMPP in git-annex just
  for this?
* Addresses of repositories can be communicated out of band (eg,
  via an OTR session or gpg signed mail), and pasted into the webapp to
  initiate a repository pairing that then proceeds entirely over telehash.
  Once both sides do this, the pairing can proceed automatically.

## content transfer over telehash

* In some circumstances, it would be ok to do annexed content transfer
  over telehash. 
  Need to check if there are MTU problems with large data bodies in
  telehash messages.  
  Probably not when a bridge is being used, due to required rate
  limiting in bridging over telehash. Cloud transfer remotes still needed for
  those situations.  
  (And it should be fine to do it over snow, maybe more so.)  
* On a LAN, telehash can be used to determine the current local IP address
  of another computer on the LAN. The 2 could then determine if either uses
  ssh and if so use regular git-annex-shell for transfers. Or could do
  annexed content transfer directly over telehash.  
  (Snow does not provide this feature AFAIK.)

## generic git-remote-telehash

This might turn out to be easy to split off from git-annex, so `git pull`
and `git push` can be used at the command line to access telehash remotes.
Allows using general git entirely decentralized and with end-to-end
encryption.

## separate daemon?

See [[git-remote-daemon]] for its design.

Advantages:

* `git annex sync` could also use the running daemon
* `git-remote-telehash` could use the running daemon
* c-telehash might end up linked to openssl, which has licence combination
  problems with git-annex. A separate process not using git-annex's code
  would avoid this.
* Allows the daemon to be written in some other language if necessary
  (for example, if c-telehash development stalls and the nodejs version is
  already usable)
* Potentially could be generalized to handle other similar protocols.
  Or even the xmpp code moved into it. There could even be git-annex native
  exchange protocols implemented in such a daemon to allow SSH-less
  transfers.
* Security holes in telehash would not need to compromise the entire
  git-annex. daemon could be sandboxed in one way or another.

Disadvantages:

* Adds some complexity.