aboutsummaryrefslogtreecommitdiff
path: root/doc/todo/accellerate_ssh_remotes_with_git-annex-shell_mass_protocol.mdwn
blob: dd6be9a305cfda39beeba797e2c0c96feb5df9de (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
As shown by benchmarks in
*[[here|todo/speed_up_transfers_over_ssh+rsync_--_directly_reuse_the_same_connection__63__]]*,
there is some overhead for each file transfer to a rsync special remote, to
set up the connection. Idea is to extend git-annex-shell with a command or
commands that don't use rsync for transferring objects, and that can handle
transferring or otherwise operating on multiple objects inside a single tcp
session.

This might only be used when it doesn't need to resume transfer of a file;
it could fall back to rsync for resuming.

Of course, when talking with a git-annex-shell that does not support this
new command, git-annex would still need to fall back to the old commands
using rsync. And should remember for the session that the remote doesn't
support the new command.

It could use sftp, but that seems kind of difficult; it would need to lock
down sftp-server to only write annexed objects to the right repository.
And, using sftp would mean that git-annex would need to figure out the
filenames to use for annexed objects in the remote repository, rather than
letting git-annex-shell on the remote work that out.

So, it seems better to not use sftp, and instead roll our own simple
file transfer protocol.

So, "git-annex-shell -c multi" would speak a protocol over stdin/stdout
that essentially contains the commands inannex, lockcontent, dropkey,
recvkey, and sendkey.

P2P.Protocol already contains such a similar protocol, used over tor.
That protocol even supports resuming interrupted transfers.
It has stuff including auth that this wouldn't need, but it would be
good to unify with it as much as possible.