From af0f714d8aa927a4140fb5a42143c5858b98393e Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Tue, 6 Mar 2018 14:48:44 -0400
Subject: designing new git-annex-shell multi

This commit was supported by the NSF-funded DataLad project.
---
 ...remotes_with_git-annex-shell_mass_protocol.mdwn | 33 ++++++++++++++++++++++
 doc/todo/add_sftp_backend.mdwn | 9 ------
 doc/todo/add_sftp_special_remote.mdwn | 13 +++++++++
 ...-_directly_reuse_the_same_connection__63__.mdwn | 3 ++
 4 files changed, 49 insertions(+), 9 deletions(-)
 create mode 100644 doc/todo/accellerate_ssh_remotes_with_git-annex-shell_mass_protocol.mdwn
 delete mode 100644 doc/todo/add_sftp_backend.mdwn
 create mode 100644 doc/todo/add_sftp_special_remote.mdwn

diff --git a/doc/todo/accellerate_ssh_remotes_with_git-annex-shell_mass_protocol.mdwn b/doc/todo/accellerate_ssh_remotes_with_git-annex-shell_mass_protocol.mdwn
new file mode 100644
index 000000000..dd6be9a30
--- /dev/null
+++ b/doc/todo/accellerate_ssh_remotes_with_git-annex-shell_mass_protocol.mdwn
@@ -0,0 +1,33 @@
+As shown by benchmarks in
+*[[here|todo/speed_up_transfers_over_ssh+rsync_--_directly_reuse_the_same_connection__63__]]*,
+there is some overhead for each file transfer to a rsync special remote, to
+set up the connection. Idea is to extend git-annex-shell with a command or
+commands that don't use rsync for transferring objects, and that can handle
+transferring or otherwise operating on multiple objects inside a single tcp
+session.
+
+This might only be used when it doesn't need to resume transfer of a file;
+it could fall back to rsync for resuming.
+
+Of course, when talking with a git-annex-shell that does not support this
+new command, git-annex would still need to fall back to the old commands
+using rsync. And should remember for the session that the remote doesn't
+support the new command.
+
+It could use sftp, but that seems kind of difficult; it would need to lock
+down sftp-server to only write annexed objects to the right repository.
+And, using sftp would mean that git-annex would need to figure out the
+filenames to use for annexed objects in the remote repository, rather than
+letting git-annex-shell on the remote work that out.
+
+So, it seems better to not use sftp, and instead roll our own simple
+file transfer protocol.
+
+So, "git-annex-shell -c multi" would speak a protocol over stdin/stdout
+that essentially contains the commands inannex, lockcontent, dropkey,
+recvkey, and sendkey.
+
+P2P.Protocol already contains such a similar protocol, used over tor.
+That protocol even supports resuming interrupted transfers.
+It has stuff including auth that this wouldn't need, but it would be
+good to unify with it as much as possible.
diff --git a/doc/todo/add_sftp_backend.mdwn b/doc/todo/add_sftp_backend.mdwn
deleted file mode 100644
index 0874c729e..000000000
--- a/doc/todo/add_sftp_backend.mdwn
+++ /dev/null
@@ -1,9 +0,0 @@
-A sftp backend would be nice because gpg operations could be pipelined to the network transfer, not requiring the creation of a full file to disk with gpg before the network transmission, as it happens with rsync.
-
-There should be some libraries that can handle the sftp connections and transfers. I read that even curl has support for that.
-
-> Another reason to build this is that sftp has a `SFTP_FXP_STAT`
-> that can get disk free space information. "echo df | sftp user@host"
-> exposes this, when available. Some sftp servers can be locked down
-> so that the user can't run git-annex on them, so that could be the only
-> way to get diskreserve working for such a remote. --[[Joey]]
diff --git a/doc/todo/add_sftp_special_remote.mdwn b/doc/todo/add_sftp_special_remote.mdwn
new file mode 100644
index 000000000..c4dd1e294
--- /dev/null
+++ b/doc/todo/add_sftp_special_remote.mdwn
@@ -0,0 +1,13 @@
+A sftp special remote would be nice because gpg operations could be
+pipelined to the network transfer, not requiring the creation of a full
+file to disk with gpg before the network transmission, as it happens with
+the rsync special remote.
+
+There should be some libraries that can handle the sftp connections and
+transfers. I read that even curl has support for that.
+
+> Another reason to build this is that sftp has a `SFTP_FXP_STAT`
+> that can get disk free space information. "echo df | sftp user@host"
+> exposes this, when available. Some sftp servers can be locked down
+> so that the user can't run git-annex on them, so that could be the only
+> way to get diskreserve working for such a remote. --[[Joey]]
diff --git a/doc/todo/speed_up_transfers_over_ssh+rsync_--_directly_reuse_the_same_connection__63__.mdwn b/doc/todo/speed_up_transfers_over_ssh+rsync_--_directly_reuse_the_same_connection__63__.mdwn
index 9715b8bb6..fdd660706 100644
--- a/doc/todo/speed_up_transfers_over_ssh+rsync_--_directly_reuse_the_same_connection__63__.mdwn
+++ b/doc/todo/speed_up_transfers_over_ssh+rsync_--_directly_reuse_the_same_connection__63__.mdwn
@@ -32,3 +32,6 @@ ATM, even with ControlPersist=yes, on a fast interconnection between hosts (so i
 both hosts do not show any high CPU load
+
+> [[closed|done]]; wrung out all the perf gains we can without
+> [[accellerate_ssh_remotes_with_git-annex-shell_mass_protocol]] --[[Joey]]
-- 
cgit v1.2.3
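
Note on the fallback rule in the new todo page: the text says git-annex should fall back to rsync when the remote's git-annex-shell lacks the new command, remember that for the session, and also use rsync when resuming a transfer. The following is only a rough sketch of that decision logic, not git-annex code (git-annex is written in Haskell). `RemoteSession`, `MultiUnsupported`, `try_multi` and `use_rsync` are hypothetical names invented for illustration; the two callables stand in for the real transports.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


class MultiUnsupported(Exception):
    """Hypothetical signal: the remote's git-annex-shell lacks the new command."""


@dataclass
class RemoteSession:
    # Both callables take (remote, key) and return True on success.
    # try_multi may raise MultiUnsupported; both are stand-ins, not real APIs.
    try_multi: Callable[[str, str], bool]
    use_rsync: Callable[[str, str], bool]
    # Per-session memory of which remotes support the new command.
    multi_supported: Dict[str, bool] = field(default_factory=dict)

    def transfer(self, remote: str, key: str, resuming: bool = False) -> Tuple[str, bool]:
        """Return (transport used, success) for one object transfer."""
        # Resumed transfers go through rsync, as the todo suggests.
        if resuming:
            return ("rsync", self.use_rsync(remote, key))
        # Skip the probe if this remote was already seen to lack support.
        if self.multi_supported.get(remote) is False:
            return ("rsync", self.use_rsync(remote, key))
        try:
            ok = self.try_multi(remote, key)
        except MultiUnsupported:
            # Remember for the rest of the session, then fall back.
            self.multi_supported[remote] = False
            return ("rsync", self.use_rsync(remote, key))
        self.multi_supported[remote] = True
        return ("multi", ok)
```

With this shape, an old remote costs one failed probe per session rather than one per file, which is the point of remembering the result.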