author Joey Hess <joeyh@joeyh.name> 2018-03-06 14:48:44 -0400
committer Joey Hess <joeyh@joeyh.name> 2018-03-06 14:48:44 -0400
commit af0f714d8aa927a4140fb5a42143c5858b98393e
tree cc71888eeec9badaa1ccc3325457dc0e6e218312
parent 3d3b2b0265bdcaf919def82050972a1079bdee0e
designing new git-annex-shell multi
This commit was supported by the NSF-funded DataLad project.
-rw-r--r-- doc/todo/accellerate_ssh_remotes_with_git-annex-shell_mass_protocol.mdwn | 33
-rw-r--r-- doc/todo/add_sftp_special_remote.mdwn (renamed from doc/todo/add_sftp_backend.mdwn) | 8
-rw-r--r-- doc/todo/speed_up_transfers_over_ssh+rsync_--_directly_reuse_the_same_connection__63__.mdwn | 3
3 files changed, 42 insertions, 2 deletions
diff --git a/doc/todo/accellerate_ssh_remotes_with_git-annex-shell_mass_protocol.mdwn b/doc/todo/accellerate_ssh_remotes_with_git-annex-shell_mass_protocol.mdwn
new file mode 100644
index 000000000..dd6be9a30
--- /dev/null
+++ b/doc/todo/accellerate_ssh_remotes_with_git-annex-shell_mass_protocol.mdwn
@@ -0,0 +1,33 @@
+As shown by benchmarks in
+*[[here|todo/speed_up_transfers_over_ssh+rsync_--_directly_reuse_the_same_connection__63__]]*,
+there is some overhead for each file transfer to a rsync special remote, to
+set up the connection. The idea is to extend git-annex-shell with a command or
+commands that don't use rsync for transferring objects, and that can handle
+transferring or otherwise operating on multiple objects inside a single tcp
+session.
+
+This might only be used when it doesn't need to resume transfer of a file;
+it could fall back to rsync for resuming.
+
+Of course, when talking with a git-annex-shell that does not support this
+new command, git-annex would still need to fall back to the old commands
+using rsync. And it should remember for the session that the remote doesn't
+support the new command.
+
+It could use sftp, but that seems kind of difficult; it would need to lock
+down sftp-server to only write annexed objects to the right repository.
+And, using sftp would mean that git-annex would need to figure out the
+filenames to use for annexed objects in the remote repository, rather than
+letting git-annex-shell on the remote work that out.
+
+So, it seems better to not use sftp, and instead roll our own simple
+file transfer protocol.
+
+So, "git-annex-shell -c multi" would speak a protocol over stdin/stdout
+that essentially contains the commands inannex, lockcontent, dropkey,
+recvkey, and sendkey.
+
+P2P.Protocol already contains a similar protocol, used over Tor.
+That protocol even supports resuming interrupted transfers.
+It has stuff including auth that this wouldn't need, but it would be
+good to unify with it as much as possible.
diff --git a/doc/todo/add_sftp_backend.mdwn b/doc/todo/add_sftp_special_remote.mdwn
index 0874c729e..c4dd1e294 100644
--- a/doc/todo/add_sftp_backend.mdwn
+++ b/doc/todo/add_sftp_special_remote.mdwn
@@ -1,6 +1,10 @@
-A sftp backend would be nice because gpg operations could be pipelined to the network transfer, not requiring the creation of a full file to disk with gpg before the network transmission, as it happens with rsync.
+A sftp special remote would be nice because gpg operations could be
+pipelined to the network transfer, not requiring the creation of a full
+file to disk with gpg before the network transmission, as it happens with
+the rsync special remote.
-There should be some libraries that can handle the sftp connections and transfers. I read that even curl has support for that.
+There should be some libraries that can handle the sftp connections and
+transfers. I read that even curl has support for that.
> Another reason to build this is that sftp has a `SFTP_FXP_STAT`
> that can get disk free space information. "echo df | sftp user@host"
diff --git a/doc/todo/speed_up_transfers_over_ssh+rsync_--_directly_reuse_the_same_connection__63__.mdwn b/doc/todo/speed_up_transfers_over_ssh+rsync_--_directly_reuse_the_same_connection__63__.mdwn
index 9715b8bb6..fdd660706 100644
--- a/doc/todo/speed_up_transfers_over_ssh+rsync_--_directly_reuse_the_same_connection__63__.mdwn
+++ b/doc/todo/speed_up_transfers_over_ssh+rsync_--_directly_reuse_the_same_connection__63__.mdwn
@@ -32,3 +32,6 @@ ATM, even with ControlPersist=yes, on a fast interconnection between hosts (so i
both hosts do not show any high CPU load
+
+> [[closed|done]]; wrung out all the perf gains we can without
+> [[accellerate_ssh_remotes_with_git-annex-shell_mass_protocol]] --[[Joey]]
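
The fallback behaviour described in the new todo page (try the new command, fall back to rsync when the remote's git-annex-shell lacks it, remember that for the session, and use rsync when resuming) could look roughly like the sketch below. This is only an illustration in Python, not git-annex code: `RemoteSession`, `MultiUnsupported`, and the two transport callables are hypothetical names made up for the example, and nothing here reflects the real wire protocol.

```python
from typing import Callable, Dict


class MultiUnsupported(Exception):
    """Hypothetical error: the remote's git-annex-shell lacks the new command."""


# A transport takes (remote name, key) and performs the transfer.
Transport = Callable[[str, str], None]


class RemoteSession:
    """Chooses between the new multi-object command and the old rsync path."""

    def __init__(self, send_multi: Transport, send_rsync: Transport) -> None:
        self._send_multi = send_multi
        self._send_rsync = send_rsync
        # Per-session memory of which remotes rejected the new command.
        self._multi_ok: Dict[str, bool] = {}

    def send_key(self, remote: str, key: str, resuming: bool = False) -> str:
        """Transfer one key and return the name of the path that was used."""
        # Resumed transfers, and remotes already known not to support the
        # new command, go straight to rsync.
        if resuming or self._multi_ok.get(remote) is False:
            self._send_rsync(remote, key)
            return "rsync"
        try:
            self._send_multi(remote, key)
        except MultiUnsupported:
            # Remember the failure so later transfers skip the probe.
            self._multi_ok[remote] = False
            self._send_rsync(remote, key)
            return "rsync"
        self._multi_ok[remote] = True
        return "multi"
```

With this shape, an old remote costs one failed probe per session, after which every transfer takes the rsync path directly.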