author Joey Hess <joeyh@joeyh.name> 2016-09-21 14:29:17 -0400
committer Joey Hess <joeyh@joeyh.name> 2016-09-21 14:29:17 -0400
commit c02762882f52c632df00c92b35dac2cb4dfd85db (patch)
tree 5f97feae5aeca33fa9f2566dc6394763d77bfadb
parent c411c395a6968a19dda42c90f8bdd334473c642f (diff)
comments
-rw-r--r-- doc/todo/transitive_transfers/comment_1_d244295696be5a8164db45f7fad1701a._comment 47
1 files changed, 47 insertions, 0 deletions
diff --git a/doc/todo/transitive_transfers/comment_1_d244295696be5a8164db45f7fad1701a._comment b/doc/todo/transitive_transfers/comment_1_d244295696be5a8164db45f7fad1701a._comment
new file mode 100644
index 000000000..48995dcb8
--- /dev/null
+++ b/doc/todo/transitive_transfers/comment_1_d244295696be5a8164db45f7fad1701a._comment
@@ -0,0 +1,47 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2016-09-21T17:53:12Z"
+ content="""
+(Let's not discuss here the behavior of copy --to when the file is not
+locally present; there is plenty of other discussion of that in e.g.
+<http://bugs.debian.org/671179>)
+
+git-annex's special remote API does not allow remote-to-remote transfers
+without first spooling the content to a file on disk. And it's not possible
+to do using rsync on either end, AFAICS. It would be possible in some other
+cases, but this would need to be implemented for each type of remote as a
+new API call.
+
+Modern systems tend to have quite a large disk cache, so it's quite
+possible that going via a temp file on disk is not going to use a lot of
+disk IO to write and read it when the read and write occur fairly close
+together.
+
+The main benefit from streaming would probably be if it could run the
+download and the upload concurrently. But that would only be a benefit
+sometimes. With an asymmetric connection, saturating the uplink tends to
+swamp downloads. Also, if download is faster than upload, it would have to
+throttle downloads (which complicates the remote API much more), or buffer
+them to memory (which has its own complications).
+
+Streaming the download to the upload would at best speed things up by a
+factor of 2. It would probably work nearly as well to upload the previously
+downloaded file while downloading the next file.
+
+It would not be super hard to make `git annex copy --from --to` download a file,
+upload it, and then drop it, and parallelizing it with -J2 would
+keep the bandwidth of both the --from and --to remotes pretty well saturated.
+
+Although not perfectly, because two jobs could both be downloading while
+the uplink is left idle. To make it optimal, it would need to do the
+download and when done, push the upload+drop into another queue of actions
+that is processed concurrently with other downloads.
+
+And there is a complication with running that at the same time as eg `git
+annex get` of the same file. It would be surprising for get to succeed
+(because copy has already temporarily downloaded the file) and then have
+the file later get dropped. So, it seems that `copy --from --to` would need
+to stash the content away in a temp file somewhere instead of storing it
+in the annex proper.
+"""]]
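
The queueing scheme described in the comment (download a file, then hand the upload+drop to a second queue that is processed while the next download runs) could be sketched roughly as follows. This is only an illustration in Python, not git-annex code: `transfer_pipeline`, `download`, `upload`, and `drop` are hypothetical names, and the callbacks stand in for whatever the two remotes would actually do. The downloaded content is assumed to live in a temp location rather than in the annex proper, as the comment suggests.

```python
import queue
import threading


def transfer_pipeline(keys, download, upload, drop):
    """Download each key, then pass it to a concurrent upload+drop worker.

    download(key) -> tmp   fetch from the source remote into a temp location
    upload(key, tmp)       send the temp copy to the destination remote
    drop(tmp)              remove the temp copy once it has been uploaded
    (All three callbacks are hypothetical stand-ins.)
    """
    pending = queue.Queue()  # downloaded items waiting for upload+drop
    done = []

    def uploader():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more downloads are coming
                break
            key, tmp = item
            upload(key, tmp)
            drop(tmp)
            done.append(key)

    worker = threading.Thread(target=uploader)
    worker.start()
    for key in keys:
        tmp = download(key)       # the uplink is busy with earlier files meanwhile
        pending.put((key, tmp))
    pending.put(None)
    worker.join()
    return done
```

With only one download and one upload in flight at a time, neither link is shared between two jobs of the same kind. If each file takes time D to download and U to upload, the sequential approach costs about D + U per file, while this pipeline approaches max(D, U), which matches the comment's estimate of at most a factor of 2. The sketch does not bound the queue, so a fast download side would pile up temp files; a `queue.Queue(maxsize=...)` would be one way to limit that.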