author Joey Hess <joeyh@joeyh.name> 2016-10-26 15:38:22 -0400
committer Joey Hess <joeyh@joeyh.name> 2016-10-26 15:38:27 -0400
commit c6122cf0a40ce4c565957f68e1076b6ada5c2bea
tree b2b2a4e52a48d79e3984b5e39672d738d2dbfa8d /doc/bugs/Allow_automatic_retry_git_annex_get
parent 85cb9ac4ab6844166e539ddc6b15922f79c74a19
enable forwardRetry for command-line transfers
If a transfer fails for some reason, but some data managed to be sent, the transfer will be retried. (The assistant already did this.)

Possible impacts:

* More ssh prompts if ssh needs to prompt for a password to connect to a host, or is prompting about some other problem like an ssh key mismatch.
* More data transfer due to retrying, especially when a remote does not support resuming a transfer. In the worst case, a lot of data will be transferred but it fails before the end, and then all that data gets transferred again plus one byte more; repeat until it manages to get the whole file.
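
The retry rule described in this commit message can be sketched as follows. This is an illustrative Python model, not git-annex's Haskell code: the names `forward_retry`, `no_retry`, `run_transfer` and `make_flaky` are hypothetical, and a transfer attempt is modelled as a function from "bytes already present" to a `(succeeded, bytes present afterwards)` pair.

```python
from typing import Callable, Tuple

# Hypothetical model of one transfer attempt: given the number of bytes
# already present, return (succeeded, bytes_present_afterwards).
Attempt = Callable[[int], Tuple[bool, int]]
RetryDecider = Callable[[int, int], bool]


def no_retry(before: int, after: int) -> bool:
    """Never retry a failed transfer."""
    return False


def forward_retry(before: int, after: int) -> bool:
    """Retry only if the failed attempt managed to move some data."""
    return after > before


def run_transfer(attempt: Attempt, should_retry: RetryDecider) -> Tuple[bool, int]:
    """Run attempts until success or until the decider says stop.

    Returns (succeeded, number_of_attempts).
    """
    have = 0
    attempts = 0
    while True:
        attempts += 1
        ok, now = attempt(have)
        if ok:
            return True, attempts
        if not should_retry(have, now):
            return False, attempts
        have = now  # progress was made, so try again from here


def make_flaky(size: int, step: int) -> Attempt:
    """Assumed test double: each attempt adds `step` bytes, then fails
    unless the whole file of `size` bytes has arrived."""
    def attempt(have: int) -> Tuple[bool, int]:
        now = min(size, have + step)
        return now == size, now
    return attempt
```

Because each retry requires strictly more data than the attempt before it, the loop cannot run forever on a file of finite size. A connection that fails without moving any data is not retried at all.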
Diffstat (limited to 'doc/bugs/Allow_automatic_retry_git_annex_get')
-rw-r--r-- doc/bugs/Allow_automatic_retry_git_annex_get/comment_2_6d05cd09e1f00fb5ace2b9ae3bffdedb._comment | 66
1 files changed, 66 insertions, 0 deletions
diff --git a/doc/bugs/Allow_automatic_retry_git_annex_get/comment_2_6d05cd09e1f00fb5ace2b9ae3bffdedb._comment b/doc/bugs/Allow_automatic_retry_git_annex_get/comment_2_6d05cd09e1f00fb5ace2b9ae3bffdedb._comment
new file mode 100644
index 000000000..6d2563e46
--- /dev/null
+++ b/doc/bugs/Allow_automatic_retry_git_annex_get/comment_2_6d05cd09e1f00fb5ace2b9ae3bffdedb._comment
@@ -0,0 +1,66 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2016-10-26T18:26:35Z"
+ content="""
+The most common way a network connection can stall like this is when
+moving to a different wifi network: the connection is open but
+no more data will be received. I suppose other kinds of network
+glitches could also lead to this kind of situation.
+
+ssh has some things, like ServerAliveInterval and TCPKeepAlive,
+that it can use to detect such problems. You may find them useful.
+
+----
+
+As for the retrying once a stall is detected, some transfers use
+`forwardRetry` which will automatically retry as long as the failed try
+managed to send some data. But the get/move/copy commands currently use
+`noRetry`. I can't find any justification for not always using
+`forwardRetry`; I think that it was added for the assistant originally and
+the other stuff just never switched over.
+
+Only problem I can think of is, if there actually is a ssh password
+prompt, it would prompt again on retry. But most people using git-annex
+with ssh have something in place to make ssh not prompt repeatedly for
+passwords.
+
+So, I've gone ahead and enabled `forwardRetry` for everything.
+
+----
+
+Occurs to me that git-annex could try to notice when a transfer is not
+progressing, by reusing the existing progress metering code.
+
+Since some remotes don't update the progress meter, this could
+only be used to detect stalls after the progress meter has been updated
+at least once. If the stall occurs earlier than that, it would not be able
+to be detected.
+
+It seems quite hard to come up with a good timeout value to detect a
+stalled connection. Often progress meters are updated after every small
+(eg 32kb) chunk transferred. But others might poll periodically, or might
+use a larger chunk size. It's even possible that some special remotes
+are looking at a percent output by some program, and only update the meter
+when the percent transferred changes -- in which case it could be many
+minutes in between each meter update when a large file is being
+transferred.
+
+If the timeout is too short, git-annex will stall in a new way, by
+constantly killing "stalled" connections before they can send enough data.
+
+----
+
+So it really seems better to fix the ssh connection to not stall, since
+that is not so heuristic a fix. Seems like git-annex could force
+ServerAliveInterval to be set, and perhaps lower ServerAliveCountMax from 3
+to 1. The ssh BatchMode setting sets the former to 300, so a stalled
+connection will time out after 15 minutes. But BatchMode also disables
+prompting, and git-annex should not disable that.
+
+Catch is, what if the user has configured ssh with some
+other ServerAliveInterval value? We don't want git-annex to override that.
+
+(git-annex does have a rudimentary .ssh/config parser, but it's not
+good enough to handle eg, "Host *.example.com ")
+"""]]
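
Two ideas from the comment above can be made concrete with a small Python sketch. It is illustrative only and not part of git-annex: the function names are hypothetical, the timeout figure is the approximate `ServerAliveInterval * ServerAliveCountMax` product that the comment's "15 minutes" relies on, and the stall check encodes the stated limitation that nothing can be detected before the first meter update.

```python
from typing import List, Optional


def keepalive_timeout_seconds(interval: int, count_max: int = 3) -> int:
    """Approximate time before ssh gives up on a silent server.

    ssh sends one probe per `interval` seconds and disconnects after
    `count_max` unanswered probes (ssh's default count is 3).
    """
    return interval * count_max


def ssh_keepalive_options(interval: int, count_max: int) -> List[str]:
    """Build `-o` arguments for an ssh command line (hypothetical helper).

    Options given on the command line take precedence over ~/.ssh/config,
    which is exactly the override concern raised in the comment.
    """
    return [
        "-o", f"ServerAliveInterval={interval}",
        "-o", f"ServerAliveCountMax={count_max}",
    ]


def is_stalled(last_update: Optional[float], now: float, timeout: float) -> bool:
    """Meter-based stall check.

    `last_update` is the time of the most recent progress meter update,
    or None if the meter has never been updated. With no update yet, a
    stall cannot be told apart from a remote that never reports progress.
    """
    if last_update is None:
        return False
    return now - last_update > timeout
```

With the values from the comment, `keepalive_timeout_seconds(300)` gives 900 seconds (15 minutes), and lowering the count to 1 gives 300 seconds. The `timeout` passed to `is_stalled` is the hard-to-choose value the comment warns about: too short, and slow-updating remotes are killed while still transferring.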