Diffstat (limited to 'doc/connection-backoff.md')
-rw-r--r-- doc/connection-backoff.md | 57
1 files changed, 19 insertions, 38 deletions
diff --git a/doc/connection-backoff.md b/doc/connection-backoff.md
index 47b71f927b..7094e737c5 100644
--- a/doc/connection-backoff.md
+++ b/doc/connection-backoff.md
@@ -8,58 +8,39 @@ requests) and instead do some form of exponential backoff.
We have several parameters:
1. INITIAL_BACKOFF (how long to wait after the first failure before retrying)
2. MULTIPLIER (factor with which to multiply backoff after a failed retry)
- 3. MAX_BACKOFF (Upper bound on backoff)
- 4. MIN_CONNECTION_TIMEOUT
+ 3. MAX_BACKOFF (upper bound on backoff)
+ 4. MIN_CONNECT_TIMEOUT (minimum time we're willing to give a connection to
+ complete)
## Proposed Backoff Algorithm
Exponentially back off the start time of connection attempts up to a limit of
-MAX_BACKOFF.
+MAX_BACKOFF, with jitter.
```
ConnectWithBackoff()
current_backoff = INITIAL_BACKOFF
current_deadline = now() + INITIAL_BACKOFF
- while (TryConnect(Max(current_deadline, MIN_CONNECT_TIMEOUT))
+ while (TryConnect(Max(current_deadline, now() + MIN_CONNECT_TIMEOUT))
!= SUCCESS)
SleepUntil(current_deadline)
current_backoff = Min(current_backoff * MULTIPLIER, MAX_BACKOFF)
- current_deadline = now() + current_backoff
-```
-
-## Historical Algorithm in Stubby
-
-Exponentially increase up to a limit of MAX_BACKOFF the intervals between
-connection attempts. This is what stubby 2 uses, and is equivalent if
-TryConnect() fails instantly.
+ current_deadline = now() + current_backoff +
+ UniformRandom(-JITTER * current_backoff, JITTER * current_backoff)
```
-LegacyConnectWithBackoff()
- current_backoff = INITIAL_BACKOFF
- while (TryConnect(MIN_CONNECT_TIMEOUT) != SUCCESS)
- SleepFor(current_backoff)
- current_backoff = Min(current_backoff * MULTIPLIER, MAX_BACKOFF)
-```
-
-The grpc C implementation currently uses this approach with an initial backoff
-of 1 second, multiplier of 2, and maximum backoff of 120 seconds. (This will
-change)
-Stubby, or at least rpc2, uses exactly this algorithm with an initial backoff
-of 1 second, multiplier of 1.2, and a maximum backoff of 120 seconds.
+With specific parameters of
+MIN_CONNECT_TIMEOUT = 20 seconds
+INITIAL_BACKOFF = 1 second
+MULTIPLIER = 1.6
+MAX_BACKOFF = 120 seconds
+JITTER = 0.2
-## Use Cases to Consider
+Implementations with pressing concerns (such as minimizing the number of wakeups
+on a mobile phone) may wish to use a different algorithm, and in particular
+different jitter logic.
-* Client tries to connect to a server which is down for multiple hours, eg for
- maintenance
-* Client tries to connect to a server which is overloaded
-* User is bringing up both a client and a server at the same time
- * In particular, we would like to avoid a large unnecessary delay if the
- client connects to a server which is about to come up
-* Client/server are misconfigured such that connection attempts always fail
- * We want to make sure these don’t put too much load on the server by
- default.
-* Server is overloaded and wants to transiently make clients back off
-* Application has out of band reason to believe a server is back
- * We should consider an out of band mechanism for the client to hint that
- we should short circuit the backoff.
+Alternate implementations must ensure that connection backoffs started at the
+same time disperse, and must not attempt connections substantially more often
+than the above algorithm.
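
The post-change `ConnectWithBackoff()` pseudocode, with the parameter values listed in the patch, can be sketched in Python roughly as follows. This is an illustrative sketch, not the gRPC implementation. The `try_connect` callable (takes an absolute deadline, returns `True` on success), the injectable `now`, `sleep`, and `rng` arguments, and the helper names `next_backoff` and `jittered` are assumptions introduced here so the loop can be exercised with a fake clock.

```python
import random
import time

# Parameter values taken from the patched document (seconds where applicable).
MIN_CONNECT_TIMEOUT = 20.0
INITIAL_BACKOFF = 1.0
MULTIPLIER = 1.6
MAX_BACKOFF = 120.0
JITTER = 0.2


def next_backoff(current_backoff):
    """Grow the backoff geometrically, capped at MAX_BACKOFF."""
    return min(current_backoff * MULTIPLIER, MAX_BACKOFF)


def jittered(backoff, rng=random):
    """Return backoff plus uniform jitter in [-JITTER*backoff, +JITTER*backoff]."""
    return backoff + rng.uniform(-JITTER * backoff, JITTER * backoff)


def connect_with_backoff(try_connect, now=time.monotonic, sleep=time.sleep,
                         rng=random):
    """Retry try_connect until it succeeds; return the number of attempts.

    try_connect is a hypothetical callable: it receives an absolute deadline
    and returns True on success, False on failure.
    """
    current_backoff = INITIAL_BACKOFF
    current_deadline = now() + INITIAL_BACKOFF
    attempts = 1
    # Each attempt gets at least MIN_CONNECT_TIMEOUT to complete.
    while not try_connect(max(current_deadline, now() + MIN_CONNECT_TIMEOUT)):
        # SleepUntil(current_deadline): no sleep if the deadline already passed.
        remaining = current_deadline - now()
        if remaining > 0:
            sleep(remaining)
        current_backoff = next_backoff(current_backoff)
        current_deadline = now() + jittered(current_backoff, rng)
        attempts += 1
    return attempts
```

As in the pseudocode, jitter is applied after the cap, so the gap between attempt start times can exceed MAX_BACKOFF by up to JITTER * MAX_BACKOFF. Only the un-jittered `current_backoff` is bounded by MAX_BACKOFF.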