aboutsummaryrefslogtreecommitdiffhomepage
path: root/doc/connectivity-semantics-and-api.md
diff options
context:
space:
mode:
authorGravatar Eric Anderson <ejona@google.com>2015-05-11 09:22:47 -0700
committerGravatar Eric Anderson <ejona@google.com>2015-05-11 09:22:47 -0700
commit9855a34e1af51211764fc96485f79ce7692dd1ce (patch)
tree66849e577e219a1d41da915606582aa3bce28ec5 /doc/connectivity-semantics-and-api.md
parentad21fea239769e2a3be14ac56ee747eb0f3c13a3 (diff)
Add Connectivity doc
This is a faithful representation of the current version of the Google Doc with only formatting changes.
Diffstat (limited to 'doc/connectivity-semantics-and-api.md')
-rw-r--r--doc/connectivity-semantics-and-api.md122
1 files changed, 122 insertions, 0 deletions
diff --git a/doc/connectivity-semantics-and-api.md b/doc/connectivity-semantics-and-api.md
new file mode 100644
index 0000000000..842ff8adc1
--- /dev/null
+++ b/doc/connectivity-semantics-and-api.md
@@ -0,0 +1,122 @@
+gRPC Connectivity Semantics and API
+===================================
+
+This document describes the connectivity semantics for gRPC channels and the
+corresponding impact on RPCs. We then discuss an API.
+
+States of Connectivity
+----------------------
+
+gRPC Channels provide the abstraction over which clients can communicate with
+servers.The client-side channel object can be constructed using little more
+than a DNS name. Channels encapsulate a range of functionality including name
+resolution, establishing a TCP connection (with retries and backoff) and TLS
+handshakes. Channels can also handle errors on established connections and
+reconnect, or in the case of HTTP/2 GO_AWAY, re-resolve the name and reconnect.
+
+To hide the details of all this activity from the user of the gRPC API (i.e.,
+application code) while exposing meaningful information about the state of a
+channel, we use a state machine with four states, defined below:
+
+CONNECTING: The channel is trying to establish a connection and is waiting to
+make progress on one of the steps involved in name resolution, TCP connection
+establishment or TLS handshake. This is the initial state for all channels upon
+creation.
+
+READY: The channel has successfully established a connection all the way
+through TLS handshake (or equivalent) and all subsequent attempt to communicate
+have succeeded (or are pending without any known failure ).
+
+TRANSIENT_FAILURE: There has been some transient failure (such as a TCP 3-way
+handshake timing out or a socket error). Channels in this state will eventually
+switch to the CONNECTING state and try to establish a connection again. Since
+retries are done with exponential backoff, channels that fail to connect will
+start out spending very little time in this state but as the attempts fail
+repeatedly, the channel will spend increasingly large amounts of time in this
+state. For many non-fatal failures (e.g., TCP connection attempts timing out
+because the server is not yet available), the channel may be stuck in this
+state for an indefinitely large amount of time.
+
+FATAL_FAILURE: There has been a fatal failure and the channel will never
+attempt to establish a connection again. (e.g., a server presenting an invalid
+TLS certificate)
+
+Channels that enter this state never leave this state.
+
+The following table lists the legal transitions from one state to another and
+corresponding reasons. Empty cells denote disallowed transitions.
+
+<table style='border: 1px solid black'>
+ <tr>
+ <th>From/To</th>
+ <th>CONNECTING</th>
+ <th>READY</th>
+ <th>TRANSIENT_FAILURE</th>
+ <th>FATAL_FAILURE</th>
+ </tr>
+ <tr>
+ <th>CONNECTING</th>
+ <td>Incremental progress during connection establishment</td>
+ <td>All steps needed to establish a connection succeeded</td>
+ <td>Any failure in any of the steps needed to establish connection</td>
+ <td>Fatal failure encountered while attempting a connection.</td>
+ </tr>
+ <tr>
+ <th>READY</th>
+ <td></td>
+ <td>Incremental successful communication on established channel.</td>
+ <td>Any failure encountered while expecting successful communication on
+ established channel.</td>
+ <td></td>
+ </tr>
+ <tr>
+ <th>TRANSIENT_FAILURE</th>
+ <td>Wait time required to implement (exponential) backoff is over.</td>
+ <td></td>
+ <td></td>
+ <td></td>
+ </tr>
+ <tr>
+ <th>FATAL_FAILURE</th>
+ <td></td>
+ <td></td>
+ <td></td>
+ <td></td>
+ </tr>
+</table>
+
+
+Channel State API
+-----------------
+
+All gRPC libraries will expose a channel-level API method to poll the current
+state of a channel. In C++, this method is called GetCurrentState and returns
+an enum for one of the four legal states.
+
+All libraries should also expose an API that enables the application (user of
+the gRPC API) to be notified when the channel state changes. Since state
+changes can be rapid and race with any such notification, the notification
+should just inform the user that some state change has happened, leaving it to
+the user to poll the channel for the current state.
+
+The synchronous version of this API is:
+
+```cpp
+bool WaitForStateChange(gpr_timespec deadline, ChannelState source_state);
+```
+
+which returns true when the state changes to something other than the
+source_state and false if the deadline expires. Asynchronous and futures based
+APIs should have a corresponding method that allows the application to be
+notified when the state of a channel changes.
+
+Note that a notification is delivered every time there is a transition from any
+state to any *other* state. On the other hand the rules for legal state
+transition, require a transition from CONNECTING to TRANSIENT_FAILURE and back
+to CONNECTING for every recoverable failure, even if the corresponding
+exponential backoff requires no wait before retry. The combined effect is that
+the application may receive state change notifications that appear spurious.
+e.g., an application waiting for state changes on a channel that is CONNECTING
+may receive a state change notification but find the channel in the same
+CONNECTING state on polling for current state because the channel may have
+spent infinitesimally small amount of time in the TRANSIENT_FAILURE state.