aboutsummaryrefslogtreecommitdiffhomepage
path: root/tensorflow/contrib/verbs
diff options
context:
space:
mode:
authorGravatar Shanqing Cai <cais@google.com>2018-03-12 19:33:52 -0700
committerGravatar TensorFlower Gardener <gardener@tensorflow.org>2018-03-12 19:37:39 -0700
commit7144571f2fc59c8705e4e3d7b922fa0ebf44f3fa (patch)
treeb14683f826541c183c1bb783265e13b565469fbb /tensorflow/contrib/verbs
parent2bda52d485c9715dcd17f49526cea7890e091cb8 (diff)
Merge changes from github.
PiperOrigin-RevId: 188817194
Diffstat (limited to 'tensorflow/contrib/verbs')
-rw-r--r--tensorflow/contrib/verbs/README.md2
-rw-r--r--tensorflow/contrib/verbs/patch_notes_verbs_with_0_copies.md2
-rw-r--r--tensorflow/contrib/verbs/rdma.cc1
3 files changed, 3 insertions, 2 deletions
diff --git a/tensorflow/contrib/verbs/README.md b/tensorflow/contrib/verbs/README.md
index 58fed4e5cb..4b6104a8b4 100644
--- a/tensorflow/contrib/verbs/README.md
+++ b/tensorflow/contrib/verbs/README.md
@@ -93,7 +93,7 @@ When the receiver receives the RDMA write, it will locate the relevant **RdmaTen
1. When the sender receives a tensor request, the source tensor may or may not be ready yet. The situation is handled through a process of tag matching:
* If the request arrives before the tensor is ready, then a callback is put in a local table, and will be invoked once the tensor arrives.
- * If the tensor is ready before the request arives, than the tensor is put in a local table. When the request arrives, it will invoke the callback immediately.
+ * If the tensor is ready before the request arrives, than the tensor is put in a local table. When the request arrives, it will invoke the callback immediately.
In code it is done by calling **RecvLocalAsync()**, which receives the tensor's key, step-id, and the callback.
2. When the callback is invoked, the relevant tensor is removed from the tag matching table. In the case where we need to send the tensor's meta-data, the **RdmaTensorResponse** will store a copy of the tensor until the re-request arrives.
3. The sending of protocol messages (**RDMA_MESSAGE_TENSOR_REQUEST**, **RDMA_MESSAGE_META_DATA_RESPONSE** and **RDMA_MESSAGE_TENSOR_RE_REQUEST**) is done by the class **RdmaMessageBuffer**. All messages are sent using RDMA writes from/to fixed messages buffers. This implies that we cannot send on a specific channel more than one message at a time. In order to synchronize the messages, the **RdmaMessageBuffer** holds the a local and remote buffer statuses which can be either busy or idle. When a write is issued, both statuses will be changed to busy. When the write-complete event is received, the local status is changed to idle. When the write is received on the remote side, the remote side will parse the message, and return an ACK back to the sending side on which the sending side will update the remote status to idle. When both the local and remote statuses are idle, the next message can be sent.
diff --git a/tensorflow/contrib/verbs/patch_notes_verbs_with_0_copies.md b/tensorflow/contrib/verbs/patch_notes_verbs_with_0_copies.md
index 956b8f2147..da6fdd48e1 100644
--- a/tensorflow/contrib/verbs/patch_notes_verbs_with_0_copies.md
+++ b/tensorflow/contrib/verbs/patch_notes_verbs_with_0_copies.md
@@ -64,7 +64,7 @@ The protocol messages themselves will remain mostly unchanged at the first stage
* type - The message type.
* request_index - Request index.
* is_dead/data_type/tensor_shape/tensor_bytes - The up-to-date meta-data.
-* **RDMA_MESSAGE_BUFFER_RESPONSE** - (receiver ==> sender) Tensor re-requset after meta-data update and reallocation of result/proxy tensors.
+* **RDMA_MESSAGE_BUFFER_RESPONSE** - (receiver ==> sender) Tensor re-request after meta-data update and reallocation of result/proxy tensors.
* type - The message type.
* name (name_size) - Name of the requested tensor.
* step_id - Step ID.
diff --git a/tensorflow/contrib/verbs/rdma.cc b/tensorflow/contrib/verbs/rdma.cc
index 7d95b6522c..86350a08e5 100644
--- a/tensorflow/contrib/verbs/rdma.cc
+++ b/tensorflow/contrib/verbs/rdma.cc
@@ -30,6 +30,7 @@ limitations under the License.
#include "tensorflow/core/distributed_runtime/rendezvous_mgr_interface.h"
#include "tensorflow/core/distributed_runtime/rpc/grpc_util.h"
#include "tensorflow/core/distributed_runtime/session_mgr.h"
+#include "tensorflow/core/distributed_runtime/rpc/grpc_util.h"
#include "tensorflow/core/framework/rendezvous.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/lib/core/status.h"