aboutsummaryrefslogtreecommitdiffhomepage
path: root/doc/combiner-explainer.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/combiner-explainer.md')
-rw-r--r--doc/combiner-explainer.md158
1 files changed, 158 insertions, 0 deletions
diff --git a/doc/combiner-explainer.md b/doc/combiner-explainer.md
new file mode 100644
index 0000000000..9e9d077273
--- /dev/null
+++ b/doc/combiner-explainer.md
@@ -0,0 +1,158 @@
+# Combiner Explanation
+## Talk by ctiller, notes by vjpai
+
+Typical way of doing critical section
+
+```
+mu.lock()
+do_stuff()
+mu.unlock()
+```
+
+An alternative way of doing it is
+
+```
+class combiner {
+ run(f) {
+ mu.lock()
+ f()
+ mu.unlock()
+ }
+ mutex mu;
+}
+
+combiner.run(do_stuff)
+```
+
+If you have two threads calling combiner, there will be some kind of
+queuing in place. It's called `combiner` because you can pass in more
+than one do_stuff at once and they will run under a common `mu`.
+
+The implementation described above has the issue that you're blocking a thread
+for a period of time, and this is considered harmful because it's an application thread that you're blocking.
+
+Instead, get a new property:
+* Keep things running in serial execution
+* Don't ever sleep the thread
+* But maybe allow things to end up running on a different thread from where they were started
+* This means that `do_stuff` doesn't necessarily run to completion when `combiner.run` is invoked
+
+```
+class combiner {
+ mpscq q; // multi-producer single-consumer queue can be made non-blocking
+ state s; // is it empty or executing
+
+ run(f) {
+ if (q.push(f)) {
+ // q.push returns true if it's the first thing
+ while (q.pop(&f)) { // modulo some extra work to avoid races
+ f();
+ }
+ }
+ }
+}
+```
+
+The basic idea is that the first one to push onto the combiner
+executes the work and then keeps executing functions from the queue
+until the combiner is drained.
+
+Our combiner does some additional work, with the motivation of write-batching.
+
+We have a second tier of `run` called `run_finally`. Anything queued
+onto `run_finally` runs after we have drained the queue. That means
+that there is essentially a finally-queue. This is not guaranteed to
+be final, but it's best-effort. In the process of running the finally
+item, we might put something onto the main combiner queue and so we'll
+need to re-enter.
+
+`chttp2` runs all ops in the run state except if it sees a write it puts that into a finally. That way anything else that gets put into the combiner can add to that write.
+
+```
+class combiner {
+ mpscq q; // multi-producer single-consumer queue can be made non-blocking
+ state s; // is it empty or executing
+ queue finally; // you can only do run_finally when you are already running something from the combiner
+
+ run(f) {
+ if (q.push(f)) {
+ // q.push returns true if it's the first thing
+ loop:
+ while (q.pop(&f)) { // modulo some extra work to avoid races
+ f();
+ }
+ while (finally.pop(&f)) {
+ f();
+ }
+ goto loop;
+ }
+ }
+}
+```
+
+So that explains how combiners work in general. In gRPC, there is
+`start_batch(..., tag)` and then work only gets activated by somebody
+calling `cq::next` which returns a tag. This gives an API-level
+guarantee that there will be a thread doing polling to actually make
+work happen. However, some operations are not covered by a poller
+thread, such as cancellation that doesn't have a completion. Other
+callbacks that don't have a completion are the internal work that gets
+done before the batch gets completed. We need a condition called
+`covered_by_poller` that means that the item will definitely need some
+thread at some point to call `cq::next` . This includes those
+callbacks that directly cause a completion but also those that are
+indirectly required before getting a completion. If we can't tell for
+sure for a specific path, we have to assumed it is not covered by
+poller.
+
+The above combiner has the problem that it keeps draining for a
+potentially infinite amount of time and that can lead to a huge tail
+latency for some operations. So we can tweak it by returning to the application
+if we know that it is valid to do so:
+
+```
+while (q.pop(&f)) {
+ f();
+ if (control_can_be_returned && some_still_queued_thing_is_covered_by_poller) {
+ offload_combiner_work_to_some_other_thread();
+ }
+}
+```
+
+`offload` is more than `break`; it does `break` but also causes some
+other thread that is currently waiting on a poll to break out of its
+poll. This is done by setting up a per-polling-island work-queue
+(distributor) wakeup FD. The work-queue is the converse of the combiner; it
+tries to spray events onto as many threads as possible to get as much concurrency as possible.
+
+So `offload` really does:
+
+```
+ workqueue.run(continue_from_while_loop);
+ break;
+```
+
+This needs us to add another class variable for a `workqueue`
+(which is really conceptually a distributor).
+
+```
+workqueue::run(f) {
+ q.push(f)
+ eventfd.wakeup()
+}
+
+workqueue::readable() {
+ eventfd.consume();
+ q.pop(&f);
+ f();
+ if (!q.empty()) {
+ eventfd.wakeup(); // spray across as many threads as are waiting on this workqueue
+ }
+}
+```
+
+In principle, `run_finally` could get starved, but this hasn't
+happened in practice. If we were concerned about this, we could put a
+limit on how many things come off the regular `q` before the `finally`
+queue gets processed.
+