diff options
Diffstat (limited to 'doc/combiner-explainer.md')
-rw-r--r-- | doc/combiner-explainer.md | 158 |
1 files changed, 158 insertions, 0 deletions
diff --git a/doc/combiner-explainer.md b/doc/combiner-explainer.md new file mode 100644 index 0000000000..9e9d077273 --- /dev/null +++ b/doc/combiner-explainer.md @@ -0,0 +1,158 @@ +# Combiner Explanation +## Talk by ctiller, notes by vjpai + +Typical way of doing critical section + +``` +mu.lock() +do_stuff() +mu.unlock() +``` + +An alternative way of doing it is + +``` +class combiner { + run(f) { + mu.lock() + f() + mu.unlock() + } + mutex mu; +} + +combiner.run(do_stuff) +``` + +If you have two threads calling combiner, there will be some kind of +queuing in place. It's called `combiner` because you can pass in more +than one do_stuff at once and they will run under a common `mu`. + +The implementation described above has the issue that you're blocking a thread +for a period of time, and this is considered harmful because it's an application thread that you're blocking. + +Instead, get a new property: +* Keep things running in serial execution +* Don't ever sleep the thread +* But maybe allow things to end up running on a different thread from where they were started +* This means that `do_stuff` doesn't necessarily run to completion when `combiner.run` is invoked + +``` +class combiner { + mpscq q; // multi-producer single-consumer queue can be made non-blocking + state s; // is it empty or executing + + run(f) { + if (q.push(f)) { + // q.push returns true if it's the first thing + while (q.pop(&f)) { // modulo some extra work to avoid races + f(); + } + } + } +} +``` + +The basic idea is that the first one to push onto the combiner +executes the work and then keeps executing functions from the queue +until the combiner is drained. + +Our combiner does some additional work, with the motivation of write-batching. + +We have a second tier of `run` called `run_finally`. Anything queued +onto `run_finally` runs after we have drained the queue. That means +that there is essentially a finally-queue. This is not guaranteed to +be final, but it's best-effort. In the process of running the finally +item, we might put something onto the main combiner queue and so we'll +need to re-enter. + +`chttp2` runs all ops in the run state except if it sees a write it puts that into a finally. That way anything else that gets put into the combiner can add to that write. + +``` +class combiner { + mpscq q; // multi-producer single-consumer queue can be made non-blocking + state s; // is it empty or executing + queue finally; // you can only do run_finally when you are already running something from the combiner + + run(f) { + if (q.push(f)) { + // q.push returns true if it's the first thing + loop: + while (q.pop(&f)) { // modulo some extra work to avoid races + f(); + } + while (finally.pop(&f)) { + f(); + } + goto loop; + } + } +} +``` + +So that explains how combiners work in general. In gRPC, there is +`start_batch(..., tag)` and then work only gets activated by somebody +calling `cq::next` which returns a tag. This gives an API-level +guarantee that there will be a thread doing polling to actually make +work happen. However, some operations are not covered by a poller +thread, such as cancellation that doesn't have a completion. Other +callbacks that don't have a completion are the internal work that gets +done before the batch gets completed. We need a condition called +`covered_by_poller` that means that the item will definitely need some +thread at some point to call `cq::next` . This includes those +callbacks that directly cause a completion but also those that are +indirectly required before getting a completion. If we can't tell for +sure for a specific path, we have to assumed it is not covered by +poller. + +The above combiner has the problem that it keeps draining for a +potentially infinite amount of time and that can lead to a huge tail +latency for some operations. So we can tweak it by returning to the application +if we know that it is valid to do so: + +``` +while (q.pop(&f)) { + f(); + if (control_can_be_returned && some_still_queued_thing_is_covered_by_poller) { + offload_combiner_work_to_some_other_thread(); + } +} +``` + +`offload` is more than `break`; it does `break` but also causes some +other thread that is currently waiting on a poll to break out of its +poll. This is done by setting up a per-polling-island work-queue +(distributor) wakeup FD. The work-queue is the converse of the combiner; it +tries to spray events onto as many threads as possible to get as much concurrency as possible. + +So `offload` really does: + +``` + workqueue.run(continue_from_while_loop); + break; +``` + +This needs us to add another class variable for a `workqueue` +(which is really conceptually a distributor). + +``` +workqueue::run(f) { + q.push(f) + eventfd.wakeup() +} + +workqueue::readable() { + eventfd.consume(); + q.pop(&f); + f(); + if (!q.empty()) { + eventfd.wakeup(); // spray across as many threads as are waiting on this workqueue + } +} +``` + +In principle, `run_finally` could get starved, but this hasn't +happened in practice. If we were concerned about this, we could put a +limit on how many things come off the regular `q` before the `finally` +queue gets processed. + |