| Commit message | Author | Age |
PiperOrigin-RevId: 215248737
PiperOrigin-RevId: 214648140
PiperOrigin-RevId: 214643933
closed batches are scheduled immediately if batch_thread_pool_ has an idle thread.
This helps prevent overloads when traffic suddenly increases from very low to high.
PiperOrigin-RevId: 206077249
PiperOrigin-RevId: 201721119
PiperOrigin-RevId: 201429850
AdaptiveSharedBatchScheduler, but with increased reliability and stability.
ASBS assumes request latency can be minimized at a specific number of batch processing threads. Under reasonable load this is true and ASBS performs well, but under low load latency is basically unaffected by the number of threads, so ASBS can learn a wide variety of 'optimal' values. If load resumes suddenly, these values can give very poor latencies. In most cases ASBS will recover, eventually rediscovering the correct value, but we have observed other cases where the latency is so large and noisy that ASBS cannot get a good signal to guide its learning, and the number of threads remains stuck at the bad value.
In addition, the incremental learning nature of this algorithm means that ASBS is always exploring to some extent, which can give rise to periods of non-optimal latency. This is most significant at high utilization, where the wrong number of threads can potentially overload the system.
ASBS uses latency as a proxy for keeping the tensorflow processing pipeline optimally loaded. SDBS, on the other hand, uses a direct measurement of pipeline fullness and adjusts its number of batch processing threads accordingly. This solves the exploration problem. SDBS solves the low load problem by not adjusting its thread count when the threads pass some idleness threshold.
PiperOrigin-RevId: 198638918
Revert #18413. Too many internal test failures due to the name scope change caused by this change.
Revert #18192. Cannot use re2::StringPiece internally. Need alternative for set call. Will pull and clean this up in a separate change.
PiperOrigin-RevId: 197991247
a lower bound for in_flight_batches_limit.
This can help prevent overloads that may occur during large traffic shifts: a small value learned during a period of low load can be unsuitable at high load.
PiperOrigin-RevId: 196893320
PiperOrigin-RevId: 194031845
PiperOrigin-RevId: 193929733
PiperOrigin-RevId: 192770717
PiperOrigin-RevId: 192768744
PiperOrigin-RevId: 190878279
adaptive shared batcher. A full batch will now be scheduled before an older, nearly empty batch as long as the age gap is less than full_batch_scheduling_boost_micros.
This parameter improves latency under heavy load, but too large a value will harm tail latency.
PiperOrigin-RevId: 187644796
batches implementation delivers similar performance but is simpler and requires less tuning.
PiperOrigin-RevId: 187111685
PiperOrigin-RevId: 183848459
They don't make sense in the open source repository.
PiperOrigin-RevId: 183140889
PiperOrigin-RevId: 181795909