[Cake] [Rpm] [Make-wifi-fast] The most wonderful video ever about bufferbloat

Sebastian Moeller moeller0 at gmx.de
Wed Oct 26 16:38:55 EDT 2022


Hi Stuart,


> On Oct 20, 2022, at 20:32, Stuart Cheshire <cheshire at apple.com> wrote:
> 
> On 20 Oct 2022, at 02:36, Sebastian Moeller <moeller0 at gmx.de> wrote:
> 
>> Hi Stuart,
>> 
>> [SM] That seems to be somewhat optimistic. We have been there before: short of mandating actually-working oracle schedulers on all end-points, intermediate hops will see queues, some more and some less transient. So we can strive to minimize queue build-up, sure, but we cannot avoid queues, and long queues, completely, so we need methods to deal with them gracefully.
>> Also, not many applications are actually helped all that much by letting information get stale in their own buffers as compared to an on-path queue. Think of an on-line, reaction-time-gated game: the need is to distribute the current world state to all participating clients ASAP.
> 
> I’m afraid you are wrong about this.

	[SM] Quite possible; it would not be a first. ;)


> If an on-line game wants low delay, the only answer is for it to avoid generating position updates faster than the network can carry them.

	[SM] +1; it seems I misconstrued the argument I wanted to make when bringing up gaming, though. If you allow, I will try to lay out why I believe that for some applications, like some forms of gaming, a competent scheduler can be leaps and bounds more helpful than the best AQM.
	Let's start with me conceding that when the required average rate of an application exceeds the network's capacity (for too much of the time), that application and that network path are not going to become/stay friends.
	That out of the way, the application profile I wanted to describe with the "gaming" tag is an application that on average sends relatively little, albeit in a clocked/bursty way: every X milliseconds it wants to send a bunch of packets to each client, and the fidelity of the predictive "simulation" performed by the clients critically depends on not deviating from the server-managed "world-state" for too long. (The longer the simulation runs without server updates, the larger the expected deviation becomes and the more noticeable any corrections that need to be made later once world-updates arrive; so the goal is to send world-state-relevant updates as soon as possible after the server has calculated the authoritative state.)
	These bursts will likely be sent close to the server's line rate and hence will create a (hopefully) transient queue at all places where the capacity gets smaller along the path. The end result, however, is that these packets arrive at the client as fast as possible.
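
	To make that traffic profile concrete, here is a toy sketch (Python over UDP; the tick interval, the client address, and the state encoding are all made up for illustration) of a server that computes the freshest state each tick, sends it immediately as a short burst, and never queues stale states for later:

  import socket, struct, time

  TICK_INTERVAL = 0.05                      # X = 50 ms between updates (illustrative)
  CLIENTS = [("192.0.2.10", 9000)]          # example client address (TEST-NET-1)

  def authoritative_state(tick):
      # stand-in for real game logic: just the tick number plus a timestamp
      return struct.pack("!Id", tick, time.time())

  sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  deadline = time.monotonic()
  tick = 0
  while True:
      deadline += TICK_INTERVAL
      payload = authoritative_state(tick)   # freshest state, computed right now
      for addr in CLIENTS:
          sock.sendto(payload, addr)        # short burst at line rate
      tick += 1
      # wait for the next tick; stale states are never buffered for resending
      time.sleep(max(0.0, deadline - time.monotonic()))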


> One packet giving the current game player position is better than a backlog of ten previous stale ones waiting to go out.

	[SM] Yes! In a multiplayer game each client really needs to be informed about all other players'/entities' actions. If this information is sent in multiple packets (either because the aggregate size exceeds a packet's "MTU/MSS", or because, implementation-wise, sending one packet per individual entity (players, NPCs, "bullets", ...) is preferable), then all packets need to be received to appropriately update the world-state... the faster this goes, the less the clients go out of "sync".
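
	A matching toy sketch of the client side (again Python/UDP, with a made-up wire format of tick number, entity id, and position) that applies each per-entity update only if it is newer than what it has already seen, so a late-arriving stale packet cannot roll the world-state backwards:

  import socket, struct

  sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  sock.bind(("", 9000))

  newest_tick = {}    # entity id -> newest tick applied so far
  positions = {}      # entity id -> (x, y)

  while True:
      data, _ = sock.recvfrom(2048)
      # made-up wire format: tick (uint32), entity id (uint32), x and y (floats)
      tick, entity, x, y = struct.unpack("!IIff", data)
      if tick <= newest_tick.get(entity, -1):
          continue                     # stale: a newer update already arrived
      newest_tick[entity] = tick
      positions[entity] = (x, y)       # world-state converges packet by packet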


> Sending packets faster than the network can carry them does not get them to the destination faster; it gets them there slower.

	[SM] Again I fully agree. Although in the limit case, on an otherwise idle network, sending our hypothetical bunch of packets from the server either at line rate or paced out to the bottleneck rate of the path should deliver the bunch equally fast. That is, sending the bunch as a bunch is IMHO a rational and defensible strategy for the server, relieving it from having to keep per-client capacity state.
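
	For completeness, here is what the paced alternative looks like as a sketch (the 20 Mbit/s bottleneck estimate is a made-up number that the server would otherwise have to learn and store per client, which is exactly the state-keeping I would like to spare it):

  import socket, time

  BOTTLENECK_BPS = 20_000_000           # assumed per-client path capacity (made up)
  PACKET = b"x" * 1200                  # example payload size

  def send_paced(sock, addr, packets):
      # space the packets so the burst arrives no faster than the bottleneck drains
      gap = len(PACKET) * 8 / BOTTLENECK_BPS
      for p in packets:
          sock.sendto(p, addr)
          time.sleep(gap)               # crude pacing; real stacks use timers/qdiscs

  sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  send_paced(sock, ("192.0.2.10", 9000), [PACKET] * 10)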

> The same applies to frames in a screen sharing application. Sending the current state of the screen *now* is better than having a backlog of ten previous stale frames sitting in buffers somewhere on their way to the destination.

	[SM] I respectfully argue that a screen sharing application that sends for prolonged durations well above a path's capacity is either not optimally designed or mis-configured. As I wrote before, I used (the free version of NoMachine's) NX remote control across the Atlantic to southern California, and while not all that enjoyable, it was leaps and bounds more usable than what you demonstrated in the video below. (I did however make concessions, like manually configuring NX to expect a slow WAN link, and I did not enable full window dragging on the remote host.)
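
	The "no stale frames" discipline you describe can be sketched as a depth-one mailbox between the capture and the send path, where a newer frame simply replaces an unsent older one (a toy illustration of the principle, not a claim about how NX or your demo application actually implement it):

  import threading

  class LatestFrame:
      """Depth-one mailbox: a newer frame overwrites an unsent older one."""
      def __init__(self):
          self._cond = threading.Condition()
          self._frame = None

      def publish(self, frame):
          with self._cond:
              self._frame = frame      # any stale, unsent frame is dropped here
              self._cond.notify()

      def take(self):
          with self._cond:
              while self._frame is None:
                  self._cond.wait()
              frame, self._frame = self._frame, None
              return frame             # the sender always gets the newest frame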


> Stale data is not inevitable. Applications don’t need to have stale data if they avoid generating stale data in the first place.

	[SM] Alas, no application using an internet path is in full control of avoiding queueing. Queues have a reason to exist (I personally like Nichols/Jacobson's description of queues acting as shock absorbers), especially over shared paths with cross traffic (at least until we finally roll out these fine oracle schedulers that I encounter sometimes in the literature to all endpoints ;) ).
	I do agree that applications generally should try to avoid dumping excessive amounts of data into the network.

> 
> Please watch this video, which explains it better than I can in a written email:
> 
> <https://developer.apple.com/videos/play/wwdc2015/719/?time=892>

	[SM] Argh, not a pleasant sight. But it also does not illustrate the case I was trying to make.

	To come back to my point: for an application profile like the game traffic above (which does not exceed capacity except over very short timeframes), a flow-queueing scheduler helps a lot, independent of whether the greedy flows sharing the same path are well behaved or not (let's ignore actively malicious DOS traffic for this discussion). Once you have a competent scheduler, the queueing problem moves from "one unfriendly application can ruin the access link for all other flows" to "unfriendly applications can mostly only make their own lives miserable". To be clear, I think both competent AQM and competent scheduling are desirable features that complement each other.*
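
	To illustrate why flow queueing protects the sparse game flow no matter how the greedy flows behave, here is a toy per-flow round-robin scheduler (a gross simplification of what fq_codel/cake actually do; hash collisions, byte-fairness and the sparse-flow boost are all ignored):

  from collections import OrderedDict, deque

  class ToyFQ:
      # each flow gets its own queue; the scheduler serves the non-empty
      # queues in round-robin order, one packet per flow per turn
      def __init__(self):
          self.flows = OrderedDict()          # flow key -> deque of packets

      def enqueue(self, flow_key, packet):
          self.flows.setdefault(flow_key, deque()).append(packet)

      def dequeue(self):
          for key in list(self.flows):
              q = self.flows[key]
              if q:
                  self.flows.move_to_end(key) # served flow goes to the back
                  return key, q.popleft()
          return None

  fq = ToyFQ()
  for i in range(100):
      fq.enqueue("greedy-bulk-flow", f"bulk-{i}")
  fq.enqueue("game-flow", "world-state-update")
  print(fq.dequeue())  # ('greedy-bulk-flow', 'bulk-0')
  print(fq.dequeue())  # ('game-flow', 'world-state-update'): the game update
                       # waits behind at most one bulk packet, not a hundred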



Regards
	Sebastian

*) It goes without much saying that I consider L4S an unfortunate combination of a not-competent-enough AQM with an inadequate scheduler; this is IMHO "too little, too late". The best I want to say about L4S is that I think trying to signal more fine-grained queueing information from the network to the endpoints is a decent idea. L4S however fails to implement this idea in an acceptable fashion, in multiple ways: bit-banging the queueing state into a multi-packet stream appears at best sub-optimal, compared to giving, say, each packet even a few-bit accumulative queue-filling-state counter. Why? Because such a counter can be used to deduce the queue-filling rate quickly enough to have a fighting chance of actually tackling the "when to exit slow-start" question, something that L4S essentially punted on (or did I miss a grand announcement of paced chirping making it into a deployed network stack?).
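
	To sketch what I mean by such a counter (entirely hypothetical; neither L4S nor any deployed AQM signals this today): if each packet carried, say, a 4-bit quantized queue-fill level, a receiver could estimate the fill *rate* from a handful of packets and feed that into the exit-slow-start decision:

  # hypothetical signal: each packet carries a 4-bit queue-fill level (0..15),
  # i.e. the bottleneck queue occupancy quantized into sixteenths

  def fill_rate(samples):
      # least-squares slope over (arrival_time, level) samples;
      # a positive slope means the bottleneck queue is growing
      n = len(samples)
      if n < 2:
          return 0.0
      mean_t = sum(t for t, _ in samples) / n
      mean_l = sum(l for _, l in samples) / n
      num = sum((t - mean_t) * (l - mean_l) for t, l in samples)
      den = sum((t - mean_t) ** 2 for t, _ in samples)
      return num / den if den else 0.0

  # toy use: the level climbs one step every 5 ms -> 200 levels/s
  samples = [(0.000, 2), (0.005, 3), (0.010, 4), (0.015, 5)]
  if fill_rate(samples) > 100:   # made-up threshold, in levels per second
      print("queue filling fast: leave slow start before overshooting")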


> 
> Stuart Cheshire
> 


