[Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface

David P. Reed dpreed at deepplum.com
Thu Feb 13 17:23:21 EST 2020


More interesting anecdotal information. My friend ran some simple dslreports speed tests with his Netgear NightHawk access point, comparing it against a laptop wired directly to his cable modem.

What he observed is interesting. He ran the test two ways: first with his laptop connected directly to the cable modem (DOCSIS 3.1, but limited in downlink and uplink speeds to 250/25 Mb/sec), and then connecting through the NightHawk, which is wired to the cable modem.

dslreports gives "lag under load" statistics with both a "letter grade" and a bar graph showing the range of packet delays to the various servers. What jumped out was this:

1) directly connected to the cable modem/router via a GigE cable, speeds were as expected, and lag-under-load got an A+.  No bufferbloat detected in the ISP or cable link.

2) going indirect through the NightHawk AP, the speeds are not surprising (802.11ac can definitely fill the uplink of the cable modem at 25 Mb/s). BUT... a letter grade of "F" on bufferbloat, with "lag under load" varying from 2000 msec up to 5000 msec.
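For anyone who wants to reproduce this kind of lag-under-load comparison without dslreports, here is a rough sketch of the idea: ping a target while a bulk upload saturates the uplink, and compare against idle RTT. The letter-grade thresholds below are my own illustrative guesses, not dslreports' actual scale.

```python
# DIY lag-under-load check: run this once idle, then again while a bulk
# upload (e.g. a large file transfer) saturates the uplink, and compare.
# Thresholds are illustrative, not dslreports' real grading scale.
import re
import subprocess

def parse_ping_rtts(ping_output: str) -> list[float]:
    """Extract per-packet RTTs in milliseconds from `ping` output."""
    return [float(m) for m in re.findall(r"time[=<]([\d.]+)", ping_output)]

def grade_bufferbloat(idle_ms: float, loaded_ms: float) -> str:
    """Crude letter grade based on latency added under load."""
    added = loaded_ms - idle_ms
    if added < 30:
        return "A"
    if added < 100:
        return "B"
    if added < 400:
        return "C"
    return "F"

def ping_rtts(host: str, count: int = 10) -> list[float]:
    """Ping `host` and return the measured RTTs (ms)."""
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True).stdout
    return parse_ping_rtts(out)
```

Comparing the median of `ping_rtts(...)` taken idle versus taken mid-upload gives roughly the same signal as the dslreports bar graph: multi-second loaded RTTs against a tens-of-milliseconds idle baseline is the "F" case described above.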

Hmm... what is causing this?  Well, it's pretty unlikely the delay is in the test computer's buffers; they work fine at the full uplink capacity of the cable modem (25 Mb/sec), so the software stack on the test computer, an HP laptop running Windows, is unlikely to be the problem.  And there's no evidence of the problem being off premises.

So we are left, says Sherlock after considering the alternatives, with only one possible conclusion: a queue in the NightHawk AP feeding the Ethernet link into the cable modem is a big problem. What queue might that be? A queue in the WiFi hardware or driver? Or the outbound queue feeding the Ethernet link to the router?

Well, I can't prove this, but consider that the 802.11ac link data rate can sustain much more than 25 Mb/sec, so the "bottleneck" link here is the limited uplink capacity at the modem.

The modem is clearly capable of giving congestion-control signals to a directly connected (non-wireless) Ethernet path by dropping packets.
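To put those "lag under load" numbers in perspective: delay through a FIFO queue is the queued bytes divided by the drain rate, so seconds of added delay at a 25 Mb/s bottleneck implies megabytes of buffered data sitting somewhere. A quick back-of-envelope check:

```python
# Queueing delay = bytes buffered / drain rate. Invert that to see how
# much data must be queued to produce the observed lag under load.
UPLINK_BPS = 25_000_000  # 25 Mb/s cable uplink (the bottleneck link here)

def implied_queue_bytes(delay_sec: float, rate_bps: float = UPLINK_BPS) -> float:
    """Bytes that must be queued ahead of a packet to delay it delay_sec."""
    return delay_sec * rate_bps / 8  # bits -> bytes

# The observed 2-5 s of lag under load at 25 Mb/s:
low = implied_queue_bytes(2.0)   # ~6.25 MB queued somewhere
high = implied_queue_bytes(5.0)  # ~15.6 MB queued somewhere
```

Fifteen megabytes of standing queue is orders of magnitude beyond what a 25 Mb/s bottleneck needs for full utilization, which is what makes the "F" grade so striking.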

So what is going on here?

I admit I'm a bit puzzled, but not completely. If a queue is building up in the NightHawk AP, it's possible it is building up indirectly: some interaction may make the HP laptop misread the delayed TCP ACKs not as congestion but as a lengthening physical end-to-end RTT, so that it starts behaving as if the end-to-end RTT really is multiple seconds long.

That could be a problem with Microsoft's proprietary Windows TCP stack. It may have a really dumb or poorly-thought-through RTT estimator in it.
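For context, the standard TCP RTT estimator (RFC 6298) is a simple exponentially weighted filter; it cannot distinguish queueing delay from propagation delay, so if every sample is inflated by a bloated queue, SRTT and the retransmission timeout dutifully track it upward. A minimal sketch of that algorithm (the RFC's version, not Windows' actual proprietary implementation, which we can't inspect):

```python
# RFC 6298-style smoothed RTT (SRTT) and retransmission timeout (RTO)
# updates. If a bloated queue adds seconds to every RTT sample, SRTT
# legitimately converges to the inflated value -- the estimator is doing
# its job; the queue is the problem.
def rto_updates(samples, alpha=1/8, beta=1/4, min_rto=1.0):
    """Return [(srtt, rto), ...] after each RTT sample (seconds)."""
    srtt = rttvar = None
    out = []
    for r in samples:
        if srtt is None:
            srtt, rttvar = r, r / 2                      # first measurement
        else:
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
        out.append((srtt, max(min_rto, srtt + 4 * rttvar)))
    return out

# Physical path RTT is ~20 ms, but a growing queue pushes samples to
# seconds; SRTT follows the queue, not the path:
trace = rto_updates([0.02] * 3 + [1.0, 2.0, 3.0] * 10)
```

The point of the sketch: even a correct estimator ends up treating multi-second queueing as "the" RTT. A genuinely dumb estimator would only make the convergence worse.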

But the part that doesn't quite fit that picture is that the queue seems to be growing to multiple seconds inside the NightHawk AP itself, or possibly in the Windows 802.11 device driver, due to some weird interaction.

Anyway, this is really interesting.  I don't know what to recommend to my friend (other than trying some other combinations: a Linux laptop, an OpenWRT AP, or both).

But my observation that the "industry status" on congestion in consumer-facing gear continues to suck seems to be borne out. And there's no easy-to-use tool out there that can say "the problem is *in this component*" (and the Best Buy guys are happy to encourage looking for "radio interference", as if that has anything to do with 5 seconds of queueing building up somewhere).


On Thursday, February 13, 2020 4:32pm, "Bob McMahon" <bob.mcmahon at broadcom.com> said:

> Just a paper on inband telemetry for those that don't already know about
> it. Broadcom has a proprietary version for data center semiconductor
> products.  I don't know of anything that is end/end including the WiFi
> access hops.
> 
> https://p4.org/assets/INT-current-spec.pdf
> 
> Bob
> 
> On Wed, Feb 12, 2020 at 10:27 PM Bob McMahon via Make-wifi-fast <
> make-wifi-fast at lists.bufferbloat.net> wrote:
> 
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Bob McMahon <bob.mcmahon at broadcom.com>
>> To: "David P. Reed" <dpreed at deepplum.com>
>> Cc: Make-Wifi-fast <make-wifi-fast at lists.bufferbloat.net>
>> Bcc:
>> Date: Wed, 12 Feb 2020 22:27:14 -0800
>> Subject: Re: [Make-wifi-fast] Status of the industry on over buffering at
>> the WiFi air interface
>> Internally, we have telemetry as packets move through the end/end logic
>> subsystems.  A python controller receives all the telemetry from separate
>> netlink sockets.  It also maps all the time domains, e.g., TSF, into the
>> GPS time domain.  Then one can see exactly where packets are at any moment
>> in time.  We also produce stacked bar plots for each packet latency after
>> it moves from end.  Then produce clusters from there as there are millions
>> of packets.  Typically our main goal is to show our customers we're not the
>> problem and show that it's either their os/stack or air time, things we
>> don't control. (I argue we have more control over EDCA then we'd admit,
>> late bindings, e.g. MCS rate selection, etc., and per packet adaptive EDCAs
>> seem interesting)
>>
>> This type of WiFi network telemetry isn't supported outside of internal
>> tools.  There is some movement towards inserting network telemetry inside
>> TCP headers but not much. I believe SDN guys use it inside of data
>> centers.  If it's useful, adding it to open source tooling might be doable
>> though I'd need to do some thinking about the technical details a bit.  A
>> first obstacle is figuring out a common time domain or how to provide
>> sufficient information without one.
>>
>> Something like this could help drive ECN type features - not sure.  The
>> network engineering teams are so silo'd both within orgs and across
>> companies it's hard to truly optimize end/end problems.  The OSI layering
>> model tends to get in the way too, at least from an eng silo'ing
>> perspective.
>>
>> Bob
>>
>> On Wed, Feb 12, 2020 at 5:56 PM David P. Reed <dpreed at deepplum.com> wrote:
>>
>>> I know this is hard to measure, in general. Especially to isolate the
>>> issue because it combines packet scheduling, the AP's own activity, and the
>>> insertion of excess buffering in each device's hardware and driver
>>> software.
>>>
>>> However, what I'm looking for is evidence that helps locate the problem,
>>> which of course is a "distributed scheduling and buffering" problem, unlike
>>> the simple bufferbloat we all saw in the CMTSes of DOCSIS 2.0, ALU's LTE
>>> deployments in the early days of 4G (at ATT Wireless), or the overbuffering
>>> in Arista Networks' switches, which were quite simple to measure and
>>> diagnose.
>>>
>>> On Wednesday, February 12, 2020 7:36pm, "Bob McMahon" <
>>> bob.mcmahon at broadcom.com> said:
>>>
>>> > hmm, not sure if this helps but "excess queueing" can be hard to define.
>>> >
>>> > Do you know the operating systems for the WiFi devices, and whether
>>> > tooling can be loaded onto them?  iperf clients sample RTT and CWND on
>>> > Linux machines. Iperf 2.0.14 (in development) has a lot of
>>> > latency-related features.
>>> >
>>> > Also, if there is control over the AIFS, one can set it so the
>>> > high-rate devices always win and the lower-rate ones always lose. If
>>> > that solves things, it suggests WiFi tx queues developing per the TXOP
>>> > arbitration and air transmission as the issue.  Standard cwmin/cwmax
>>> > isn't as effective, though it won't allow high-rate devices to starve
>>> > low-rate devices as AIFS might (depending upon the values).
>>> >
>>> > I use latency to measure the performance and define bounds that way and
>>> > it's very specific to use cases.  It does require clock sync. My devices
>>> > have GPS disciplined oscillators which aren't common.
>>> >
>>> > As an aside, the HULL approach of phantom queues looks interesting.
>>> > https://people.csail.mit.edu/alizadeh/papers/hull-nsdi12.pdf
>>> >
>>> > Bob
>>> >
>>> > On Wed, Feb 12, 2020 at 4:08 PM David P. Reed <dpreed at deepplum.com>
>>> wrote:
>>> >
>>> >> A friend of mine (not a network expert, but a gadget freak) has been
>>> >> deploying wireless security cameras at his home and vacation home. He
>>> >> uses a single WiFi AP in each place, serving the security cameras etc.
>>> >>
>>> >> What he observes is this:
>>> >>
>>> >> Whenever anyone on a laptop in one of the homes uploads a modest-sized
>>> >> file (over the same WiFi), the security systems all lose data.
>>> >>
>>> >> Now I can't go to his home to diagnose this, but I've asked him to
>>> >> check out his cable bufferbloat using dslreports, and he gets no
>>> >> bufferbloat there. But it sure looks like *severe* lag under load is
>>> >> affecting the security camera feed to the cloud servers provided by
>>> >> the company that sells the security cameras.
>>> >>
>>> >> So, is there a way to simply *diagnose* the WiFi air link for excess
>>> >> queueing in all the high-rate WiFi devices? Something a non-net-head
>>> >> could do?
>>> >>
>>> >> The situation around congestion control in the industry continues to
>>> >> royally suck, in my opinion. The vendors don't care, and the ISPs
>>> >> don't care (they can sell a higher-speed connection than is actually
>>> >> needed, and super-fabulous MIMO gadgets that still don't quite solve
>>> >> the problem).
>>> >>
>>> >> I'm an old guy, basically retired. I'm sad because the young folks
>>> >> remain clueless.
>>> >>
>>> >> And it's been decades since bufferbloat was discovered, and the basic
>>> >> issue of congestion signalling being needed. I'm sure 5G (whatever it
>>> >> really is) is not paying attention to this network-level congestion
>>> >> issue...
>>> >>
>>> >> _______________________________________________
>>> >> Make-wifi-fast mailing list
>>> >> Make-wifi-fast at lists.bufferbloat.net
>>> >> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>> >
>>>
>>>
>>>
>>
>>
> 