Lets make wifi fast again!
* [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface
@ 2020-02-13  0:08 David P. Reed
  2020-02-13  0:36 ` Bob McMahon
  0 siblings, 1 reply; 9+ messages in thread
From: David P. Reed @ 2020-02-13  0:08 UTC
  To: make-wifi-fast

A friend of mine (not a network expert, but a gadget freak) has been deploying wireless security cameras at his home and at his vacation home. He uses a single WiFi AP in each place, serving the security cameras etc.

What he observes is this:

Whenever anyone on a laptop in one of the homes uploads a modest sized file (over the same WiFi) the security systems all lose data.

Now, I can't go to his home to diagnose this, but I've asked him to check his cable connection for bufferbloat using dslreports, and it shows no bufferbloat there. But it sure looks like *severe* lag under load is affecting the security cameras' feed to the cloud servers provided by the camera vendor.

So, is there a way to simply *diagnose* the WiFi air link for excess queueing in all the high rate WiFi devices? Something a non-net-head could do?
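One low-tech way to approach this, sketched here under stated assumptions rather than as a known-good tool: measure round-trip latency while the network is idle, then again while the laptop upload is running, and compare the medians. The sketch times TCP handshakes as a stand-in for ping (so it needs no admin rights); the target host and port are placeholders you would choose yourself.

```python
import socket
import statistics
import time


def tcp_connect_rtt_ms(ip: str, port: int = 443, timeout: float = 5.0) -> float:
    """Time one TCP handshake to ip:port, in milliseconds.

    A handshake takes roughly one network round trip, which is plenty
    of resolution for spotting seconds of added queueing delay.
    """
    start = time.perf_counter()
    with socket.create_connection((ip, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def sample_rtts(host: str, n: int = 20, interval_s: float = 0.5) -> list:
    """Collect n handshake-time samples, resolving the name only once."""
    ip = socket.gethostbyname(host)  # keep DNS time out of the samples
    samples = []
    for _ in range(n):
        samples.append(tcp_connect_rtt_ms(ip))
        time.sleep(interval_s)
    return samples


def added_latency_ms(idle_rtts, loaded_rtts) -> float:
    """Median latency under load minus median idle latency."""
    return statistics.median(loaded_rtts) - statistics.median(idle_rtts)
```

Run `sample_rtts` from a device on the same AP once with the network quiet and once during the upload. Added latency in the hundreds or thousands of milliseconds points at a queue somewhere on the path; repeating the test with the laptop wired to the modem helps separate the WiFi hop from the ISP link.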

The situation around congestion control in the industry continues to royally suck, in my opinion. The vendors don't care, the ISPs don't care (they can sell a higher speed connection than is actually needed and super-fabulous MIMO gadgets that still don't quite solve the problem).

I'm an old guy, basically retired. I'm sad because the young folks remain clueless.

And it's been decades since bufferbloat, and the basic need for congestion signalling, were discovered. I'm sure 5G (whatever it really is) is not paying attention to this network-level congestion issue...



* Re: [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface
  2020-02-13  0:08 [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface David P. Reed
@ 2020-02-13  0:36 ` Bob McMahon
  2020-02-13  1:56   ` David P. Reed
  0 siblings, 1 reply; 9+ messages in thread
From: Bob McMahon @ 2020-02-13  0:36 UTC
  To: David P. Reed; +Cc: Make-Wifi-fast


hmm, not sure if this helps but "excess queueing" can be hard to define.

Do you know the operating systems for the WiFi devices, and whether tooling can
be loaded onto them? iperf clients sample RTT and CWND on Linux
machines. Iperf 2.0.14 (in development) has a lot of latency-related
features.

Also, if there is control over AIFS, one can set it so that the high-rate
devices always win and the lower-rate ones always lose. If
that solves things, it suggests that WiFi tx queues building up from TXOP
arbitration and air transmission are the issue. Standard cwmin/cwmax isn't
as effective, though it won't allow high-rate devices to starve low-rate
devices as AIFS might (depending upon the values).
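The AIFS-versus-CW distinction can be seen with a little arithmetic. In EDCA a station waits AIFS = SIFS + AIFSN × slot time after the medium goes idle, then a random backoff of 0..CW slots. The sketch below assumes 5 GHz OFDM timing (SIFS 16 µs, slot 9 µs) and a single contention round, ignoring backoff freezing across busy periods and collisions. It checks whether one station's latest possible start falls before another's earliest, which is the "always win" case.

```python
from typing import Tuple

# Assumed 5 GHz OFDM PHY timing, in microseconds.
SIFS_US = 16
SLOT_US = 9


def aifs_us(aifsn: int) -> int:
    """AIFS = SIFS + AIFSN * slot time."""
    return SIFS_US + aifsn * SLOT_US


def access_window_us(aifsn: int, cw: int) -> Tuple[int, int]:
    """Earliest and latest transmit start after the medium goes idle,
    for a backoff drawn uniformly from 0..cw slots (one contention round)."""
    start = aifs_us(aifsn)
    return start, start + cw * SLOT_US


def always_wins(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if station a = (aifsn, cw) always starts before station b."""
    _, a_latest = access_window_us(*a)
    b_earliest, _ = access_window_us(*b)
    return a_latest < b_earliest
```

For example, a station with AIFSN 2 and CW 3 starts 34 to 61 µs after idle, while one set to AIFSN 15 and CW 15 cannot start before 151 µs, so it never wins. Two stations with equal AIFSN always overlap no matter how their CW values differ, so CW tuning only biases the odds instead of starving anyone.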

I use latency to measure performance and define bounds that way, and
it's very specific to use cases. It does require clock sync. My devices
have GPS-disciplined oscillators, which aren't common.

As an aside, the HULL approach of phantom queues looks interesting.
https://people.csail.mit.edu/alizadeh/papers/hull-nsdi12.pdf

Bob


* Re: [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface
  2020-02-13  0:36 ` Bob McMahon
@ 2020-02-13  1:56   ` David P. Reed
  2020-02-13  6:27     ` Bob McMahon
       [not found]     ` <mailman.471.1581575247.1241.make-wifi-fast@lists.bufferbloat.net>
  0 siblings, 2 replies; 9+ messages in thread
From: David P. Reed @ 2020-02-13  1:56 UTC
  To: Bob McMahon; +Cc: Make-Wifi-fast

I know this is hard to measure in general, and especially hard to isolate, because it combines packet scheduling, the AP's own activity, and the insertion of excess buffering in each device's hardware and driver software.

However, what I'm looking for is evidence that helps locate the problem, which of course is a "distributed scheduling and buffering" problem, unlike the simple bufferbloat we all saw in the CMTSes of DOCSIS 2.0, in ALU's LTE deployments in the early days of 4G (at AT&T Wireless), or in the overbuffering of Arista Networks' switches, all of which were quite simple to measure and diagnose.


* Re: [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface
  2020-02-13  1:56   ` David P. Reed
@ 2020-02-13  6:27     ` Bob McMahon
       [not found]     ` <mailman.471.1581575247.1241.make-wifi-fast@lists.bufferbloat.net>
  1 sibling, 0 replies; 9+ messages in thread
From: Bob McMahon @ 2020-02-13  6:27 UTC
  To: David P. Reed; +Cc: Make-Wifi-fast


Internally, we have telemetry as packets move through the end-to-end logic
subsystems. A Python controller receives all the telemetry from separate
netlink sockets. It also maps all the time domains, e.g., TSF, into the
GPS time domain. Then one can see exactly where packets are at any moment
in time. We also produce stacked bar plots of each packet's latency as
it moves end to end, and then produce clusters from there, as there are
millions of packets. Typically our main goal is to show our customers we're
not the problem, and that it's either their OS/stack or air time, things we
don't control. (I argue we have more control over EDCA than we'd admit;
late bindings, e.g., MCS rate selection, and per-packet adaptive EDCA
parameters seem interesting.)
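A stacked bar per packet amounts to splitting its end-to-end latency into per-stage segments. A minimal sketch, assuming each packet already carries a timestamp for every checkpoint in one common time domain; the checkpoint names are hypothetical and not Broadcom's actual telemetry points.

```python
from typing import Dict, List, Tuple

# Hypothetical checkpoints a packet passes, in path order.
STAGES = ["app_write", "driver_enqueue", "fw_txop_start", "air_done", "peer_rx"]


def stage_latencies(ts: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (segment_name, seconds) for each consecutive pair of stages.

    The segments sum to the end-to-end latency, so they map directly
    onto one stacked bar per packet.
    """
    segments = []
    for a, b in zip(STAGES, STAGES[1:]):
        segments.append((f"{a}->{b}", ts[b] - ts[a]))
    return segments


def dominant_segment(ts: Dict[str, float]) -> str:
    """Name of the segment where this packet spent the most time."""
    return max(stage_latencies(ts), key=lambda seg: seg[1])[0]
```

Tallying `dominant_segment` across many packets is one crude first pass at the clustering step: a pile-up in the driver-to-TXOP segment, for instance, would point at tx queueing rather than at the air or the peer.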

This type of WiFi network telemetry isn't supported outside of internal
tools. There is some movement towards inserting network telemetry inside
TCP headers, but not much. I believe SDN guys use it inside data
centers. If it's useful, adding it to open source tooling might be doable,
though I'd need to think through the technical details. The
first obstacle is figuring out a common time domain, or how to provide
sufficient information without one.
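One common way to build a shared time domain, assumed here rather than taken from Bob's tooling, is to collect pairs of simultaneous readings (a local counter such as TSF, and a GPS-disciplined reference) and fit a straight line, which captures both offset and clock-rate skew. A least-squares sketch:

```python
from typing import List, Tuple


def fit_clock_map(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of reference = slope * local + offset.

    pairs: (local_timestamp, reference_timestamp) samples taken at the
    same instant, e.g. TSF in microseconds and GPS time in seconds.
    Centering on the means keeps large timestamps numerically well behaved.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two sync points")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return slope, offset


def to_reference(local: float, slope: float, offset: float) -> float:
    """Map a local timestamp into the reference (e.g. GPS) time domain."""
    return slope * local + offset
```

The slope absorbs oscillator drift (a local clock running 20 ppm fast shows up as a slope 20 ppm off nominal), so the fit needs refreshing as temperature and drift change; a device without a reference clock is exactly where this breaks down.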

Something like this could help drive ECN-type features - not sure. The
network engineering teams are so siloed, both within orgs and across
companies, that it's hard to truly optimize end-to-end problems. The OSI
layering model tends to get in the way too, at least from an
engineering-silo perspective.

Bob


* Re: [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface
       [not found]     ` <mailman.471.1581575247.1241.make-wifi-fast@lists.bufferbloat.net>
@ 2020-02-13 21:32       ` Bob McMahon
  2020-02-13 22:23         ` David P. Reed
  0 siblings, 1 reply; 9+ messages in thread
From: Bob McMahon @ 2020-02-13 21:32 UTC
  To: Bob McMahon; +Cc: David P. Reed, Make-Wifi-fast


Here's a paper on in-band network telemetry (INT) for those who don't
already know about it. Broadcom has a proprietary version for its data
center semiconductor products. I don't know of anything that is end-to-end,
including the WiFi access hops.

https://p4.org/assets/INT-current-spec.pdf

Bob


* Re: [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface
  2020-02-13 21:32       ` Bob McMahon
@ 2020-02-13 22:23         ` David P. Reed
  2020-02-13 22:36           ` Jonathan Morton
  0 siblings, 1 reply; 9+ messages in thread
From: David P. Reed @ 2020-02-13 22:23 UTC
  To: Bob McMahon; +Cc: Bob McMahon, Make-Wifi-fast

More interesting anecdotal information. My friend did some simple dslreports speed tests, comparing two setups.

What he observed is interesting: first with his laptop connected directly to the cable modem (DOCSIS 3.1, but limited to 250/25 Mb/s downlink/uplink), and then with the laptop connected over WiFi to his Netgear NightHawk access point, which is wired to the cable modem.

dslreports gives "lag under load" statistics with both a "letter grade" and a bar graph showing the range of packet delays to the various servers. What jumped out was this:

1) Directly connected to the cable modem/router via a GigE cable, speeds were as expected, and lag under load got an A+. No bufferbloat in the ISP or cable link was detected.

2) Going indirectly through the NightHawk AP, the speeds are not surprising (802.11ac can definitely fill the cable modem's 25 Mb/s uplink). BUT... bufferbloat got a letter grade of "F", and the "lag under load" numbers varied from 2000 ms up to 5000 ms.

Hmm... what is causing this? Well, it's pretty unlikely that the delay is in the test computer's buffers: they work fine at the full uplink capacity of the cable modem (25 Mb/s), so the software stack on the test computer, an HP laptop running Windows, is unlikely to be the problem. And there's no evidence of the problem being off premises.

So we are left with only one possible result, says Sherlock, after considering the alternatives: a queue in the NightHawk AP feeding the Ethernet link into the cable modem is a big problem. What queue might that be? A queue in the WiFi hardware or driver? Or the outbound queue feeding the Ethernet link to the router?

Well, I can't prove this, but consider that the 802.11ac link data rate can sustain much more than 25 Mb/s, so the "bottleneck" link here is the limited uplink capacity at the modem.
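A back-of-the-envelope check of how much data that lag implies, assuming the whole 2 to 5 seconds is standing queue draining at the 25 Mb/s uplink rate, with 1500-byte packets:

```python
def standing_queue_bytes(drain_rate_bps: float, delay_s: float) -> float:
    """Bytes queued ahead of a packet that waits delay_s behind a
    queue draining at drain_rate_bps (bits per second)."""
    return drain_rate_bps * delay_s / 8


def standing_queue_packets(drain_rate_bps: float, delay_s: float,
                           mtu: int = 1500) -> float:
    """Same queue expressed in full-size packets."""
    return standing_queue_bytes(drain_rate_bps, delay_s) / mtu


for delay in (2.0, 5.0):
    mb = standing_queue_bytes(25e6, delay) / 1e6
    pkts = standing_queue_packets(25e6, delay)
    print(f"{delay:.0f} s at 25 Mb/s: {mb:.2f} MB, ~{pkts:.0f} packets")
```

That works out to roughly 6 to 16 MB, or about 4,200 to 10,400 full-size packets sitting in a buffer somewhere, which is far more than a well-managed queue at that rate should hold.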

The modem clearly is capable of giving congestion control signals to a directly connected Ethernet path (non-wireless), by dropping packets.

So what is going on here?

I admit I'm a bit puzzled, but not completely. If a queue is building up in the NightHawk AP, it's possible that it builds up indirectly through some interaction that makes the HP laptop fail to notice that it is not getting TCP ACKs, and instead conclude that the physical end-to-end RTT is getting longer, so that it starts behaving as if the physical end-to-end RTT were multiple seconds long.

That could be a problem with Microsoft's proprietary Windows TCP stack. It may have a really dumb, or not well thought-through, RTT estimator in it.
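Whatever Windows actually does (its stack is proprietary), the standard retransmission-timer estimator from RFC 6298 illustrates the mechanism: a sender that sees ACK delays grow steadily folds them into its smoothed RTT and timeout. This is a sketch of the RFC's estimator, not of Microsoft's implementation; the 1 ms clock granularity is an assumption.

```python
class RttEstimator:
    """RFC 6298 SRTT / RTTVAR / RTO estimator (all times in seconds)."""

    ALPHA = 1 / 8
    BETA = 1 / 4
    K = 4
    G = 0.001       # assumed clock granularity: 1 ms
    MIN_RTO = 1.0   # RFC 6298: RTO should be rounded up to 1 s
    MAX_RTO = 60.0  # RFC 6298 allows an upper bound of at least 60 s

    def __init__(self) -> None:
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0  # initial RTO before any RTT sample

    def update(self, r: float) -> float:
        """Fold in one RTT measurement r and return the new RTO."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR must be updated with the old SRTT, so it goes first.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.G, self.K * self.rttvar)
        self.rto = min(max(rto, self.MIN_RTO), self.MAX_RTO)
        return self.rto
```

Starting from a 100 ms RTT, a single 3 s sample already pushes the RTO to about 3.5 s. As delay creeps up, the timer stretches with it, so no timeout fires; with a loss-based congestion controller, nothing else tells the sender to slow down unless some queue actually drops or marks a packet.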

But the thing that doesn't fit that picture quite right is that the queue seems to be growing to multiple seconds in the NightHawk AP, or possibly in the Windows 802.11 device driver due to some weird interaction.

Anyway, this is really interesting. I don't know what to recommend to my friend (other than trying some other combinations: a Linux laptop, an OpenWRT AP, or both).

But my observation that the "industry status" on congestion in consumer-facing gear continues to suck seems to be verified. And there's no easy-to-use tool out there that can say "the problem is *in this component*" (and the Best Buy guys are happy to encourage looking for "radio interference", as if that had anything to do with 5 seconds of queueing building up somewhere).


On Thursday, February 13, 2020 4:32pm, "Bob McMahon" <bob.mcmahon@broadcom.com> said:

> Just a paper on inband telemetry for those that don't already know about
> it. Broadcom has a proprietary version for data center semiconductor
> products.  I don't know of anything that is end/end including the WiFi
> access hops.
> 
> https://p4.org/assets/INT-current-spec.pdf
> 
> Bob
> 
> On Wed, Feb 12, 2020 at 10:27 PM Bob McMahon via Make-wifi-fast <
> make-wifi-fast@lists.bufferbloat.net> wrote:
> 
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Bob McMahon <bob.mcmahon@broadcom.com>
>> To: "David P. Reed" <dpreed@deepplum.com>
>> Cc: Make-Wifi-fast <make-wifi-fast@lists.bufferbloat.net>
>> Bcc:
>> Date: Wed, 12 Feb 2020 22:27:14 -0800
>> Subject: Re: [Make-wifi-fast] Status of the industry on over buffering at
>> the WiFi air interface
>> Internally, we have telemetry as packets move through the end/end logic
>> subsystems.  A python controller receives all the telemetry from separate
>> netlink sockets.  It also maps all the time domains, e.g., TSF, into the
>> GPS time domain.  Then one can see exactly where packets are at any moment
>> in time.  We also produce stacked bar plots for each packet latency after
>> it moves from end.  Then produce clusters from there as there are millions
>> of packets.  Typically our main goal is to show our customers we're not the
>> problem and show that it's either their os/stack or air time, things we
>> don't control. (I argue we have more control over EDCA then we'd admit,
>> late bindings, e.g. MCS rate selection, etc., and per packet adaptive EDCAs
>> seem interesting)
>>
>> This type of WiFi network telemetry isn't supported outside of internal
>> tools.  There is some movement towards inserting network telemetry inside
>> TCP headers but not much. I believe SDN guys use it inside of data
>> centers.  If it's useful, adding it to open source tooling might be doable
>> though I'd need to do some thinking about the technical details a bit.  A
>> first obstacle is figuring out a common time domain or how to provide
>> sufficient information without one.
>>
>> Something like this could help drive ECN type features - not sure.  The
>> network engineering teams are so silo'd both within orgs and across
>> companies it's hard to truly optimize end/end problems.  The OSI layering
>> model tends to get in the way too, at least from an eng silo'ing
>> perspective.
>>
>> Bob
>>
>> On Wed, Feb 12, 2020 at 5:56 PM David P. Reed <dpreed@deepplum.com> wrote:
>>
>>> I know this is hard to measure, in general. Especially to isolate the
>>> issue because it combines packet scheduling, the AP's own activity, and the
>>> insertion of excess buffering in each device's hardware and driver
>>> software.
>>>
>>> However, what I'm looking for is evidence that helps locate the problem,
>>> which of course is a "distributed scheduling and buffering" problem, unlike
>>> the simple bufferbloat we all saw in the CMTS's of DOCSIS 2.0,, ALU's LTE
>>> deployments in the early days of 4G (at ATT Wireless), or the overbuffering
>>> in Arista Networks's switches, which were quite simple to measure and
>>> diagnose.
>>>
>>> On Wednesday, February 12, 2020 7:36pm, "Bob McMahon" <
>>> bob.mcmahon@broadcom.com> said:
>>>
>>> > hmm, not sure if this helps but "excess queueing" can be hard to define.
>>> >
>>> > Do you know the operating systems for the WiFi devices and if tooling
>>> can
>>> > be loaded upon them?  iperf clients samples RTT and CWND for linux
>>> > machines. Iperf 2.0.14 (in development) has a lot of latency related
>>> > features
>>> >
>>> > Also, if there is control over the AIFS one can set that for the high
>>> rates
>>> > devices such that they always win and the lower rate ones always lose.
>>> If
>>> > that solves things it does suggest WiFi tx queues developing per the
>>> TXOP
>>> > arbitration and air transmission as an issue.  Standard cwmin/cwmax
>>> isn't
>>> > as effective though it won't allow high rates to starve low rates
>>> devices
>>> > as AIFS might (depending upon the values)
>>> >
>>> > I use latency to measure the performance and define bounds that way and
>>> > it's very specific to use cases.  IT does require clock sync. My devices
>>> > have GPS disciplined oscillators which aren't common.
>>> >
>>> > As an aside, the HULL approach of phantom queues looks interesting.
>>> > https://people.csail.mit.edu/alizadeh/papers/hull-nsdi12.pdf
>>> >
>>> > Bob
>>> >
>>> > On Wed, Feb 12, 2020 at 4:08 PM David P. Reed <dpreed@deepplum.com> wrote:
>>> >
>>> >> A friend of mine (not a network expert, but a gadget freak), has been
>>> >> deploying wireless security cameras at his home and vacation home. He uses
>>> >> a single WiFi AP in each place, serving the security cameras etc.
>>> >>
>>> >> What he observes is this:
>>> >>
>>> >> Whenever anyone on a laptop in one of the homes uploads a modest sized
>>> >> file (over the same WiFi) the security systems all lose data.
>>> >>
>>> >> Now I can't go to his home to diagnose this, but I've asked him to check
>>> >> out his cable bufferbloat using dslreports, and he gets no bufferbloat
>>> >> there. But it sure looks like *severe* lag under load is affecting the
>>> >> security camera feed to the cloud servers that the company that sells the
>>> >> security cameras provides.
>>> >>
>>> >> So, is there a way to simply *diagnose* the WiFi air link for excess
>>> >> queueing in all the high rate WiFi devices? Something a non-net-head could
>>> >> do?
>>> >>
>>> >> The situation around congestion control in the industry continues to
>>> >> royally suck, in my opinion. The vendors don't care, the ISPs don't care
>>> >> (they can sell a higher speed connection than is actually needed and
>>> >> super-fabulous MIMO gadgets that still don't quite solve the problem).
>>> >>
>>> >> I'm an old guy, basically retired. I'm sad because the young folks remain
>>> >> clueless.
>>> >>
>>> >> And it's been decades since bufferbloat was discovered, and the basic
>>> >> issue of congestion signalling being needed. I'm sure 5G (whatever it
>>> >> really is) is not paying attention to this network level congestion issue...
>>> >>
>>> >> _______________________________________________
>>> >> Make-wifi-fast mailing list
>>> >> Make-wifi-fast@lists.bufferbloat.net
>>> >> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>> >
>>>
>>>
>>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Bob McMahon via Make-wifi-fast <make-wifi-fast@lists.bufferbloat.net>
>> To: "David P. Reed" <dpreed@deepplum.com>
>> Cc: Make-Wifi-fast <make-wifi-fast@lists.bufferbloat.net>
>> Bcc:
>> Date: Wed, 12 Feb 2020 22:27:28 -0800 (PST)
>> Subject: Re: [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface
> 




* Re: [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface
  2020-02-13 22:23         ` David P. Reed
@ 2020-02-13 22:36           ` Jonathan Morton
  2020-02-13 23:49             ` Bob McMahon
  2020-02-14 16:40             ` David P. Reed
  0 siblings, 2 replies; 9+ messages in thread
From: Jonathan Morton @ 2020-02-13 22:36 UTC
  To: David P. Reed; +Cc: Bob McMahon, Make-Wifi-fast

> On 14 Feb, 2020, at 12:23 am, David P. Reed <dpreed@deepplum.com> wrote:
> 
> The modem clearly is capable of giving congestion control signals to a directly connected Ethernet path (non-wireless), by dropping packets.

No - by sending Pause frames back.  It's an increasingly-used method of applying back pressure on an Ethernet link, in preference to dropping packets.  If it *did* drop packets, you wouldn't get an F grade for bloat.

So the Nighthawk is correctly halting Ethernet output in response to those frames (it's probably a function of the NIC hardware or driver), but exercises absolutely no control over the queue that builds up as a result.

 - Jonathan Morton
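For readers who haven't met them: the Pause frames described above are, presumably, classic IEEE 802.3x MAC Control frames (the thread doesn't say whether the modem uses those or priority-based flow control, which has a different format). Below is a minimal Python sketch of the 802.3x layout; the function names are hypothetical, and the FCS is omitted because NICs normally append it. Note that the frame carries nothing but a timer: no flow, no destination, no path.

```python
import struct

# Well-known IEEE 802.3x MAC Control values.
PAUSE_DEST_MAC = bytes.fromhex("0180c2000001")  # reserved multicast address for PAUSE
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
MIN_FRAME_NO_FCS = 60   # 64-byte minimum Ethernet frame minus the 4-byte FCS
BITS_PER_QUANTUM = 512  # one pause_time unit equals 512 bit times on the link


def build_pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    """Build an 802.3x PAUSE frame without FCS (hypothetical helper)."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause_time is a 16-bit field")
    header = PAUSE_DEST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, pause_quanta)
    frame = header + payload
    # Zero-pad up to the minimum frame size.
    return frame + bytes(MIN_FRAME_NO_FCS - len(frame))


def pause_duration_seconds(pause_quanta: int, link_bps: float) -> float:
    """How long the peer should stop transmitting for a given pause_time."""
    return pause_quanta * BITS_PER_QUANTUM / link_bps


frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
print(frame.hex())
print(pause_duration_seconds(0xFFFF, 1e9))  # about 0.0336 s at 1 Gb/s
```

A `pause_time` of 0 tells the peer to resume immediately, so a receiver can approximate XOFF/XON by sending 0xFFFF and later 0. Nothing in the frame tells the sender *which* traffic caused the congestion.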



* Re: [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface
  2020-02-13 22:36           ` Jonathan Morton
@ 2020-02-13 23:49             ` Bob McMahon
  2020-02-14 16:40             ` David P. Reed
  1 sibling, 0 replies; 9+ messages in thread
From: Bob McMahon @ 2020-02-13 23:49 UTC
  To: Jonathan Morton; +Cc: David P. Reed, Make-Wifi-fast


I believe this switching path is all transistors on a very low-cost chip
where no software is involved.  I also think some Nighthawks have the ability
to disable honoring the pause frames on a per-port basis.

The core of such a problem seems to be a 1 Gb/s switch port connected to a
25 Mb/s one.  I've noticed that when my relatives purchased XFINITY home
security services, they significantly increased their uplink speeds, all for
the security cameras.  So this may be an issue stemming from the lack of
structural separation.
https://www.communications.gov.au/what-we-do/internet/competition-broadband/telstras-separation-framework
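The arithmetic behind a fast port feeding a slow one is short. In this sketch the 1 MB buffer is a hypothetical figure (nobody in the thread knows the modem's actual buffer size); only the 1 Gb/s and 25 Mb/s rates come from the message above, and the function names are illustrative.

```python
def fill_time_seconds(buffer_bytes: int, in_bps: float, out_bps: float) -> float:
    """Time for a buffer to fill when arrivals outpace the drain rate."""
    if in_bps <= out_bps:
        return float("inf")  # arrivals never outpace the drain, so no standing queue forms
    return buffer_bytes * 8 / (in_bps - out_bps)


def standing_delay_seconds(buffer_bytes: int, out_bps: float) -> float:
    """Queueing delay for a packet that arrives behind a full buffer."""
    return buffer_bytes * 8 / out_bps


# Hypothetical 1 MB buffer between a 1 Gb/s port and a 25 Mb/s uplink.
print(fill_time_seconds(1_000_000, 1e9, 25e6))  # about 0.0082 s
print(standing_delay_seconds(1_000_000, 25e6))  # about 0.32 s
```

So a single upload can fill such a buffer within milliseconds, after which any packet sharing that queue (camera uploads would, if they leave through the same modem) waits roughly a third of a second.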

Back to humans getting in our own way, which is all too common.

Bob

On Thu, Feb 13, 2020 at 2:36 PM Jonathan Morton <chromatix99@gmail.com>
wrote:

> > On 14 Feb, 2020, at 12:23 am, David P. Reed <dpreed@deepplum.com> wrote:
> >
> > The modem clearly is capable of giving congestion control signals to a
> directly connected Ethernet path (non-wireless), by dropping packets.
>
> No - by sending Pause frames back.  It's an increasingly-used method of
> applying back pressure on an Ethernet link, in preference to dropping
> packets.  If it *did* drop packets, you wouldn't get an F grade for bloat.
>
> So the Nighthawk is correctly halting Ethernet output in response to those
> frames (it's probably a function of the NIC hardware or driver), but
> exercises absolutely no control over the queue that builds up as a result.
>
>  - Jonathan Morton
>
>



* Re: [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface
  2020-02-13 22:36           ` Jonathan Morton
  2020-02-13 23:49             ` Bob McMahon
@ 2020-02-14 16:40             ` David P. Reed
  1 sibling, 0 replies; 9+ messages in thread
From: David P. Reed @ 2020-02-14 16:40 UTC
  To: Jonathan Morton; +Cc: Bob McMahon, Make-Wifi-fast

Wow. I didn't know Pause Frames were becoming commonly used. That's terrible in general, but I guess for a dedicated box with only one outgoing path it is OK. A pause tells a source to stop sending to the access router. It doesn't reflect any path dependency, so if the access router were actually a *router* that could feed more than one outgoing link, a pause would not be selective enough.
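A toy slot-based model of the selectivity problem, purely illustrative: one ingress link feeds two hypothetical egress ports, a slow port A and a fast port B, each with a tiny buffer. With link-level PAUSE, a full A buffer halts the whole link, so traffic bound for the idle port B stalls too (head-of-line blocking). With per-destination back pressure, which is what a multi-port router would need, B keeps flowing. All rates, buffer sizes, and names are made-up parameters, not any vendor's behaviour.

```python
from collections import deque


def run_switch(slots: int, selective: bool, buf_cap: int = 2, a_every: int = 4) -> dict:
    """Count packets delivered per egress port over `slots` time steps."""
    pending = deque("AB" * slots)  # sender's queue, alternating destinations A and B
    buffers = {"A": 0, "B": 0}     # packets waiting at each egress port
    delivered = {"A": 0, "B": 0}
    for t in range(slots):
        # 1. The shared ingress link carries at most one packet per slot.
        if selective:
            # Per-destination back pressure: send the oldest packet whose port has room.
            idx = next((i for i, d in enumerate(pending) if buffers[d] < buf_cap), None)
            if idx is not None:
                dst = pending[idx]
                del pending[idx]
                buffers[dst] += 1
        elif all(n < buf_cap for n in buffers.values()):
            # Link-level PAUSE: any full port halts the whole link.
            buffers[pending.popleft()] += 1
        # 2. Port B drains one packet every slot; port A only every `a_every` slots.
        for dst, can_send in (("B", True), ("A", t % a_every == 0)):
            if can_send and buffers[dst] > 0:
                buffers[dst] -= 1
                delivered[dst] += 1
    return delivered


print(run_switch(100, selective=False))  # PAUSE: B is throttled to A's pace
print(run_switch(100, selective=True))   # selective: B uses the slots A can't
```

Port A gets the same service in both runs; only the innocent traffic to B pays for the non-selective signal.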

So this is a special case that moves the bufferbloat situation into the router.
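A fluid sketch of why a pause moves the queue upstream: the modem's own queue stays bounded by its XOFF/XON watermarks, while the box in front of it, assumed here to have an unbounded queue and no AQM, absorbs everything else. The 100 Mb/s upload rate, the watermarks, and the function name are hypothetical; the 1 Gb/s and 25 Mb/s figures echo the rates Bob mentions.

```python
def simulate_upload(duration_ms: int, arrival_bps: float, link_bps: float,
                    uplink_bps: float, xoff_bytes: float, xon_bytes: float):
    """Fluid model in 1 ms steps; returns (max modem queue, final router queue) in bytes."""
    step_s = 1e-3
    router_q = modem_q = max_modem_q = 0.0
    paused = False
    for _ in range(duration_ms):
        router_q += arrival_bps * step_s / 8  # upload arriving from the LAN/WiFi side
        if not paused:  # Ethernet hop, router -> modem
            sent = min(router_q, link_bps * step_s / 8)
            router_q -= sent
            modem_q += sent
        modem_q -= min(modem_q, uplink_bps * step_s / 8)  # modem drains to the ISP uplink
        max_modem_q = max(max_modem_q, modem_q)
        # Modem-side flow control with hysteresis: XOFF above the high mark, XON below the low mark.
        if modem_q >= xoff_bytes:
            paused = True
        elif modem_q <= xon_bytes:
            paused = False
    return max_modem_q, router_q


max_modem, router = simulate_upload(2000, 100e6, 1e9, 25e6, 64_000, 32_000)
print(max_modem)          # stays near the watermarks (plus one step's burst)
print(router * 8 / 25e6)  # seconds of delay queued upstream after 2 s of upload
```

With these numbers the upstream box ends up holding several seconds' worth of traffic after a 2-second upload, and that figure keeps growing for as long as the upload lasts; the modem never drops, so nothing signals the sender to slow down.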

But I really thank you, because that resolves one issue being observed. That's new information for me. So thanks again! Is there a "best practice RFC" about using Pause Frames on Ethernet under IP? Or is this just random hacking by hardware vendors who don't understand the end-to-end nature of congestion management?

However, the other issue observed, which I didn't mention, is that there is a big problem on the downlink side, too, when using one of several different APs. I'm not aware of how a pause frame might be used by a laptop on WiFi, or even whether there is a notion of a Pause Frame in Windows WiFi drivers. (If there were, then "out of control" congestion would be a property of both a Netgear and a Linksys AP, both of which had this "download" lag under load.)

On Thursday, February 13, 2020 5:36pm, "Jonathan Morton" <chromatix99@gmail.com> said:

>> On 14 Feb, 2020, at 12:23 am, David P. Reed <dpreed@deepplum.com> wrote:
>>
>> The modem clearly is capable of giving congestion control signals to a directly
>> connected Ethernet path (non-wireless), by dropping packets.
> 
> No - by sending Pause frames back.  It's an increasingly-used method of applying
> back pressure on an Ethernet link, in preference to dropping packets.  If it *did*
> drop packets, you wouldn't get an F grade for bloat.
> 
> So the Nighthawk is correctly halting Ethernet output in response to those frames
> (it's probably a function of the NIC hardware or driver), but exercises absolutely
> no control over the queue that builds up as a result.
> 
>  - Jonathan Morton
> 
> 




end of thread

Thread overview: 9+ messages
2020-02-13  0:08 [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface David P. Reed
2020-02-13  0:36 ` Bob McMahon
2020-02-13  1:56   ` David P. Reed
2020-02-13  6:27     ` Bob McMahon
     [not found]     ` <mailman.471.1581575247.1241.make-wifi-fast@lists.bufferbloat.net>
2020-02-13 21:32       ` Bob McMahon
2020-02-13 22:23         ` David P. Reed
2020-02-13 22:36           ` Jonathan Morton
2020-02-13 23:49             ` Bob McMahon
2020-02-14 16:40             ` David P. Reed
