From: "David P. Reed"
To: "Bob McMahon"
Cc: "Bob McMahon", "Make-Wifi-fast"
Date: Thu, 13 Feb 2020 17:23:21 -0500 (EST)
Subject: Re: [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface

More interesting anecdotal information.
My friend did some simple dslreports speed tests, both through his Netgear NightHawk access point and directly wired to his cable modem.

He compared two setups: connecting his laptop directly to the cable modem (DOCSIS 3.1, but limited to 250/25 Mb/sec downlink/uplink), and connecting it through the NightHawk, which is wired to the cable modem.

dslreports gives "lag under load" statistics with both a "letter grade" and a bar graph showing the range of packet delays to the various servers. What jumped out was this:

1) Directly connected to the cable modem/router via a GigE cable, speeds were as expected, and lag under load got an A+. No bufferbloat was detected in the ISP or cable link.

2) Going indirectly through the NightHawk AP, the speeds are not surprising (802.11ac can definitely fill the cable modem's 25 Mb/s uplink). BUT... bufferbloat got a letter grade of "F", and the "lag under load" numbers vary from 2000 msec up to 5000 msec.

Hmm... what is causing this? Well, it's pretty unlikely the delay is in the test computer's buffers: they work fine at the full uplink capacity of the cable modem (25 Mb/sec), so the software stack on the test computer, an HP laptop running Windows, is unlikely to be the problem. And there's no evidence of the problem being off premises.

So we are left with only one possible result, says Sherlock, after considering alternatives: a queue in the NightHawk AP feeding the Ethernet link into the cable modem is a big problem. What queue might that be? A queue in the WiFi hardware or driver?
Or the outbound queue feeding the Ethernet link to the router?

Well, I can't prove this, but consider that the 802.11ac link data rate can sustain much more than 25 Mb/sec, so the "bottleneck" link here is the limited uplink capacity at the modem.

The modem is clearly capable of giving congestion control signals to a directly connected (non-wireless) Ethernet path by dropping packets.

So what is going on here?

I admit I'm a bit puzzled, but not completely. If there is a queue building up in the NightHawk AP, it's possible that its queues are building up indirectly because of some interaction that makes the HP laptop fail to understand that it is not getting TCP ACKs, and instead decide that the physical end-to-end RTT is getting longer, so that it starts behaving as if the end-to-end physical RTT were multiple seconds long.

That could be a problem with Microsoft's proprietary Windows TCP stack. It may have a really dumb or poorly thought-through RTT estimator in it.

But the thing that doesn't quite fit that picture is that the queue seems to be growing to multiple seconds in the NightHawk AP, or possibly in the Windows 802.11 device driver due to some weird interaction.

Anyway, this is really interesting. I don't know what to recommend to my friend (other than trying some other combinations: a Linux laptop, an OpenWRT AP, or both).

But my observation that the "industry status" on congestion in consumer-facing gear continues to suck seems to be verified. And there's no easy-to-use tool out there that can say "the problem is *in this component*" (and the Best Buy guys are happy to encourage looking for "radio interference", as if that has anything to do with 5 seconds of queueing building up somewhere).


On Thursday, February 13, 2020 4:32pm, "Bob McMahon" said:

> Just a paper on in-band telemetry for those that don't already know about
> it.
> Broadcom has a proprietary version for data center semiconductor
> products. I don't know of anything that is end-to-end including the WiFi
> access hops.
>
> https://p4.org/assets/INT-current-spec.pdf
>
> Bob
>
> On Wed, Feb 12, 2020 at 10:27 PM Bob McMahon via Make-wifi-fast <make-wifi-fast@lists.bufferbloat.net> wrote:
>
>>
>> ---------- Forwarded message ----------
>> From: Bob McMahon
>> To: "David P. Reed"
>> Cc: Make-Wifi-fast
>> Bcc:
>> Date: Wed, 12 Feb 2020 22:27:14 -0800
>> Subject: Re: [Make-wifi-fast] Status of the industry on over buffering at
>> the WiFi air interface
>> Internally, we have telemetry as packets move through the end-to-end logic
>> subsystems. A Python controller receives all the telemetry from separate
>> netlink sockets. It also maps all the time domains, e.g., TSF, into the
>> GPS time domain. Then one can see exactly where packets are at any moment
>> in time. We also produce stacked bar plots of each packet's latency after
>> it moves end to end. Then we produce clusters from there, as there are
>> millions of packets. Typically our main goal is to show our customers we're
>> not the problem and show that it's either their OS/stack or airtime, things
>> we don't control. (I argue we have more control over EDCA than we'd admit:
>> late bindings, e.g., MCS rate selection, etc., and per-packet adaptive
>> EDCAs seem interesting.)
>>
>> This type of WiFi network telemetry isn't supported outside of internal
>> tools. There is some movement towards inserting network telemetry inside
>> TCP headers, but not much. I believe SDN guys use it inside of data
>> centers. If it's useful, adding it to open source tooling might be doable,
>> though I'd need to do some thinking about the technical details a bit.
>> A first obstacle is figuring out a common time domain, or how to provide
>> sufficient information without one.
>>
>> Something like this could help drive ECN-type features; not sure. The
>> network engineering teams are so siloed, both within orgs and across
>> companies, that it's hard to truly optimize end-to-end problems. The OSI
>> layering model tends to get in the way too, at least from an engineering
>> silo perspective.
>>
>> Bob
>>
>> On Wed, Feb 12, 2020 at 5:56 PM David P. Reed wrote:
>>
>>> I know this is hard to measure in general, and especially hard to isolate
>>> the issue, because it combines packet scheduling, the AP's own activity,
>>> and the insertion of excess buffering in each device's hardware and driver
>>> software.
>>>
>>> However, what I'm looking for is evidence that helps locate the problem,
>>> which of course is a "distributed scheduling and buffering" problem, unlike
>>> the simple bufferbloat we all saw in the CMTSs of DOCSIS 2.0, ALU's LTE
>>> deployments in the early days of 4G (at ATT Wireless), or the overbuffering
>>> in Arista Networks's switches, which were quite simple to measure and
>>> diagnose.
>>>
>>> On Wednesday, February 12, 2020 7:36pm, "Bob McMahon" <bob.mcmahon@broadcom.com> said:
>>>
>>> > Hmm, not sure if this helps, but "excess queueing" can be hard to define.
>>> >
>>> > Do you know the operating systems for the WiFi devices and whether tooling
>>> > can be loaded onto them? The iperf client samples RTT and CWND on Linux
>>> > machines.
>>> > Iperf 2.0.14 (in development) has a lot of latency-related features.
>>> >
>>> > Also, if there is control over the AIFS, one can set it for the high-rate
>>> > devices such that they always win and the lower-rate ones always lose. If
>>> > that solves things, it does suggest WiFi TX queues developing per the TXOP
>>> > arbitration and air transmission as an issue. Standard cwmin/cwmax isn't
>>> > as effective, though it won't allow high-rate devices to starve low-rate
>>> > devices as AIFS might (depending upon the values).
>>> >
>>> > I use latency to measure the performance and define bounds that way, and
>>> > it's very specific to use cases. It does require clock sync. My devices
>>> > have GPS-disciplined oscillators, which aren't common.
>>> >
>>> > As an aside, the HULL approach of phantom queues looks interesting.
>>> > https://people.csail.mit.edu/alizadeh/papers/hull-nsdi12.pdf
>>> >
>>> > Bob
>>> >
>>> > On Wed, Feb 12, 2020 at 4:08 PM David P. Reed wrote:
>>> >
>>> >> A friend of mine (not a network expert, but a gadget freak) has been
>>> >> deploying wireless security cameras at his home and vacation home. He
>>> >> uses a single WiFi AP in each place, serving the security cameras etc.
>>> >>
>>> >> What he observes is this:
>>> >>
>>> >> Whenever anyone on a laptop in one of the homes uploads a modest-sized
>>> >> file (over the same WiFi), the security systems all lose data.
>>> >>
>>> >> Now I can't go to his home to diagnose this, but I've asked him to check
>>> >> out his cable bufferbloat using dslreports, and he gets no bufferbloat
>>> >> there.
>>> >> But it sure looks like *severe* lag under load is affecting the
>>> >> security camera feed to the cloud servers provided by the company that
>>> >> sells the security cameras.
>>> >>
>>> >> So, is there a way to simply *diagnose* the WiFi air link for excess
>>> >> queueing in all the high-rate WiFi devices? Something a non-net-head
>>> >> could do?
>>> >>
>>> >> The situation around congestion control in the industry continues to
>>> >> royally suck, in my opinion. The vendors don't care, and the ISPs don't
>>> >> care (they can sell a higher-speed connection than is actually needed,
>>> >> and super-fabulous MIMO gadgets that still don't quite solve the problem).
>>> >>
>>> >> I'm an old guy, basically retired. I'm sad because the young folks
>>> >> remain clueless.
>>> >>
>>> >> And it's been decades since bufferbloat, and the basic need for
>>> >> congestion signalling, were discovered. I'm sure 5G (whatever it
>>> >> really is) is not paying attention to this network-level congestion
>>> >> issue...
>>> >>
>>> >> _______________________________________________
>>> >> Make-wifi-fast mailing list
>>> >> Make-wifi-fast@lists.bufferbloat.net
>>> >> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>> >
>>
>> ---------- Forwarded message ----------
>> From: Bob McMahon via Make-wifi-fast
>> To: "David P. Reed"
>> Cc: Make-Wifi-fast
>> Bcc:
>> Date: Wed, 12 Feb 2020 22:27:28 -0800 (PST)
>> Subject: Re: [Make-wifi-fast] Status of the industry on over buffering at
>> the WiFi air interface
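A back-of-the-envelope check on the NightHawk numbers earlier in the thread, as a minimal Python sketch. It assumes the bloated FIFO drains at the 25 Mb/s cable-modem uplink (the thread does not establish where the queue sits or what drains it), treats the whole reported 2000 to 5000 msec lag as queueing delay (which slightly overstates it), and assumes full-size 1500-byte packets. Under those assumptions, a queueing delay of d seconds means roughly d × 25e6 / 8 bytes are waiting in the queue.

```python
def implied_queue_bytes(delay_s: float, drain_bps: float) -> float:
    """Bytes sitting in a FIFO that drains at drain_bps and adds delay_s of delay."""
    return delay_s * drain_bps / 8  # bits/s -> bytes/s


UPLINK_BPS = 25e6    # 25 Mb/s uplink, as reported in the thread
PACKET_BYTES = 1500  # assumed full-size Ethernet packets

for delay_ms in (2000, 5000):  # range of "lag under load" reported in the thread
    q = implied_queue_bytes(delay_ms / 1000, UPLINK_BPS)
    print(f"{delay_ms} ms -> {q / 1e6:.3f} MB, about {q / PACKET_BYTES:.0f} packets")
```

This prints 6.250 MB (about 4167 packets) for 2000 ms and 15.625 MB (about 10417 packets) for 5000 ms. If the queue actually drains faster than 25 Mb/s, the same delay would require a proportionally larger buffer.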
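David speculates in the thread that the Windows TCP stack's RTT estimator may be at fault. For comparison, here is a minimal Python sketch of the standard retransmission-timer computation from RFC 6298; it is not Windows' implementation, whose internals the thread does not describe. It illustrates that even a by-the-book estimator converges to a multi-second smoothed RTT once a standing queue adds multi-second delay to every sample, because RTT samples alone cannot separate queueing delay from path delay. The 1 ms clock granularity and the 20 ms idle / 2 s loaded RTT samples are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rfc6298Estimator:
    """SRTT/RTTVAR/RTO computation following RFC 6298 (not any vendor's stack)."""
    alpha: float = 1 / 8
    beta: float = 1 / 4
    k: float = 4.0
    clock_granularity: float = 0.001  # assumed 1 ms timer granularity (G)
    min_rto: float = 1.0              # RFC 6298 lower bound on RTO, in seconds
    srtt: Optional[float] = None
    rttvar: Optional[float] = None

    def update(self, r: float) -> float:
        """Fold in one RTT sample r (seconds) and return the new RTO (seconds)."""
        if self.srtt is None:
            # First measurement: SRTT = R, RTTVAR = R / 2.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated using the old SRTT, then SRTT itself is updated.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - r)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * r
        rto = self.srtt + max(self.clock_granularity, self.k * self.rttvar)
        return max(self.min_rto, rto)


# Hypothetical samples: 20 ms idle RTT, then a 2 s standing queue appears.
est = Rfc6298Estimator()
for sample in [0.020] * 10 + [2.0] * 30:
    rto = est.update(sample)
print(f"SRTT={est.srtt:.3f} s, RTO={rto:.3f} s")
```

With a 1/8 gain, SRTT closes most of the gap to the new 2 s RTT within a few dozen samples, so a sender behaving "as if the RTT were multiple seconds long" is consistent with the standard estimator being fed queue-inflated samples.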
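Bob's AIFS suggestion in the thread can be checked with simple arithmetic. In EDCA, a station waits AIFS = SIFS + AIFSN × slot time after the medium goes idle, then counts down a random backoff of 0 to CW slots. The minimal Python sketch below asks whether a "fast" station's worst-case wait still ends before a "slow" station's best-case wait; if so, the fast station wins every contention in which it has traffic, which is the starvation risk Bob mentions. It assumes the 5 GHz OFDM timing used by 802.11a/n/ac (SIFS 16 µs, slot 9 µs) and a single contention round, and it ignores retries, TXOP bursts, and backoff counters carried over between rounds. The tuned AIFSN/CW values in the second example are hypothetical.

```python
SIFS_US = 16  # SIFS for the 5 GHz OFDM PHY (802.11a/n/ac), microseconds
SLOT_US = 9   # slot time for the same PHY, microseconds


def aifs_us(aifsn: int) -> int:
    """Arbitration inter-frame space: SIFS plus AIFSN slot times."""
    return SIFS_US + aifsn * SLOT_US


def always_wins(aifsn_fast: int, cw_fast: int, aifsn_slow: int) -> bool:
    """True if the fast station's longest possible wait (AIFS plus a backoff
    of cw_fast slots) ends strictly before the slow station's AIFS does."""
    latest_fast = aifs_us(aifsn_fast) + cw_fast * SLOT_US
    earliest_slow = aifs_us(aifsn_slow)
    return latest_fast < earliest_slow


# Default best-effort EDCA parameters for both stations: AIFSN 3, CWmin 15.
print(always_wins(3, 15, 3))  # False: random backoff draws decide who goes first
# Hypothetical tuning: fast station AIFSN 2, CW 3; slow station AIFSN 7.
print(always_wins(2, 3, 7))   # True: 61 us < 79 us, so the slow station can starve
```

Strict priority appears only when the AIFSN gap exceeds the fast station's CW; when the ranges overlap (as with tuning cwmin/cwmax alone), the function returns False and the slower station still gets some transmit opportunities.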