From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334])
    (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
    (No client certificate requested)
    by lists.bufferbloat.net (Postfix) with ESMTPS id 8CAF13B29E
    for ; Thu, 13 Feb 2020 16:32:54 -0500 (EST)
Received: by mail-wm1-x334.google.com with SMTP id t14so8395357wmi.5
    for ; Thu, 13 Feb 2020 13:32:54 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=broadcom.com; s=google;
    h=mime-version:references:in-reply-to:from:date:message-id:subject:to
    :cc;
    bh=aEa4BBN9i8zN3ELWfyaxtFAgEVeKj44GfOx603mHDzY=;
    b=fGOlhm9mQWcFA35RARjmRf0wA9nu25Br8RS/dTFylXa7EyAsAH8IjXct0yaYQ9hvhj
    98C2pPOS1jgOx+HkZK56LtsbEdds1xKr/3HCIeO3557KLxZVDC4+mjLKSxsufK49zdCX
    8IchJRE6d38wLtsDSOCB/6cKOjuHCQVPRXSlI=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=1e100.net; s=20161025;
    h=x-gm-message-state:mime-version:references:in-reply-to:from:date
    :message-id:subject:to:cc;
    bh=aEa4BBN9i8zN3ELWfyaxtFAgEVeKj44GfOx603mHDzY=;
    b=TrIpqMl5oWUzTVe4p4CP6GbfWa8nC6rk06rObsSMbnt6lIYSxGCTJ0aFaxjGBvmCXp
    ihRR4ngE9cVlvkN2eIjedLhknGiRK5f4cHR3UJkHTfTV2jEGMKPSlIzMZ7Omm3Z03qrx
    uo7bvGwNXlwUpBG+UeG16TTM7bZYsexXBGb65ZmQasc6MlPUezCTZMACjQgraAf8SVxq
    9v8B/CPDDOyXBERoyn6pTHx9bm/5bFarSaQG5Tshz07HqeCrkWDsEPTNyZ78no49qy2O
    DG3Aq/5Djs2ddGFDgp+AXrUxVQk6JypovzNvehemNjDR3HBn8z96Qsnysjr/2B+BLdd/
    +zAg==
X-Gm-Message-State: APjAAAXaMggTaxHEjuNp+kVW4JlQk/vsnML3UEWPjT0J71vnZx9OFgPl
    qQ4WIQFQCByVVuXihCArVJi8/KOjSEAKKvrzphkVcA==
X-Google-Smtp-Source: APXvYqyeeRxL5z/adxNN4x+2IM6ECNmr3KNWZwMmIhAD6j4xRD1Z/jdHkwcfczRIRBoEYNWsAdJa3XIqlReP+yEjyd4=
X-Received: by 2002:a1c:dfd6:: with SMTP id w205mr41392wmg.151.1581629573575;
    Thu, 13 Feb 2020 13:32:53 -0800 (PST)
MIME-Version: 1.0
References: <1581552513.586428831@apps.rackspace.com> <1581559003.730714516@apps.rackspace.com>
In-Reply-To:
From: Bob McMahon
Date: Thu, 13 Feb 2020 13:32:42 -0800
Message-ID:
Subject: Re: [Make-wifi-fast] Status of the industry on over buffering at the WiFi air interface
To: Bob McMahon
Cc: "David P. Reed" , Make-Wifi-fast
Content-Type: multipart/alternative; boundary="000000000000e6a58f059e7bd46b"
X-List-Received-Date: Thu, 13 Feb 2020 21:32:54 -0000

--000000000000e6a58f059e7bd46b
Content-Type: text/plain; charset="UTF-8"

Just a paper on inband telemetry for those that don't already know about
it. Broadcom has a proprietary version for data center semiconductor
products. I don't know of anything that is end/end including the WiFi
access hops.

https://p4.org/assets/INT-current-spec.pdf

Bob

On Wed, Feb 12, 2020 at 10:27 PM Bob McMahon via Make-wifi-fast <
make-wifi-fast@lists.bufferbloat.net> wrote:

>
> ---------- Forwarded message ----------
> From: Bob McMahon
> To: "David P. Reed"
> Cc: Make-Wifi-fast
> Bcc:
> Date: Wed, 12 Feb 2020 22:27:14 -0800
> Subject: Re: [Make-wifi-fast] Status of the industry on over buffering at
> the WiFi air interface
> Internally, we have telemetry as packets move through the end/end logic
> subsystems. A Python controller receives all the telemetry from separate
> netlink sockets. It also maps all the time domains, e.g., TSF, into the
> GPS time domain. Then one can see exactly where packets are at any moment
> in time. We also produce stacked bar plots for each packet latency after
> it moves from end. Then produce clusters from there as there are millions
> of packets. Typically our main goal is to show our customers we're not the
> problem and show that it's either their os/stack or air time, things we
> don't control. (I argue we have more control over EDCA than we'd admit,
> late bindings, e.g. MCS rate selection, etc., and per packet adaptive EDCAs
> seem interesting)
>
> This type of WiFi network telemetry isn't supported outside of internal
> tools. There is some movement towards inserting network telemetry inside
> TCP headers but not much. I believe SDN guys use it inside of data
> centers. If it's useful, adding it to open source tooling might be doable
> though I'd need to do some thinking about the technical details a bit. A
> first obstacle is figuring out a common time domain or how to provide
> sufficient information without one.
>
> Something like this could help drive ECN type features - not sure. The
> network engineering teams are so silo'd both within orgs and across
> companies it's hard to truly optimize end/end problems. The OSI layering
> model tends to get in the way too, at least from an eng silo'ing
> perspective.
>
> Bob
>
> On Wed, Feb 12, 2020 at 5:56 PM David P. Reed wrote:
>
>> I know this is hard to measure, in general. Especially to isolate the
>> issue because it combines packet scheduling, the AP's own activity, and
>> the insertion of excess buffering in each device's hardware and driver
>> software.
>>
>> However, what I'm looking for is evidence that helps locate the problem,
>> which of course is a "distributed scheduling and buffering" problem,
>> unlike the simple bufferbloat we all saw in the CMTS's of DOCSIS 2.0,
>> ALU's LTE deployments in the early days of 4G (at ATT Wireless), or the
>> overbuffering in Arista Networks's switches, which were quite simple to
>> measure and diagnose.
>>
>> On Wednesday, February 12, 2020 7:36pm, "Bob McMahon" <
>> bob.mcmahon@broadcom.com> said:
>>
>> > hmm, not sure if this helps but "excess queueing" can be hard to define.
>> >
>> > Do you know the operating systems for the WiFi devices and if tooling
>> > can be loaded upon them? iperf clients sample RTT and CWND for linux
>> > machines. Iperf 2.0.14 (in development) has a lot of latency related
>> > features
>> >
>> > Also, if there is control over the AIFS one can set that for the high
>> > rate devices such that they always win and the lower rate ones always
>> > lose. If that solves things it does suggest WiFi tx queues developing
>> > per the TXOP arbitration and air transmission as an issue. Standard
>> > cwmin/cwmax isn't as effective though it won't allow high rates to
>> > starve low rate devices as AIFS might (depending upon the values)
>> >
>> > I use latency to measure the performance and define bounds that way and
>> > it's very specific to use cases. It does require clock sync. My devices
>> > have GPS disciplined oscillators which aren't common.
>> >
>> > As an aside, the HULL approach of phantom queues looks interesting.
>> > https://people.csail.mit.edu/alizadeh/papers/hull-nsdi12.pdf
>> >
>> > Bob
>> >
>> > On Wed, Feb 12, 2020 at 4:08 PM David P. Reed wrote:
>> >
>> >> A friend of mine (not a network expert, but a gadget freak), has been
>> >> deploying wireless security cameras at his home and vacation home. He
>> >> uses a single WiFi AP in each place, serving the security cameras etc.
>> >>
>> >> What he observes is this:
>> >>
>> >> Whenever anyone on a laptop in one of the homes uploads a modest sized
>> >> file (over the same WiFi) the security systems all lose data.
>> >>
>> >> Now I can't go to his home to diagnose this, but I've asked him to
>> >> check out his cable bufferbloat using dslreports, and he gets no
>> >> bufferbloat there. But it sure looks like *severe* lag under load is
>> >> affecting the security camera feed to the cloud servers that the
>> >> company that sells the security cameras provides.
>> >>
>> >> So, is there a way to simply *diagnose* the WiFi air link for excess
>> >> queueing in all the high rate WiFi devices? Something a non-net-head
>> >> could do?
>> >>
>> >> The situation around congestion control in the industry continues to
>> >> royally suck, in my opinion. The vendors don't care, the ISPs don't
>> >> care (they can sell a higher speed connection than is actually needed
>> >> and super-fabulous MIMO gadgets that still don't quite solve the
>> >> problem).
>> >>
>> >> I'm an old guy, basically retired. I'm sad because the young folks
>> >> remain clueless.
>> >>
>> >> And it's been decades since bufferbloat was discovered, and the basic
>> >> issue of congestion signalling being needed. I'm sure 5G (whatever it
>> >> really is) is not paying attention to this network level congestion
>> >> issue...
>> >>
>> >> _______________________________________________
>> >> Make-wifi-fast mailing list
>> >> Make-wifi-fast@lists.bufferbloat.net
>> >> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>> >
>>
>
>
> ---------- Forwarded message ----------
> From: Bob McMahon via Make-wifi-fast
> To: "David P. Reed"
> Cc: Make-Wifi-fast
> Bcc:
> Date: Wed, 12 Feb 2020 22:27:28 -0800 (PST)
> Subject: Re: [Make-wifi-fast] Status of the industry on over buffering at
> the WiFi air interface
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast

--000000000000e6a58f059e7bd46b
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000e6a58f059e7bd46b--
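
The thread mentions mapping device time domains such as TSF into the GPS time domain so that per-packet latency can be computed across hops. The following is only a minimal sketch of that idea, not the internal tooling described in the thread. It assumes paired (TSF, GPS) timestamp samples in microseconds are available and that a linear model (fixed offset plus constant drift) is adequate over the measurement window. The function names fit_clock_map, tsf_to_gps and one_way_latency_us are hypothetical.

```python
from statistics import mean


def fit_clock_map(tsf_us, gps_us):
    """Least-squares fit of gps = slope * tsf + offset from paired samples.

    Assumption: the device clock drifts linearly against GPS time over the
    window covered by the samples. Returns (slope, offset).
    """
    if len(tsf_us) != len(gps_us) or len(tsf_us) < 2:
        raise ValueError("need at least two paired samples")
    t_mean, g_mean = mean(tsf_us), mean(gps_us)
    variance = sum((t - t_mean) ** 2 for t in tsf_us)
    if variance == 0:
        raise ValueError("TSF samples must not all be identical")
    covariance = sum((t - t_mean) * (g - g_mean) for t, g in zip(tsf_us, gps_us))
    slope = covariance / variance
    return slope, g_mean - slope * t_mean


def tsf_to_gps(tsf, slope, offset):
    """Translate one TSF timestamp into the GPS time domain."""
    return slope * tsf + offset


def one_way_latency_us(tx_gps_us, rx_tsf_us, slope, offset):
    """Latency of a packet stamped in GPS time at the sender and in TSF
    time at the receiver, expressed in GPS-domain microseconds."""
    return tsf_to_gps(rx_tsf_us, slope, offset) - tx_gps_us
```

With a map fitted from a handful of paired samples, every TSF-stamped event can be placed on the same GPS timeline as events from other hosts, and those per-packet latencies are what the stacked bar plots and clustering mentioned in the thread would consume. A real deployment would likely need to refit periodically, since drift is not constant over long intervals.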