[Make-wifi-fast] SmallNetBuilder article: Does OFDMA Really Work?

Bob McMahon bob.mcmahon at broadcom.com
Fri May 15 17:35:06 EDT 2020


I'm in alignment with Dave's and Toke's posts. I do disagree somewhat with:

> > I'd say take latency measurements when the input rates are below the
> > service rates.
>
> That is ridiculous. It requires an oracle. It requires a belief system
> where users will never exceed your mysterious parameters.


What zero-queue or low-queue latency measurements provide is a top end or
best-case performance, even when monitoring the tail of that CDF.  Things
like AR gaming are driving WiFi "ultra low latency" requirements where phase
and spatial stream decisions matter.  How well an algorithm detects the
1 -> 2 spatial stream transition is starting to matter; 2 -> 1 is a
relatively easy decision. Then there is 802.11ax AP scheduling vs. EDCA,
which is a very difficult engineering problem, but sorely needed.
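
(A rough sketch of what I mean by watching the tail instead of the average -
synthetic numbers, Python/numpy, using the 99.97% point described further
down the thread:)

  import numpy as np

  # synthetic latency samples in ms; real runs come from instrumented
  # traffic, hundreds of runs under different conditions
  rng = np.random.default_rng(1)
  runs = [rng.gamma(shape=2.0, scale=1.5, size=20000) for _ in range(200)]

  # per-run tail statistic: the 99.97% point rather than the average
  tails = np.array([np.percentile(r, 99.97) for r in runs])

  # then look at the distribution (CDF) of that tail point across runs
  print(f"median per-run 99.97% point: {np.median(tails):.2f} ms")
  print(f"worst  per-run 99.97% point: {tails.max():.2f} ms")
  print(f"runs with the 99.97% point under 16 ms: {(tails < 16.0).mean():.0%}")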

A major issue for a WiFi QA engineer is how to measure a multivariate system
in a meaningful (and automated) way. Easier said than done. (I find that
presenting Mahalanobis distances
<https://en.wikipedia.org/wiki/Mahalanobis_distance> doesn't work well,
especially compared to a single scalar number.)  The scalar relied upon far
too much is peak average throughput, particularly without concern for bloat.
This was a huge flaw across the industry: bloat was inserted nearly
everywhere, by most everyone, providing little to no benefit - actually a
design flaw in terms of energy, transistors, etc.  Engineers attacking
bloat has been a very good thing, in my judgment.
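
(The distance itself is cheap to compute - a minimal sketch below with
made-up metrics and synthetic baseline numbers; the hard part is getting
people to act on it instead of on one scalar:)

  import numpy as np

  rng = np.random.default_rng(7)
  # 50 baseline runs of hypothetical metrics:
  # [throughput_mbps, p99_latency_ms, jitter_ms]
  baseline = np.column_stack([
      rng.normal(490, 15, 50),
      rng.normal(11, 1.5, 50),
      rng.normal(1.0, 0.2, 50),
  ])
  candidate = np.array([455.0, 22.0, 2.4])   # the run under test

  mu = baseline.mean(axis=0)
  cov_inv = np.linalg.inv(np.cov(baseline, rowvar=False))
  delta = candidate - mu
  d = float(np.sqrt(delta @ cov_inv @ delta))
  print(f"Mahalanobis distance of the candidate run: {d:.1f}")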

Some divide peak average throughput by latency to "get network power."  But
then there is "bloat latency" vs. "service latency."  Note: with iperf
2.0.14 it's easy to see the difference by using socket read or write rate
limiting. If the link is read rate limited (-b on the server), bloat is
going to be exacerbated at the read congestion point.  If it's write rate
limited (-b on the client), the queues shouldn't reach a standing state.
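
(Roughly, as a sketch with a placeholder host and an arbitrary 40M rate;
exact option behavior may vary by 2.0.14 build:)

  # write rate limited: -b on the client; queues shouldn't stand
  iperf -s -e -i 1
  iperf -c <server> -e -i 1 -b 40M

  # read rate limited: -b on the server (2.0.14); bloat shows up at the
  # read congestion point
  iperf -s -e -i 1 -b 40M
  iperf -c <server> -e -i 1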

And then of course, "real world" classes of measurements are very hard. And a
chip is usually powered by a battery, so energy per useful transferred bit
matters too.
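
(Back of the envelope, with made-up numbers: a radio drawing 1.5 W while
delivering 500 Mbit/s of goodput spends 1.5 / 5e8 = 3 nJ per useful bit; if
retries and bloat cut goodput to 250 Mbit/s at the same draw, that doubles
to 6 nJ per bit.)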

So the parameters can certainly seem mysterious. Figuring out how to
demystify them can be the fun part ;)

Bob



On Fri, May 15, 2020 at 1:30 PM Dave Taht <dave.taht at gmail.com> wrote:

> On Fri, May 15, 2020 at 12:50 PM Tim Higgins <tim at smallnetbuilder.com>
> wrote:
> >
> > Thanks for the additional insights, Bob. How do you measure TCP connects?
> >
> > Does Dave or anyone else on the bufferbloat team want to comment on
> Bob's comment that latency testing under "heavy traffic" isn't ideal?
>
> I hit save before deciding to reply.
>
> > My impression is that the rtt_fair_var test I used in the article and
> other RRUL-related Flent tests fully load the connection under test. Am I
> incorrect?
>
> Well, to whatever extent possible given other limits in the hardware.
> Under loads like these, other things - such as the rx path or cpu -
> start to fail. I had one box with a memory leak; overnight testing
> like this showed it up. Another test - with ipv6 - ultimately showed
> serious ipv6 traffic was causing a performance-sucking cpu trap.
> Another test showed IPv6 being seriously outcompeted by ipv4 because
> there were 4096 ipv4 flow offloads in the hardware, and only 64 for ipv6....
>
> There are many other tests in the suite - testing a fully loaded
> station while other stations are moping along... stuff near and far
> away (ATF),
>
>
> >
> > ===
> > On 5/15/2020 3:36 PM, Bob McMahon wrote:
> >
> > Latency testing under "heavy traffic" isn't ideal.
>
> Of course not. But in any real time control system, retaining control
> and degrading predictably under load is a hard requirement
> in most other industries besides networking. Imagine if you only
> tested your car at speeds no more than 55 mph, on roads that were
> never slippery, with curves never exceeding 6 degrees. Then shipped
> it without a performance governor, with rubber bands holding
> the steering wheel on that would break at 65 mph, and with tires that
> only worked at those speeds on those kinds of curves.
>
> To stick with the heavy traffic analogy, but in a slower case... I
> used to have a car that overheated in heavy stop and go traffic.
> Eventually, it caught on fire. (The full story is really funny,
> because I was naked at the time, but I'll save it for a posthumous
> biography)
>
> > If the input rate exceeds the service rate of any queue for any period
> of time the queue fills up and latency hits a worst case per that queue
> depth.
>
> which is what we're all about managing well, and predictably, here at
> bufferbloat.net
>
> >I'd say take latency measurements when the input rates are below the
> service rates.
>
> That is ridiculous. It requires an oracle. It requires a belief system
> where users will never exceed your mysterious parameters.
>
> > The measurements when service rates are less than input rates are less
> about latency and more about bloat.
>
> I have to note that latency measurements are certainly useful on less
> loaded networks. Getting an AP out of sleep state is a good one,
> another is how fast you can switch stations under a minimal (say,
> mostly voip) load, in the presence of interference.
>
> > Also, a good paper is this one on trading bandwidth for ultra low
> latency using phantom queues and ECN.
>
> I'm burned out on ecn today. On the high end I rather like cisco's AFD...
>
>
> https://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/white-paper-c11-738488.html
>
> > Another thing to consider is that network engineers tend to have a
> myopic view of latency.  The queueing or delay between the socket
> writes/reads and network stack matters too.
>
> It certainly does! I'm always giving a long list of everything we've
> done to improve the linux stack from app to endpoint.
>
> Over on reddit recently (can't find the link) I talked about how bad
> the linux ethernet stack was, pre-bql. I don't think anyone in the
> industry
> really understood, deeply, the effects of packet aggregation in the
> multistation case for wifi. (I'm still unsure if anyone does!) Also,
> endless retries starving out other stations is a huge problem in wifi
> and lte, and is going to become more of one on cable...
>
> We've worked on tons of things - like tcp_lowat, fq, and queuing in
> general - jeeze -
> https://conferences.sigcomm.org/sigcomm/2014/doc/slides/137.pdf
> See the slide on smashing latency everywhere in the stack.
>
> And I certainly, now that I can regularly get fiber down below 2ms,
> regard the overhead of opus (2.7ms at the highest sampling rate) as a real
> problem,
> along with scheduling delay and jitter in the os in the jamophone
> project. It pays to bypass the OS when you can.
>
> Latency is everywhere, and you have to tackle it, everywhere, but it
> helps to focus on whatever is costing you the most latency at a time,
> re:
>
> https://en.wikipedia.org/wiki/Gustafson%27s_law
>
> My biggest complaint nowadays about modern cpu architectures is that
> they can't context switch faster than a few thousand cycles. I've
> advocated that folk look over the Mill computer's design, which can do it in 5.
>
> >Network engineers focus on packets or TCP RTTs and somewhat overlook a
> user's true end to end experience.
>
> Heh. I don't. Despite all I say here (because I viewed the network as
> the biggest problem 10 years ago), I have been doing voip and
> videoconferencing apps for over 25 years, and basic benchmarks like
> eye-to-eye/ear-to-ear delay and jitter are ones I have always hoped would see more use.
>
> >  Avoiding bloat by slowing down the writes, e.g. ECN or different
> scheduling, still contributes to end/end latency between the writes() and
> the reads() that too few test for and monitor.
>
> I agree that iperf had issues. I hope they are fixed now.
>
> >
> > Note: We're moving to trip times of writes to reads (or frames for
> video) for our testing.
>
> ear to ear or eye to eye delay measurements are GOOD. And a lot of
> that delay is still in the stack. One day, perhaps
> we can go back to scan lines and not complicated encodings.
>
> >We are also replacing/supplementing pings with TCP connects as other
> "latency related" measurements. TCP connects are more important than ping.
>
> I wish more folk measured dns lookup delay...
>
> Given the prevalence of ssl, I'd be measuring not just the 3whs, but
> that additional set of handshakes.
>
> We do have a bunch of http oriented tests in the flent suite, as well
> as for voip. At the time we were developing it,
> though, videoconferencing was in its infancy and difficult to model, so
> we tended towards using what flows we could get
> from real servers and services. I think we now have tools to model
> videoconferencing traffic much better today than
> we could, but until now, it wasn't much of a priority.
>
> It's also important to note that videoconferencing and gaming traffic
> put a very different load on the network - very sensitive to jitter,
> not so sensitive to loss. Both are VERY low bandwidth compared to tcp
> - gaming is 35kbit/sec for example, on 10 or 20ms intervals.
>
> >
> > Bob
> >
> > On Fri, May 15, 2020 at 8:20 AM Tim Higgins <tim at smallnetbuilder.com>
> wrote:
> >>
> >> Hi Bob,
> >>
> >> Thanks for your comments and feedback. Responses below:
> >>
> >> On 5/14/2020 5:42 PM, Bob McMahon wrote:
> >>
> >> Also, forgot to mention, for latency don't rely on average as most
> don't care about that.  Maybe use the upper 3 stdev, i.e. the 99.97%
> point.  Our latency runs will repeat 20 seconds worth of packets and find
> that then calculate CDFs of this point in the tail across hundreds of runs
> under different conditions. One "slow packet" is all that it takes to screw
> up user experience when it comes to latency.
> >>
> >> Thanks for the guidance.
> >>
> >>
> >> On Thu, May 14, 2020 at 2:38 PM Bob McMahon <bob.mcmahon at broadcom.com>
> wrote:
> >>>
> >>> I haven't looked closely at OFDMA but these latency numbers seem way
> too high for it to matter.  Why is the latency so high?  It suggests there
> may be queueing delay (bloat) unrelated to media access.
> >>>
> >>> Also, one aspect is that OFDMA is replacing EDCA with AP scheduling
> per trigger frame.  EDCA kinda sucks because of listen-before-talk, which
> costs about 100 microseconds on average and has to be paid even when there
> is no energy detect.  This limits transmits-per-second performance to 10K
> (1/0.0001). Also remember that WiFi aggregates, so transmissions carry
> multiple packets and long transmits will consume those 10K tx ops. One way
> to get around aggregation is to use the voice (VO) access class, which many
> devices won't aggregate (mileage will vary). Then take a packets-per-second
> measurement with small packets.  This would give an idea of whether the
> frame scheduling is AP based or EDCA.
> >>>
> >>> Also, measuring ping time as a proxy for latency isn't ideal. Better
> to measure trip times of the actual traffic.  This requires clock sync to a
> common reference. GPS atomic clocks are available but it does take some
> setup work.
> >>>
> >>> I haven't thought about RU optimizations and that testing so can't
> really comment there.
> >>>
> >>> Also, I'd consider replacing the mechanical turn table with variable
> phase shifters, setting them in the MIMO (or H-Matrix) path.  I use model
> 8421 from Aeroflex. Others make them too.
> >>>
> >> Thanks again for the suggestions. I agree latency is very high when I
> remove the traffic bandwidth caps. I don't know why. One of the key
> questions I've had since starting to mess with OFDMA is whether it helps
> under light or heavy traffic load. All I do know is that things go to hell
> when you load the channel. And RRUL test methods essentially break OFDMA.
> >>
> >> I agree using ping isn't ideal. But I'm approaching this as creating a
> test that a consumer audience can understand. Ping is something consumers
> care about and understand.  The octoScope STApals are all ntp sync'd and
> latency measurements using iperf have been done by them.
> >>
> >>
> >
> > _______________________________________________
> > Make-wifi-fast mailing list
> > Make-wifi-fast at lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/make-wifi-fast
>
>
>
> --
> "For a successful technology, reality must take precedence over public
> relations, for Mother Nature cannot be fooled" - Richard Feynman
>
> dave at taht.net <Dave Täht> CTO, TekLibre, LLC Tel: 1-831-435-0729
>

