I'm in alignment with Dave's and Toke's posts. I do disagree somewhat with:

> > I'd say take latency measurements when the input rates are below the service rates.
>
> That is ridiculous. It requires an oracle. It requires a belief system where users will never exceed your mysterious parameters.

What zero-queue or low-queue latency measurements provide is a top end, or best-case, performance, even when monitoring the tail of that CDF.

Things like AR gaming are driving WiFi "ultra low latency" requirements where phase and spatial streams matter. How well an algorithm detects going from 1 to 2 spatial streams is starting to matter; 2->1 streams is a relatively easy decision. Then there is 802.11ax AP scheduling vs EDCA, which is a very difficult engineering problem but sorely needed.

A major issue for a WiFi QA engineer is how to measure a multivariate system in a meaningful (and automated) way. Easier said than done. (I find that trying to present Mahalanobis distances doesn't work well, especially when compared to a scalar or single number.)

The first scalar relied upon too much is peak average throughput, particularly without concern for bloat. This was a huge flaw by the industry, as bloat was inserted everywhere by most everyone while providing little to no benefit - actually a design flaw per energy, transistors, etc. Engineers attacking bloat has been a very good thing, by my judgment.

Some divide peak average throughput by latency to get "network power." But then there is "bloat latency" vs. "service latency." Note: with iperf 2.0.14 it's easy to see the difference by using socket read or write rate limiting. If the link is read rate limited (with -b on the server), the bloat is going to be exacerbated per a read congestion point. If it's write rate limited (-b on the client), the queues shouldn't be in a standing state.

And then of course, the "real world" class of measurements is very hard. And a chip is usually powered by a battery, so energy per useful xfer bit matters too.

So parameters can be seen as mysterious, for sure. Figuring out how to demystify them can be the fun part ;)
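To make the "tail of that CDF" point concrete, here's a rough sketch in Python (made-up numbers, not from a real run) of the kind of per-run summary I mean - report the tail point and the worst case rather than the mean:

```python
import statistics

def tail_summary(latencies_ms, quantile=0.9997):
    """Summarize a latency run by its upper tail rather than its mean.

    latencies_ms: per-packet (or per write/read) latencies from one run.
    quantile:     0.9997 is the "upper 3 stdev / 99.97% point" style of tail metric.
    """
    xs = sorted(latencies_ms)
    idx = min(len(xs) - 1, int(quantile * len(xs)))
    return {
        "mean_ms": statistics.mean(xs),
        "stdev_ms": statistics.stdev(xs),
        "tail_ms": xs[idx],   # the empirical tail point
        "worst_ms": xs[-1],   # the one "slow packet" that ruins the experience
    }

# Made-up example: mostly ~3 ms with a handful of bloated outliers.
samples = [3.1] * 9970 + [40.0] * 25 + [250.0] * 5
print(tail_summary(samples))
```

Collect that tail point across hundreds of runs and you have something worth putting on a CDF; the mean of the same data looks perfectly healthy.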
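For the multivariate measurement problem, the Mahalanobis distance I'm referring to is just how far a run's metric vector (throughput, tail latency, jitter, ...) sits from a baseline set of runs, scaled by their covariance. A toy sketch (numpy, made-up baseline numbers):

```python
import numpy as np

def mahalanobis(x, baseline):
    """Distance of measurement vector x from a baseline set of runs,
    accounting for the correlation between the metrics."""
    mu = baseline.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(baseline, rowvar=False))
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Columns: throughput (Mbps), tail latency (ms), jitter (ms). Made-up runs.
baseline = np.array([[600, 4.0, 0.5],
                     [590, 4.5, 0.6],
                     [610, 3.8, 0.4],
                     [605, 4.2, 0.5],
                     [595, 4.1, 0.6]])
print(mahalanobis(np.array([480, 35.0, 6.0]), baseline))  # a clearly "bad" run
```

The number comes out large for the bad run, but it doesn't tell the reader which metric went wrong, which is part of why it presents poorly next to a plain scalar.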
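And the "network power" scalar is nothing fancier than throughput divided by latency, so the same peak rate scores very differently depending on whether the denominator is service latency or bloat latency. Toy numbers:

```python
def network_power(throughput_mbps, latency_ms):
    """'Network power' style scalar: delivered rate over delay.
    Only meaningful for comparing runs of the same test setup."""
    return throughput_mbps / latency_ms

# Made-up numbers: identical peak throughput, very different queue behavior.
print(network_power(600.0, 3.0))   # service latency, little standing queue
print(network_power(600.0, 80.0))  # bloat latency from a standing queue
```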
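On the TCP connect measurements that come up below: at its simplest it's just timing the three-way handshake. A minimal sketch (the host and port are placeholders; point it at whatever you're actually testing), with the caveat that passing a hostname folds the DNS lookup into the number:

```python
import socket
import time

def tcp_connect_ms(host, port, timeout=2.0):
    """Time a single TCP connect (3-way handshake) to host:port, in ms."""
    t0 = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection closes on exit; we only care about setup time
    return (time.perf_counter() - t0) * 1000.0

# Placeholder target; repeat and keep the distribution, not just the average.
print(round(tcp_connect_ms("example.com", 80), 2), "ms")
```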
Bob

On Fri, May 15, 2020 at 1:30 PM Dave Taht wrote:

> On Fri, May 15, 2020 at 12:50 PM Tim Higgins wrote:
> >
> > Thanks for the additional insights, Bob. How do you measure TCP connects?
> >
> > Does Dave or anyone else on the bufferbloat team want to comment on Bob's comment that latency testing under "heavy traffic" isn't ideal?
>
> I hit save before deciding to reply.
>
> > My impression is that the rtt_fair_var test I used in the article and other RRUL-related Flent tests fully load the connection under test. Am I incorrect?
>
> Well, to whatever extent possible given other limits in the hardware. Under loads like these, other things - such as the rx path, or cpu - start to fail. I had one box that had a memory leak; overnight testing like this showed it up. Another test - with ipv6 - ultimately showed serious ipv6 traffic was causing a performance-sucking cpu trap. Another test showed IPv6 being seriously outcompeted by ipv4 because there were 4096 ipv4 flow offloads in the hardware, and only 64 for ipv6...
>
> There are many other tests in the suite - testing a fully loaded station while other stations are moping along... stuff near and far away (ATF).
>
> > ===
> > On 5/15/2020 3:36 PM, Bob McMahon wrote:
> >
> > Latency testing under "heavy traffic" isn't ideal.
>
> Of course not. But in any real time control system, retaining control and degrading predictably under load is a hard requirement in most other industries besides networking. Imagine if you only tested your car at speeds no more than 55mph, on roads that were never slippery, with curves never exceeding 6 degrees. Then shipped it, without a performance governor, with rubber bands holding the steering wheel on that would break at 65mph, and with tires that only worked at those speeds on those kinds of curves.
>
> To stick with the heavy traffic analogy, but in a slower case... I used to have a car that overheated in heavy stop and go traffic. Eventually, it caught on fire. (The full story is really funny, because I was naked at the time, but I'll save it for a posthumous biography.)
>
> > If the input rate exceeds the service rate of any queue for any period of time, the queue fills up and latency hits a worst case per that queue depth.
>
> which is what we're all about managing well, and predictably, here at bufferbloat.net
>
> > I'd say take latency measurements when the input rates are below the service rates.
>
> That is ridiculous. It requires an oracle. It requires a belief system where users will never exceed your mysterious parameters.
>
> > The measurements when service rates are less than input rates are less about latency and more about bloat.
>
> I have to note that latency measurements are certainly useful on less loaded networks. Getting an AP out of sleep state is a good one; another is how fast you can switch stations, under a minimal (say, voip mostly) load, in the presence of interference.
>
> > Also, a good paper is this one on trading bandwidth for ultra low latency using phantom queues and ECN.
>
> I'm burned out on ecn today. On the high end I rather like cisco's AFD...
>
> > https://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/white-paper-c11-738488.html
>
> > Another thing to consider is that network engineers tend to have a myopic view of latency. The queueing or delay between the socket writes/reads and the network stack matters too.
>
> It certainly does! I'm always giving a long list of everything we've done to improve the linux stack from app to endpoint.
>
> Over on reddit recently (can't find the link) I talked about how bad the linux ethernet stack was, pre-BQL. I don't think anyone in the industry really understood, deeply, the effects of packet aggregation in the multistation case, for wifi. (I'm still unsure if anyone does!) Also, endless retries starving out other stations is a huge problem in wifi, and lte, and is going to become more of one on cable...
>
> We've worked on tons of things - like tcp_lowat, fq, and queuing in general - jeeze - https://conferences.sigcomm.org/sigcomm/2014/doc/slides/137.pdf
> See the slide on smashing latency everywhere in the stack.
>
> And I certainly, now that I can regularly get fiber down below 2ms, regard the overhead of opus (2.7ms at the highest sampling rate) as a real problem, along with scheduling delay and jitter in the os in the jamophone project. It pays to bypass the OS, when you can.
>
> Latency is everywhere, and you have to tackle it, everywhere, but it helps to focus on whatever is costing you the most latency at a time, re: https://en.wikipedia.org/wiki/Gustafson%27s_law
>
> My biggest complaint nowadays about modern cpu architectures is that they can't context switch faster than a few thousand cycles.
> I've advocated that folk look over the Mill computer's design, which can do it in 5.
>
> > Network engineers focus on packets or TCP RTTs and somewhat overlook a user's true end to end experience.
>
> Heh. I don't. Despite all I say here (because I viewed the network as the biggest problem 10 years ago), I have been doing voip and videoconferencing apps for over 25 years, and basic benchmarks like eye-to-eye/ear-to-ear delay and jitter are ones I have always hoped more people used.
>
> > Avoiding bloat by slowing down the writes, e.g. ECN or different scheduling, still contributes to end/end latency between the writes() and the reads() that too few test for and monitor.
>
> I agree that iperf had issues. I hope they are fixed now.
>
> > Note: We're moving to trip times of writes to reads (or frames for video) for our testing.
>
> Ear-to-ear or eye-to-eye delay measurements are GOOD. And a lot of that delay is still in the stack. One day, perhaps, we can go back to scan lines and not complicated encodings.
>
> > We are also replacing/supplementing pings with TCP connects as other "latency related" measurements. TCP connects are more important than ping.
>
> I wish more folk measured dns lookup delay...
>
> Given the prevalence of ssl, I'd be measuring not just the 3whs, but that additional set of handshakes.
>
> We do have a bunch of http oriented tests in the flent suite, as well as for voip. At the time we were developing it, though, videoconferencing was in its infancy and difficult to model, so we tended towards using what flows we could get from real servers and services. I think we now have tools to model videoconferencing traffic much better today than we could, but until now, it wasn't much of a priority.
>
> It's also important to note that videoconferencing and gaming traffic put a very different load on the network - very sensitive to jitter, not so sensitive to loss. Both are VERY low bandwidth compared to tcp - gaming is 35kbit/sec, for example, on 10 or 20ms intervals.
>
> > Bob
> >
> > On Fri, May 15, 2020 at 8:20 AM Tim Higgins wrote:
> >>
> >> Hi Bob,
> >>
> >> Thanks for your comments and feedback. Responses below:
> >>
> >> On 5/14/2020 5:42 PM, Bob McMahon wrote:
> >>
> >> Also, forgot to mention, for latency don't rely on the average as most don't care about that. Maybe use the upper 3 stdev, i.e. the 99.97% point. Our latency runs will repeat 20 seconds' worth of packets, find that point, then calculate CDFs of this point in the tail across hundreds of runs under different conditions. One "slow packet" is all that it takes to screw up user experience when it comes to latency.
> >>
> >> Thanks for the guidance.
> >>
> >> On Thu, May 14, 2020 at 2:38 PM Bob McMahon wrote:
> >>>
> >>> I haven't looked closely at OFDMA but these latency numbers seem way too high for it to matter. Why is the latency so high? It suggests there may be queueing delay (bloat) unrelated to media access.
> >>>
> >>> Also, one aspect is that OFDMA is replacing EDCA with AP scheduling per trigger frame. EDCA kinda sucks per listen-before-talk, which is about 100 microseconds on average and has to be paid even when there is no energy detect. This limits the transmits per second performance to 10K (1/0.0001). Also remember that WiFi aggregates, so transmissions have multiple packets, and long transmits will consume those 10K tx ops. One way to get around aggregation is to use the voice (VO) access class, which many devices won't aggregate (mileage will vary).
> >>> Then take a packets per second measurement with small packets. This would give an idea on the frame scheduling being AP based vs EDCA.
> >>>
> >>> Also, measuring ping time as a proxy for latency isn't ideal. Better to measure trip times of the actual traffic. This requires clock sync to a common reference. GPS atomic clocks are available but it does take some setup work.
> >>>
> >>> I haven't thought about RU optimizations and that testing so can't really comment there.
> >>>
> >>> Also, I'd consider replacing the mechanical turn table with variable phase shifters and set them in the MIMO (or H-Matrix) path. I use model 8421 from Aeroflex. Others make them too.
> >>>
> >> Thanks again for the suggestions. I agree latency is very high when I remove the traffic bandwidth caps. I don't know why. One of the key questions I've had since starting to mess with OFDMA is whether it helps under light or heavy traffic load. All I do know is that things go to hell when you load the channel. And RRUL test methods essentially break OFDMA.
> >>
> >> I agree using ping isn't ideal. But I'm approaching this as creating a test that a consumer audience can understand. Ping is something consumers care about and understand. The octoScope STApals are all ntp sync'd and latency measurements using iperf have been done by them.
> >
> > _______________________________________________
> > Make-wifi-fast mailing list
> > Make-wifi-fast@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/make-wifi-fast
>
> --
> "For a successful technology, reality must take precedence over public relations, for Mother Nature cannot be fooled" - Richard Feynman
>
> dave@taht.net CTO, TekLibre, LLC Tel: 1-831-435-0729