From: Dave Taht
Date: Fri, 15 May 2020 22:02:11 -0700
To: Bob McMahon
Cc: Tim Higgins, Make-Wifi-fast
Subject: Re: [Make-wifi-fast] SmallNetBuilder article: Does OFDMA Really Work?

On Fri, May 15, 2020 at 2:35 PM Bob McMahon wrote:
>
> I'm in alignment with Dave's and Toke's posts. I do disagree somewhat with:
>
> >I'd say take latency measurements when the input rates are below the service rates.
>
> That is ridiculous. It requires an oracle. It requires a belief system where users will never exceed your mysterious parameters.

If you couldn't tell, I've had a long week. Here's to a shared virtual beverage of choice, all round?

> What zero-queue or low-queue latency measurements provide is a top end or best-case performance, even when monitoring the tail of that CDF. Things like AR gaming are driving WiFi "ultra low latency" requirements where phase and spatial-stream decisions matter. How well an algorithm decides between 1 and 2 spatial streams is starting to matter; 2->1 is a relatively easy decision. Then there is 802.11ax AP scheduling vs EDCA, which is a very difficult engineering problem but sorely needed.
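To make the "tail of that CDF" point concrete, here is a minimal sketch in Python (made-up sample numbers, generic math, not iperf's or anyone's actual reporting code) that summarizes per-packet latency samples by their upper tail rather than by their mean:

    #!/usr/bin/env python3
    # Report the mean, the mean + 3 stdev point, and the empirical 99.97th
    # percentile of a set of per-packet latency samples (milliseconds).
    import statistics

    def tail_stats(samples_ms):
        mean = statistics.mean(samples_ms)
        sd = statistics.pstdev(samples_ms)
        ordered = sorted(samples_ms)
        idx = min(len(ordered) - 1, int(0.9997 * len(ordered)))  # nearest-rank
        return mean, mean + 3 * sd, ordered[idx]

    samples = [2.1, 2.3, 2.2, 2.4, 2.2, 9.8, 2.3, 2.1, 47.0, 2.2]  # hypothetical
    mean, three_sigma, p9997 = tail_stats(samples)
    print(f"mean {mean:.1f} ms  mean+3sd {three_sigma:.1f} ms  p99.97 {p9997:.1f} ms")

One straggler owns the 99.97% point; the average alone would hide how bad that worst packet is.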
> A major issue as a WiFi QA engineer is how to measure a multivariate system in a meaningful (and automated) way. Easier said than done. (I find trying to present Mahalanobis distances doesn't work well, especially when compared to a scalar or single number.) The scalar relied upon too much is peak average throughput, particularly without concern for bloat. This was a huge flaw by the industry, as bloat was inserted everywhere by most everyone while providing little to no benefit - actually a design flaw per energy, transistors, etc. Engineers attacking bloat has been a very good thing, in my judgment.

> Some divide peak average throughput by latency to get "network power". But then there is "bloat latency" vs. "service latency". Note: with iperf 2.0.14 it's easy to see the difference by using socket read or write rate limiting. If the link is read rate limited (with -b on the server) the bloat is going to be exacerbated per a read congestion point. If it's write rate limited (-b on the client), the queues shouldn't be in a standing state.

> And then of course, the "real world" class of measurements is very hard. And a chip is usually powered by a battery, so energy per useful xfer bit matters too.

Yes, most are. The ones I worry about most aren't. Aside from that, I agree totally with what you say, especially about multivariate systems. We need an AI to interpret the rrul tests.

> So parameters can be seen as mysterious for sure. Figuring out how to demystify them can be the fun part ;)

/me clinks glass over the virtual bar

gnight!

> Bob
>
> On Fri, May 15, 2020 at 1:30 PM Dave Taht wrote:
>>
>> On Fri, May 15, 2020 at 12:50 PM Tim Higgins wrote:
>> >
>> > Thanks for the additional insights, Bob. How do you measure TCP connects?
>> >
>> > Does Dave or anyone else on the bufferbloat team want to comment on Bob's comment that latency testing under "heavy traffic" isn't ideal?
>>
>> I hit save before deciding to reply.
>>
>> > My impression is that the rtt_fair_var test I used in the article and other RRUL-related Flent tests fully load the connection under test. Am I incorrect?
>>
>> Well, to whatever extent possible given other limits in the hardware. Under loads like these, other things - such as the rx path, or the cpu - start to fail. I had one box that had a memory leak; overnight testing like this showed it up. Another test - with ipv6 - ultimately showed serious ipv6 traffic was causing a performance-sucking cpu trap. Another test showed IPv6 being seriously outcompeted by ipv4, because there were 4096 ipv4 flow offloads in the hardware, and only 64 for ipv6...
>>
>> There are many other tests in the suite - testing a fully loaded station while other stations are moping along, stuff near and far away (ATF)...
>>
>> > ===
>> > On 5/15/2020 3:36 PM, Bob McMahon wrote:
>> >
>> > Latency testing under "heavy traffic" isn't ideal.
>>
>> Of course not. But in any real-time control system, retaining control and degrading predictably under load is a hard requirement in most industries other than networking. Imagine if you only tested your car at speeds no more than 55mph, on roads that were never slippery, and with curves never exceeding 6 degrees.
>> Then shipped it, without a performance governor, and with rubber bands holding the steering wheel on that would break at 65mph, and with tires that only worked at those speeds on those kinds of curves.
>>
>> To stick with the heavy traffic analogy, but in a slower case... I used to have a car that overheated in heavy stop-and-go traffic. Eventually, it caught on fire. (The full story is really funny, because I was naked at the time, but I'll save it for a posthumous biography.)
>>
>> > If the input rate exceeds the service rate of any queue for any period of time, the queue fills up and latency hits a worst case per that queue depth.
>>
>> Which is what we're all about managing well, and predictably, here at bufferbloat.net.
>>
>> >I'd say take latency measurements when the input rates are below the service rates.
>>
>> That is ridiculous. It requires an oracle. It requires a belief system where users will never exceed your mysterious parameters.
>>
>> > The measurements when service rates are less than input rates are less about latency and more about bloat.
>>
>> I have to note that latency measurements are certainly useful on less loaded networks. Getting an AP out of sleep state is a good one; another is how fast you can switch stations, under a minimal (say, mostly voip) load, in the presence of interference.
>>
>> > Also, a good paper is this one on trading bandwidth for ultra low latency using phantom queues and ECN.
>>
>> I'm burned out on ecn today. On the high end I rather like cisco's AFD...
>>
>> https://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/white-paper-c11-738488.html
>>
>> > Another thing to consider is that network engineers tend to have a myopic view of latency. The queueing or delay between the socket writes/reads and the network stack matters too.
>>
>> It certainly does! I'm always giving a long list of everything we've done to improve the linux stack from app to endpoint.
>>
>> Over on reddit recently (can't find the link) I talked about how bad the linux ethernet stack was, pre-BQL. I don't think anyone in the industry really understood, deeply, the effects of packet aggregation in the multistation case for wifi. (I'm still unsure if anyone does!) Also, endless retries starving out other stations is a huge problem in wifi, and lte, and is going to become more of one on cable...
>>
>> We've worked on tons of things - like tcp_lowat, fq, and queuing in general - jeeze - https://conferences.sigcomm.org/sigcomm/2014/doc/slides/137.pdf
>> See the slide on smashing latency everywhere in the stack.
>>
>> And now that I can regularly get fiber down below 2ms, I certainly regard the overhead of opus (2.7ms at the highest sampling rate) as a real problem, along with scheduling delay and jitter in the OS, in the jamophone project. It pays to bypass the OS when you can.
>>
>> Latency is everywhere, and you have to tackle it everywhere, but it helps to focus on whatever is costing you the most latency at a time; re:
>>
>> https://en.wikipedia.org/wiki/Gustafson%27s_law
>>
>> My biggest complaint nowadays about modern cpu architectures is that they can't context switch faster than a few thousand cycles. I've advocated that folk look over mill computer's design, which can do it in 5.
>>
>> >Network engineers focus on packets or TCP RTTs and somewhat overlook a user's true end to end experience.
>>
>> Heh. I don't. Despite all I say here (because I viewed the network as the biggest problem 10 years ago), I have been doing voip and videoconferencing apps for over 25 years, and I have always hoped that basic benchmarks like eye-to-eye/ear-to-ear delay and jitter would be more widely used.
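As a minimal sketch of how such an ear-to-ear-style, one-way trip-time measurement can be taken - a hypothetical UDP harness for illustration only, not iperf or flent code; the address and port are placeholders, and the one-way numbers only mean something if both hosts share a synced clock (NTP/PTP/GPS, which comes up again below):

    #!/usr/bin/env python3
    # Sender stamps each payload with its send time; the receiver subtracts
    # that stamp from its own clock to get a one-way write-to-read trip time.
    import socket, struct, time

    PORT = 5005  # placeholder

    def sender(dst="192.0.2.10", count=500):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for seq in range(count):
            s.sendto(struct.pack("!Id", seq, time.time()), (dst, PORT))
            time.sleep(0.02)  # ~50 packets/sec, a voip-like load

    def receiver():
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("", PORT))
        while True:
            data, _ = s.recvfrom(64)
            seq, sent = struct.unpack("!Id", data)
            print(f"seq {seq} one-way {(time.time() - sent) * 1000:.2f} ms")

Feed those per-packet numbers into a CDF and the tail, rather than the mean, tells the story.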
>> > Avoiding bloat by slowing down the writes, e.g. via ECN or different scheduling, still contributes to end/end latency between the writes() and the reads(), which too few test for and monitor.
>>
>> I agree that iperf had issues. I hope they are fixed now.
>>
>> > Note: We're moving to trip times of writes to reads (or frames for video) for our testing.
>>
>> Ear-to-ear or eye-to-eye delay measurements are GOOD. And a lot of that delay is still in the stack. One day, perhaps, we can go back to scan lines and not complicated encodings.
>>
>> >We are also replacing/supplementing pings with TCP connects as other "latency related" measurements. TCP connects are more important than ping.
>>
>> I wish more folk measured dns lookup delay...
>>
>> Given the prevalence of ssl, I'd be measuring not just the 3-way handshake, but that additional set of handshakes.
>>
>> We do have a bunch of http-oriented tests in the flent suite, as well as tests for voip. At the time we were developing it, though, videoconferencing was in its infancy and difficult to model, so we tended towards using what flows we could get from real servers and services. I think we now have tools to model videoconferencing traffic much better than we could then, but until now it wasn't much of a priority.
>>
>> It's also important to note that videoconferencing and gaming traffic put a very different load on the network - very sensitive to jitter, not so sensitive to loss. Both are VERY low bandwidth compared to tcp - gaming is 35kbit/sec, for example, on 10 or 20ms intervals.
>>
>> > Bob
>> >
>> > On Fri, May 15, 2020 at 8:20 AM Tim Higgins wrote:
>> >>
>> >> Hi Bob,
>> >>
>> >> Thanks for your comments and feedback. Responses below:
>> >>
>> >> On 5/14/2020 5:42 PM, Bob McMahon wrote:
>> >>
>> >> Also, forgot to mention: for latency, don't rely on the average, as most don't care about that. Maybe use the upper 3 stdev, i.e. the 99.97% point. Our latency runs will repeat 20 seconds' worth of packets and find that, then calculate CDFs of this point in the tail across hundreds of runs under different conditions. One "slow packet" is all that it takes to screw up user experience when it comes to latency.
>> >>
>> >> Thanks for the guidance.
>> >>
>> >> On Thu, May 14, 2020 at 2:38 PM Bob McMahon wrote:
>> >>>
>> >>> I haven't looked closely at OFDMA, but these latency numbers seem way too high for it to matter. Why is the latency so high? It suggests there may be queueing delay (bloat) unrelated to media access.
>> >>>
>> >>> Also, one aspect is that OFDMA is replacing EDCA with AP scheduling per trigger frame. EDCA kinda sucks per listen-before-talk, which is about 100 microseconds on average and has to be paid even when there is no energy detect. This limits the transmits-per-second performance to 10K (1/0.0001). Also remember that WiFi aggregates, so transmissions carry multiple packets, and long transmits will consume those 10K tx ops. One way to get around aggregation is to use the voice (VO) access class, which many devices won't aggregate (mileage will vary). Then take a packets-per-second measurement with small packets. This would give an idea of whether the frame scheduling is AP based vs EDCA.
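A back-of-the-envelope sketch of that arithmetic (the ~100 microsecond figure is just the average quoted above, and the aggregation counts are made up for illustration):

    #!/usr/bin/env python3
    # If every EDCA channel access pays ~100 us of listen-before-talk overhead,
    # the medium tops out around 10K transmit opportunities per second. With
    # aggregation effectively disabled (small VO-class packets), packets/sec
    # roughly tracks tx ops/sec; with aggregation, it no longer does.
    edca_overhead_s = 100e-6
    txops_per_sec = 1 / edca_overhead_s
    print(f"max tx ops/sec ~ {txops_per_sec:.0f}")  # ~10000

    for pkts_per_txop in (1, 4, 16, 64):
        print(f"{pkts_per_txop:3d} pkts per tx op -> ~{int(txops_per_sec * pkts_per_txop)} pkts/sec")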
>> >>> Also, measuring ping time as a proxy for latency isn't ideal. Better to measure trip times of the actual traffic. This requires clock sync to a common reference. GPS atomic clocks are available, but it does take some setup work.
>> >>>
>> >>> I haven't thought about RU optimizations and that testing, so I can't really comment there.
>> >>>
>> >>> Also, I'd consider replacing the mechanical turntable with variable phase shifters and setting them in the MIMO (or H-matrix) path. I use model 8421 from Aeroflex. Others make them too.
>> >>
>> >> Thanks again for the suggestions. I agree latency is very high when I remove the traffic bandwidth caps. I don't know why. One of the key questions I've had since starting to mess with OFDMA is whether it helps under light or heavy traffic load. All I do know is that things go to hell when you load the channel. And RRUL test methods essentially break OFDMA.
>> >>
>> >> I agree using ping isn't ideal. But I'm approaching this as creating a test that a consumer audience can understand. Ping is something consumers care about and understand. The octoScope STApals are all ntp-synced, and latency measurements using iperf have been done by them.

--
"For a successful technology, reality must take precedence over public
relations, for Mother Nature cannot be fooled" - Richard Feynman

dave@taht.net CTO, TekLibre, LLC Tel: 1-831-435-0729