[Make-wifi-fast] [Starlink] RFC: Latency test case text and example report.

Tue Sep 13 14:32:01 EDT 2022

On Tue, Sep 13, 2022 at 9:57 AM Ben Greear <greearb at candelatech.com> wrote:
>
> On 9/13/22 9:12 AM, Dave Taht wrote:
> > On Tue, Sep 13, 2022 at 8:58 AM Ben Greear <greearb at candelatech.com> wrote:
> >>
> >> On 9/13/22 8:39 AM, Dave Taht wrote:
> >>> hey, ben, I'm curious whether this test made it into TR398. Is it possible
> >>> to set up some of this, or parts of TR398, to run over Starlink?
> >>>
> >>> I'm also curious whether any commercial ax APs were testing
> >>> better than when you tested about this time last year.  I've just gone
> >>> through 9 months of pure hell getting OpenWrt's implementation of the
> >>> mt76 and ath10k to multiplex a lot better, and am making some forward
> >>> progress again (
> >>> https://forum.openwrt.org/t/aql-and-the-ath10k-is-lovely/59002/830 ),
> >>> and along the way ran into new problems with location scanning and
> >>> Apple's AirDrop....
> >>>
> >>> but I just got a batch of dismal results back from the ax210 and
> >>> mt79... tell me that there's an AP shipping from someone that scales a
> >>> bit better? Lie if you must...
> >>
> >> An mtk7915-based AP running recent owrt did better than others.
> >>
> >> http://www.candelatech.com/examples/TR-398v2-2022-06-05-08-28-57-6.2.6-latency-virt-sta-new-atf-c/
> >
> > I wanted to be happy, but... tcp...
> >
> > http://www.candelatech.com/examples/TR-398v2-2022-06-05-08-28-57-6.2.6-latency-virt-sta-new-atf-c/chart-31.png
> >
> > what's the chipset driving these tests nowadays?
>
> That test was done with MTK virtual stations doing the station load (and a multi-gig Ethernet port
> sending traffic towards the DUT in the download direction).

Openwrt driver or factory?

The last major patches for OpenWrt mt76 wifi landed Aug 4, I think.
There are a few more under test now that the OS is stable.

> My assumption is that much of the TCP latency is very likely caused on the
> traffic generator itself, so that is why we measure udp latency for pass/fail
> metrics.

I fear a great deal of it is real, on the path, in the DUT. However
there is a lot in the local stack too.

Here are some things to try. TCP small queues stops being effective (at
this rate) at around 8-12 flows; beyond that, packets start accruing in
the stack and show up as RTT inflation. A big help is to set
TCP_NOTSENT_LOWAT to a low value (16k).
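A minimal per-socket sketch of that suggestion, assuming Linux, where the option is spelled TCP_NOTSENT_LOWAT (value 25 in <linux/tcp.h>). The function name is hypothetical; the same limit can also be applied host-wide with the net.ipv4.tcp_notsent_lowat sysctl.

```python
import socket

# TCP_NOTSENT_LOWAT limits how much not-yet-sent data an application can park
# in the kernel send queue; above the threshold the socket stops being
# reported writable. Python exposes the constant where the platform defines
# it; 25 is the Linux value, used only as a fallback (assumption: Linux).
NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)


def make_low_lowat_socket(lowat_bytes: int = 16 * 1024) -> socket.socket:
    """Create a TCP socket whose unsent backlog is capped at lowat_bytes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, NOTSENT_LOWAT, lowat_bytes)
    return sock


if __name__ == "__main__":
    s = make_low_lowat_socket()
    print(s.getsockopt(socket.IPPROTO_TCP, NOTSENT_LOWAT))  # 16384 on Linux
    s.close()
```

Setting it per socket lets a traffic generator apply it only to test flows rather than the whole host.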

sch_fq is actually worse than fq_codel on the driving host, as it too
accrues packets.
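A small sketch for switching the driving host's root qdisc, using the standard `tc qdisc replace dev IFACE root fq_codel` form. The helper names and the "eth1" interface are placeholders; applying it needs root (CAP_NET_ADMIN). On a multi-queue NIC this funnels all queues through one fq_codel instance; setting the net.core.default_qdisc sysctl to fq_codel before the interface comes up is a common alternative.

```python
import shlex
import subprocess
from typing import List


def root_qdisc_cmd(iface: str, qdisc: str = "fq_codel") -> List[str]:
    """Build the tc command that replaces iface's root qdisc (hypothetical helper)."""
    return ["tc", "qdisc", "replace", "dev", iface, "root", qdisc]


def apply(cmd: List[str]) -> None:
    # Requires root; raises subprocess.CalledProcessError if tc fails.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "eth1" stands in for the traffic generator's port towards the DUT.
    print(shlex.join(root_qdisc_cmd("eth1")))
    # -> tc qdisc replace dev eth1 root fq_codel
```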

Trying TCP Reno and BBR on this workload might show a difference.
I wish LEDBAT++ were available for Linux...
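Linux lets a test tool pick the congestion control per socket via the TCP_CONGESTION option, so Reno, CUBIC and BBR runs don't require changing the host default. A hedged sketch (helper name hypothetical): non-root users can only select algorithms listed in net.ipv4.tcp_allowed_congestion_control, and "bbr" needs the tcp_bbr module loaded.

```python
import socket
from typing import Optional


def set_congestion_control(sock: socket.socket, name: str) -> Optional[str]:
    """Select a TCP congestion control algorithm for one socket (Linux only).

    Returns the algorithm the kernel reports afterwards, or None if the
    platform lacks TCP_CONGESTION or rejects the name.
    """
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return None
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode())
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return None
    # The kernel hands back a NUL-padded name buffer.
    return raw.split(b"\0", 1)[0].decode()


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        for algo in ("reno", "cubic", "bbr"):
            print(algo, set_congestion_control(s, algo))
```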

... going theoretical ...

There was some really great work on fractional windows that went into
Google's Swift congestion control; this is an earlier paper on it:

https://research.google/pubs/pub49448/

and a couple of really great papers from Google and others last week,
listed at: https://conferences.sigcomm.org/sigcomm/2022/program.html
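As I understand the Swift design, a window below one packet can't be enforced by counting packets in flight, so the sender paces instead, waiting roughly RTT/cwnd between packets. A toy sketch of only that rule, not Google's implementation; the function name is hypothetical.

```python
from typing import Optional


def fractional_cwnd_pacing_interval(rtt_s: float, cwnd_pkts: float) -> Optional[float]:
    """Gap between packet sends when cwnd falls below one packet.

    With cwnd >= 1 the sender is window-limited as usual and this returns None.
    With 0 < cwnd < 1 the sender waits rtt / cwnd between packets, so
    cwnd = 0.25 means one packet every four RTTs.
    """
    if cwnd_pkts <= 0:
        raise ValueError("cwnd must be positive")
    if cwnd_pkts >= 1:
        return None
    return rtt_s / cwnd_pkts
```

This is what lets a heavy incast of many senders back off below one packet per RTT each instead of overflowing the bottleneck queue.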

>
> It would take some special logic, like sniffing the Ethernet port and the air at the same time
> and matching packets by looking closely at their content, to really understand DUT TCP latency.
> I'm not sure that is worth the effort.
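The matching step itself is small once both captures share a clock (same host or PTP-synced, an assumption here). A sketch, assuming the caller has already extracted bytes that are identical on both sides (e.g. TCP header plus payload, since the link-layer header, IP TTL and checksum differ between the wired and wireless captures). Names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# (capture timestamp in seconds, bytes to match on)
Packet = Tuple[float, bytes]


def _key(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def match_latencies(eth: Iterable[Packet], air: Iterable[Packet]) -> List[float]:
    """Per-packet DUT transit times for packets seen in both captures."""
    first_seen: Dict[bytes, float] = {}
    for ts, data in eth:
        # Keep the first wired copy; TCP retransmissions would overwrite it.
        first_seen.setdefault(_key(data), ts)

    latencies: List[float] = []
    matched = set()
    for ts, data in air:
        k = _key(data)
        # Count only the first over-the-air copy (802.11 retries repeat frames).
        if k in first_seen and k not in matched:
            matched.add(k)
            latencies.append(ts - first_seen[k])
    return latencies
```

Packets with identical content (e.g. pure ACKs) would collide, so matching on data segments is the safer use.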

Heh. I, of course, think it is, as TCP is the dominant protocol on the
internet... Anyway, to get a baseline comparison between TCP behaviors,
you could just do a pure Ethernet test, shaped with cake to the
bandwidth you are getting out of this test, and measure the TCP RTTs
that way. It would be nice to know what the test does without wifi in
the way.
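A sketch of the shaping step for that baseline, using the standard `tc qdisc replace dev IFACE root cake bandwidth RATE` form (sch_cake is in mainline Linux since 4.19). Cake shapes egress only, so it goes on the port that sends the download traffic. The helper name, interface and measured rate are placeholders.

```python
import shlex
from typing import List


def cake_shaper_cmd(iface: str, measured_bps: float) -> List[str]:
    """Build a tc command shaping iface with cake to the measured wifi rate.

    The rate is rounded down to whole kbit/s.
    """
    kbit = int(measured_bps // 1000)
    if kbit <= 0:
        raise ValueError("measured rate must be at least 1 kbit/s")
    return ["tc", "qdisc", "replace", "dev", iface, "root",
            "cake", "bandwidth", f"{kbit}kbit"]


if __name__ == "__main__":
    # Placeholder numbers: 412.3 Mbit/s seen over wifi, shaping port eth1.
    print(shlex.join(cake_shaper_cmd("eth1", 412.3e6)))
    # -> tc qdisc replace dev eth1 root cake bandwidth 412300kbit
```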

>
> But, assuming we can properly measure slow-speed UDP latency through DUT, do you still
> think that it is possible that DUT is causing significantly different latency to TCP
> packets?

Absolutely. It's the return path that's mostly at fault: every two
TCP packets need an ACK, so even if you have working MU-MIMO for 4
streams, that's 4 TXOPs (minimum) that the clients are going to respond
on.

Standard packet captures of this 32-station TCP test would be useful,
and aircaps would show how efficiently the clients are responding. A lot
of stations burn a whole TXOP on a single ACK, then send the rest in
another....
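Rough arithmetic behind the return-path argument, as a sketch under simplifying assumptions: delayed ACKs (one ACK per two segments), no uplink OFDMA or uplink MU-MIMO, and each station contending for its own TXOP. Names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class UplinkAckLoad:
    acks_per_station: int
    min_txops: int    # every station returns all its ACKs in one TXOP
    split_txops: int  # one TXOP burned on a single ACK, the rest in another


def ack_txops(segments_per_station: int, stations: int,
              segs_per_ack: int = 2) -> UplinkAckLoad:
    """Uplink cost of ACKing one downlink burst to each station."""
    acks = math.ceil(segments_per_station / segs_per_ack)
    # Each station needs at least its own uplink TXOP to return ACKs.
    min_txops = stations if acks > 0 else 0
    # The pattern seen in aircaps doubles the count whenever acks > 1.
    split_txops = stations * min(acks, 2)
    return UplinkAckLoad(acks, min_txops, split_txops)
```

For example, `ack_txops(64, 32)` gives 32 ACKs per station, at least 32 uplink TXOPs, and 64 with the single-ACK-first pattern; `ack_txops(n, 4).min_txops` is the "4 TXOPs (minimum)" case.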

>
> Thanks,

No, thank you for sharing. Can you point at some commercial AP we
could test that does better than this on the TCP test?

> Ben
>
>
> >
> >> The test was at least tentatively accepted into TR398v3, but I don't think anyone other than ourselves has implemented
> >> or tested it.  I think the pass/fail criteria will need to be adjusted to make it easier to pass.  Some APs were showing
> >> multiple seconds of latency, so maybe a few hundred ms is really OK.
> >>
> >> The test should be able to run over WAN if desired, though it would take a bit
> >> of extra setup to place an upstream LANforge endpoint on a cloud VM.
> >>
> >> If someone at SpaceX wants to run this test, please contact me off list and we can help
> >> make it happen.
> >>
> >> Thanks,
> >> Ben
> >>
> >>>
> >>> On Sun, Sep 26, 2021 at 2:59 PM Ben Greear <greearb at candelatech.com> wrote:
> >>>>
> >>>> I have been working on a latency test that I hope can be included in the TR398 issue 3
> >>>> document.  It is based somewhat on Toke's paper on bufferbloat and latency testing,
> >>>> with the notable change that part of the test runs on 32 stations.
> >>>>
> >>>> I implemented this test case, and an example run against an enterprise grade AX AP
> >>>> is here.  There could still be bugs in my implementation, but I think it is at least
> >>>> close to correct:
> >>>>
> >>>> http://www.candelatech.com/examples/tr398v3-latency-report.pdf
> >>>>
> >>>> TL;DR:  Runs OK with a single station, but sees 1+ second one-way latency with 32 stations and high load, and UDP often
> >>>>      is not able to see any throughput at all, I guess due to too many packets being lost
> >>>>      or something.  I hope to run against some cutting-edge OpenWRT APs soon.
> >>>>
> >>>> One note on TCP latency:  This is the time to transmit a 64k chunk of data over TCP, not a single
> >>>> frame.
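The report doesn't spell out exactly where that clock starts and stops. A minimal sketch of one plausible definition, not LANforge's method: time from the first byte handed to send() until the receiver, having read all 64 KiB, returns a one-byte reply (so it includes one extra return trip). "64k" is read as 64 KiB, and over loopback this only exercises the mechanics.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # assumption: "64k" means 64 KiB


def _sink(server: socket.socket) -> None:
    conn, _ = server.accept()
    with conn:
        received = 0
        while received < CHUNK:
            data = conn.recv(CHUNK - received)
            if not data:
                return
            received += len(data)
        conn.sendall(b"\x01")  # application-level "got it" marker


def chunk_latency(host: str = "127.0.0.1") -> float:
    """Seconds from the first send of a 64 KiB chunk to the receiver's reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))
        server.listen(1)
        t = threading.Thread(target=_sink, args=(server,))
        t.start()
        with socket.create_connection(server.getsockname()) as client:
            payload = b"\0" * CHUNK
            start = time.perf_counter()
            client.sendall(payload)
            client.recv(1)
            elapsed = time.perf_counter() - start
        t.join()
    return elapsed
```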
> >>>>
> >>>> My testbed used 32 Intel ax210 radios as stations in this test.
> >>>>
> >>>> I am interested in feedback from this list if anyone has opinions.
> >>>>
> >>>> Here is text of the test case:
> >>>>
> >>>> The Latency test intends to verify latency under low, high, and maximum AP traffic load, with
> >>>> 1 and 32 stations. Traffic load is 4 bi-directional TCP streams for each station, plus a
> >>>> low-speed UDP connection to probe latency.
> >>>>
> >>>> Test Procedure
> >>>>
> >>>> DUT should be configured for 20 MHz on 2.4 GHz and 80 MHz on 5 GHz, and stations should use
> >>>> two spatial streams.
> >>>>
> >>>> 1: For each combination of:  2.4 GHz N, 5 GHz AC, 2.4 GHz AX, 5 GHz AX:
> >>>>
> >>>> 2: Configure attenuators to emulate 2-meter distance between stations and AP.
> >>>>
> >>>> 3: Create 32 stations and allow one to associate with the DUT.  The other 31 are admin-down.
> >>>>
> >>>> 4: Create an AP-to-station (download) TCP stream, run it for 120 seconds, and record its
> >>>>       throughput as 'maximum_load'.  Stop this connection.
> >>>>
> >>>> 5: Calculate offered_load as 1% of maximum_load.
> >>>>
> >>>> 6: Create 4 TCP streams on each active station, each configured for Upload and Download rate of
> >>>>       offered_load / (4 * active_station_count * 2).
> >>>>
> >>>> 7: Create 1 UDP stream on each active station, configured for 56kbps traffic Upload and 56kbps traffic Download.
> >>>>
> >>>> 8: Start all TCP and UDP connections.  Wait 30 seconds to let traffic settle.
> >>>>
> >>>> 9: Every 10 seconds for 120 seconds, record one-way download latency over the last 10 seconds for each UDP connection.  Depending on test
> >>>>       equipment features, this may mean you need to start/stop the UDP connections every 10 seconds or clear the UDP connection
> >>>>       counters.
> >>>>
> >>>> 10: Calculate offered_load as 70% of maximum_load, and repeat steps 6 - 9 inclusive.
> >>>>
> >>>> 11: Calculate offered_load as 125% of maximum_load, and repeat steps 6 - 9 inclusive.
> >>>>
> >>>> 12: Allow the other 31 stations to associate, and repeat steps 5 - 11 inclusive with all 32 stations active.
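A sketch of the load plan the procedure implies, assuming maximum_load is measured once with a single station and reused for the 32-station pass. Function names are hypothetical; actually generating traffic is left to the test equipment.

```python
from typing import List, Tuple

STREAMS_PER_STATION = 4
DIRECTIONS = 2                    # upload + download
LOAD_LEVELS = (0.01, 0.70, 1.25)  # fractions of maximum_load
STATION_COUNTS = (1, 32)


def per_stream_rate(offered_load: float, active_stations: int) -> float:
    """Rate for each TCP stream in each direction."""
    return offered_load / (STREAMS_PER_STATION * active_stations * DIRECTIONS)


def plan(maximum_load_bps: float) -> List[Tuple[int, float, float]]:
    """(active stations, load fraction, per-stream bps) for every run."""
    runs = []
    for stations in STATION_COUNTS:
        for frac in LOAD_LEVELS:
            offered = frac * maximum_load_bps
            runs.append((stations, frac, per_stream_rate(offered, stations)))
    return runs


def sample_times(settle_s: int = 30, duration_s: int = 120,
                 every_s: int = 10) -> List[int]:
    """Seconds after traffic start at which each 10 s latency window closes."""
    return list(range(settle_s + every_s, settle_s + duration_s + 1, every_s))
```

With a hypothetical maximum_load of 800 Mbit/s, the 1-station, 1% run gives 1 Mbit/s per stream per direction, and each run yields 12 latency samples per UDP connection.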
> >>>>
> >>>>
> >>>> Pass/Fail Criteria
> >>>>
> >>>> 1: For each test configuration running at 1% of maximum load:  Average of all UDP latency samples must be less than 10ms.
> >>>> 2: For each test configuration running at 1% of maximum load:  Maximum of all UDP latency samples must be less than 20ms.
> >>>> 3: For each test configuration running at 70% of maximum load:  Average of all UDP latency samples must be less than 20ms.
> >>>> 4: For each test configuration running at 70% of maximum load:  Maximum of all UDP latency samples must be less than 40ms.
> >>>> 5: For each test configuration running at 125% of maximum load:  Average of all UDP latency samples must be less than 50ms.
> >>>> 6: For each test configuration running at 125% of maximum load:  Maximum of all UDP latency samples must be less than 100ms.
> >>>> 7: For each test configuration: Each UDP connection upload throughput must be at least 1/2 of requested UDP speed for final 10-second test interval.
> >>>> 8: For each test configuration: Each UDP connection download throughput must be at least 1/2 of requested UDP speed for final 10-second test interval.
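Criteria 1-8 above can be encoded as a small per-configuration checker. A sketch assuming latency samples in milliseconds and throughput in bit/s; the 56 kbps UDP rate comes from the procedure, and the names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence

# load fraction -> (average must be below, maximum must be below), in ms
LATENCY_LIMITS_MS = {0.01: (10.0, 20.0), 0.70: (20.0, 40.0), 1.25: (50.0, 100.0)}
UDP_RATE_BPS = 56_000


def evaluate(load: float, latency_samples_ms: Sequence[float],
             final_up_bps: Sequence[float],
             final_down_bps: Sequence[float]) -> List[str]:
    """Failed criteria for one test configuration (empty list = pass)."""
    avg_limit, max_limit = LATENCY_LIMITS_MS[load]
    failures = []
    if mean(latency_samples_ms) >= avg_limit:
        failures.append(f"average latency >= {avg_limit} ms")
    if max(latency_samples_ms) >= max_limit:
        failures.append(f"maximum latency >= {max_limit} ms")
    # Final 10-second interval must reach at least half the requested rate.
    floor = UDP_RATE_BPS / 2
    if any(bps < floor for bps in final_up_bps):
        failures.append("a UDP upload below half the requested rate")
    if any(bps < floor for bps in final_down_bps):
        failures.append("a UDP download below half the requested rate")
    return failures
```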
> >>>>
> >>>>
> >>>> --
> >>>> Ben Greear <greearb at candelatech.com>
> >>>> Candela Technologies Inc  http://www.candelatech.com
> >>>> _______________________________________________
> >>>> Starlink mailing list
> >>>> Starlink at lists.bufferbloat.net
> >>>> https://lists.bufferbloat.net/listinfo/starlink
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>

-- 
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht CEO, TekLibre, LLC