From: Bob McMahon <bob.mcmahon@broadcom.com>
Date: Mon, 16 Oct 2017 21:53:36 -0700
To: Simon Barber <simon@superduper.net>
Cc: make-wifi-fast@lists.bufferbloat.net, Johannes Berg
Subject: Re: [Make-wifi-fast] less latency, more filling... for wifi

I'm confused. Are you referring to TCP's RTT or some other round trip?
If something else, what? How is one-way latency measured without clock
synchronization and a common clock domain?

Thanks,
Bob

On Mon, Oct 16, 2017 at 2:26 PM, Simon Barber <simon@superduper.net> wrote:
> What I mean is for the tool to directly measure the minimum round trip,
> and then report one-way delay above this separately in each direction.
> This can be done without external time synchronization.
>
> Simon
>
> On Oct 9, 2017, at 2:44 PM, Simon Barber wrote:
>
> Very nice - I'm using iperf 3.2 and always have to figure packets per
> second by combining packet size and bandwidth. This will be much easier.
> Also, direct reporting of one-way latency variance above the minimum
> round trip would be very useful.
>
> Simon
>
> On Oct 9, 2017, at 2:04 PM, Bob McMahon <bob.mcmahon@broadcom.com> wrote:
>
> Hi,
>
> Not sure if this is helpful, but we've added end-to-end latency
> measurements for UDP traffic in iperf 2.0.10. It does require the
> clocks to be synched. I use a Spectracom TSync PCIe card with either an
> oven-controlled oscillator or a GPS-disciplined one, then use precision
> time protocol to distribute the clock over IP multicast.
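A minimal sketch of the kind of end-to-end UDP latency measurement Bob describes, assuming both hosts' clocks have already been disciplined into a common domain (e.g. via PTP from a GPS or OCXO source). This is illustrative Python, not iperf's actual implementation; the probe format is an assumption:

```python
import socket
import struct
import time

PROBE_FMT = "!Id"  # sequence number + sender wall-clock timestamp
PROBE_LEN = struct.calcsize(PROBE_FMT)

def send_probe(sock, addr, seq):
    # Embed the sender's wall-clock time in the payload. This is only
    # meaningful if both hosts share a synchronized clock domain.
    sock.sendto(struct.pack(PROBE_FMT, seq, time.time()), addr)

def recv_probe(sock):
    # One-way delay = receiver's clock at arrival minus the embedded
    # transmit timestamp; without clock sync this number is meaningless.
    data, _ = sock.recvfrom(2048)
    seq, tx_time = struct.unpack(PROBE_FMT, data[:PROBE_LEN])
    return seq, time.time() - tx_time
```

Over loopback the two "clocks" are trivially the same, so the computed delay is valid; across real hosts its accuracy is bounded by the PTP synchronization error.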
> For Linux, the traffic threads are set to realtime scheduling to
> minimize the latency added per thread scheduling.
>
> I'm also in the process of implementing a very simple isochronous
> option where the iperf client (tx) accepts a frames-per-second command
> line value (e.g. 60) as well as a log-normal distribution for the
> input, to somewhat simulate variable bit rates. On the iperf receiver
> I'm considering implementing an underflow/overflow counter per the
> expected frames per second.
>
> Latency does seem to be a significant metric. So is power consumption.
>
> Comments welcome.
>
> Bob
>
> On Mon, Oct 9, 2017 at 1:41 PM, <dpreed@reed.com> wrote:
>
>> It's worth setting a stretch latency goal that is in principle
>> achievable.
>>
>> I get the sense that the wireless group obsesses over maximum channel
>> utilization rather than excellent latency. This is where it's
>> important to put latency as the primary goal, and utilization as the
>> secondary goal, rather than vice versa.
>>
>> It's easy to get at this by observing that the minimum latency on the
>> shared channel is achieved by round-robin scheduling of packets that
>> are of sufficient size that per-packet overhead doesn't dominate.
>>
>> So only aggregate when there are few contenders for the channel, or
>> the packets are quite small compared to the per-packet overhead. When
>> there are more contenders, still aggregate small packets, but only
>> those that are actually waiting. But large packets shouldn't be
>> aggregated.
>>
>> Multicast should be avoided by higher-level protocols for the most
>> part, and the latency of multicast should be a non-issue. In wireless,
>> it's kind of a dumb idea anyway, given that stations have widely
>> varying propagation characteristics. Do just enough to support DHCP
>> and so forth.
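Bob's isochronous option above - a frames-per-second value plus a log-normal distribution to roughly simulate variable bit rates - could be sketched as a client-side pacing loop along these lines. The function and parameter names are hypothetical, not iperf's actual flags or code:

```python
import random
import time

def isochronous_frames(fps=60, mu=4.0, sigma=1.0, count=5):
    # Emit one "frame" per 1/fps interval; the per-frame byte count is
    # drawn from a log-normal distribution so consecutive frames vary in
    # size, mimicking a variable-bit-rate source.
    interval = 1.0 / fps
    next_deadline = time.monotonic()
    for seq in range(count):
        size = max(1, int(random.lognormvariate(mu, sigma)))
        yield seq, size  # a real client would transmit `size` bytes here
        next_deadline += interval
        time.sleep(max(0.0, next_deadline - time.monotonic()))
```

A receiver counting frames against the expected per-second rate would then see underflow when fewer frames arrive in an interval than scheduled, and overflow when late frames bunch up.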
>>
>> It's so much fun for the hardware designers to throw in stuff that
>> only helps in marketing benchmarks (like getting a few percent on
>> throughput in lab conditions that never happen in the field) that it
>> is tempting for OS driver writers to use those features (like deep
>> queues and offload processing bells and whistles). But the real issue
>> to be solved is the turn-taking "bloat" that comes from too much
>> attempt to aggregate, to handle the "sole transmitter to dedicated
>> receiver" case, etc.
>>
>> I use 10 GigE in my house. I don't use it because I want to do 10 Gig
>> file transfers all day and measure them. I use it because (properly
>> managed) it gives me *low latency*. That low latency is what matters,
>> not throughput. My average load, if spread out across 24 hours, could
>> be handled by 802.11b for the entire house.
>>
>> We are soon going to have 802.11ax in the home. That's approximately
>> 10 Gb/sec, but wireless. No TV streaming can fill it. It's not for
>> continuous isochronous traffic at all.
>>
>> What it is for is *low latency*. So if the adapters and the drivers
>> won't give me that low latency, what good is 10 Gb/sec at all? This is
>> true for 802.11ac as well.
>>
>> We aren't building dragsters fueled with nitro, able to run down a
>> 1/4 mile of track but unable to steer.
>>
>> Instead, we want to be able to connect musical instruments in an
>> electronic symphony, where timing is everything.
>>
>> On Monday, October 9, 2017 4:13pm, "Dave Taht" <dave.taht@gmail.com>
>> said:
>>
>> > There were five ideas I'd wanted to pursue at some point. I'm not
>> > presently on linux-wireless, nor do I have time to pay attention
>> > right now - but I'm enjoying that thread passively.
>> >
>> > To get those ideas "out there" again:
>> >
>> > * adding a fixed-length fq'd queue for multicast.
>> >
>> > * Reducing retransmits at low rates
>> >
>> > See the recent paper:
>> >
>> > "Resolving Bufferbloat in TCP Communication over IEEE 802.11n WLAN
>> > by Reducing MAC Retransmission Limit at Low Data Rate" (I'd paste a
>> > link, but for some reason that doesn't work well)
>> >
>> > Even with their simple bi-modal model it worked pretty well.
>> >
>> > It also reduces contention with "bad" stations more automagically.
>> >
>> > * Less buffering at the driver.
>> >
>> > Presently (ath9k) there are two to three aggregates stacked up at
>> > the driver.
>> >
>> > With a good estimate for how long it will take to service one,
>> > forming another within that deadline seems feasible, so you only
>> > need to have one in the hardware itself.
>> >
>> > Simple example: you have data in the hardware projected to take a
>> > minimum of 4ms to transmit. Don't form a new aggregate and submit it
>> > to the hardware for 3.5ms.
>> >
>> > I know full well that a "good" estimate is hard, and things like
>> > MU-MIMO complicate things. Still, I'd like to get below 20ms of
>> > latency within the driver, and this is one way to get there.
>> >
>> > * Reducing the size of a txop under contention
>> >
>> > If you have 5 stations getting blasted away at 5ms each, and one
>> > that only wants 1ms worth of traffic "soon", temporarily reducing
>> > the size of the txop for everybody so you can service more stations
>> > faster seems useful.
>> >
>> > * Merging ACs when sane to do so
>> >
>> > Sane aggregation in general works better than prioritizing does, as
>> > shown in "Ending the Anomaly".
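Dave's "less buffering at the driver" idea above reduces to a simple hold-off rule: given an estimate of how long the data already in hardware will take to transmit, delay forming the next aggregate until just before the hardware drains, so only one aggregate sits in the hardware at a time. A toy sketch - the names and the fixed 0.5 ms formation margin are assumptions, not ath9k code:

```python
def next_aggregate_time(now, hw_busy_estimate, margin=0.0005):
    # `hw_busy_estimate` is the projected seconds of airtime for data
    # already queued in hardware. Hold off forming the next aggregate
    # until `margin` seconds before the hardware is expected to drain,
    # so exactly one aggregate is staged behind the one in flight.
    return now + max(0.0, hw_busy_estimate - margin)

# Dave's example: 4 ms of data in hardware -> hold off for ~3.5 ms.
```

Everything rides on the quality of the airtime estimate; rate changes and MU-MIMO grouping can invalidate it mid-flight, which is why the margin is needed at all.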
>> >
>> > --
>> >
>> > Dave Täht
>> > CEO, TekLibre, LLC
>> > http://www.teklibre.com
>> > Tel: 1-669-226-2619
>> > _______________________________________________
>> > Make-wifi-fast mailing list
>> > Make-wifi-fast@lists.bufferbloat.net
>> > https://lists.bufferbloat.net/listinfo/make-wifi-fast