From: Bob McMahon
Date: Wed, 11 Oct 2017 13:03:34 -0700
To: Simon Barber
Cc: David Reed, make-wifi-fast@lists.bufferbloat.net, Johannes Berg
Subject: Re: [Make-wifi-fast] less latency, more filling... for wifi

FYI, we're considering adding support for "--udp-triggers" in iperf 2.0.10+. Setting this option will place a "magic number" in the UDP payload such that logic moving bytes through the system can be triggered to append its own timestamps into the UDP payload, i.e. as the payload moves through each subsystem. This lets one analyze the latency path of a single packet, as an example. Note: the standard iperf microsecond timestamps come from the application level (on tx) and from SO_TIMESTAMP on receive (assuming SO_TIMESTAMP is supported; otherwise it's a syscall() after the socket receive). Being able to instrument each logic path's contribution to a single packet's latency can be helpful, at least for driver/firmware/ucode engineers.

On the server side, we'll probably add a --histogram option so the latency distributions can be displayed (per each -i interval) and higher-level scripts can produce PDFs, CDFs and CCDFs for latencies.
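To make the trigger idea concrete, here's a rough sketch of how a script might pull the per-subsystem timestamps back out of a triggered payload and turn them into per-stage deltas. The magic number and record layout below are placeholders for illustration only; the actual --udp-triggers format isn't settled.

#!/usr/bin/env python
# Hypothetical sketch only: the real --udp-triggers payload format isn't
# settled.  Assume the payload starts with a 32-bit magic number and a
# 16-bit record count, followed by (subsystem id, microsecond timestamp)
# records appended by each layer that handled the packet.
import struct

MAGIC = 0x1A2B3C4D  # placeholder trigger value, not the real one

def stage_latencies(payload):
    magic, count = struct.unpack_from("!IH", payload, 0)
    if magic != MAGIC:
        return []
    records, offset = [], 6
    for _ in range(count):
        subsys, usec = struct.unpack_from("!HQ", payload, offset)
        records.append((subsys, usec))
        offset += 10
    # the delta between consecutive records is the time spent reaching
    # each successive stage
    return [(later[0], later[1] - earlier[1])
            for earlier, later in zip(records, records[1:])]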
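And on the post-processing side, a higher-level script could turn a flat list of per-packet latencies into empirical CDF/CCDF curves along these lines (again just a sketch, independent of whatever the --histogram output format ends up being):

#!/usr/bin/env python
# Sketch of the post-processing a higher-level script might do: read one
# latency sample (microseconds) per line on stdin and print the empirical
# CDF and CCDF.  This doesn't assume any particular --histogram output
# format, since that isn't final.
import sys

def ecdf(samples):
    xs = sorted(samples)
    n = float(len(xs))
    # (latency, P[X <= latency], P[X > latency]) at each sample point
    return [(x, (i + 1) / n, 1.0 - (i + 1) / n) for i, x in enumerate(xs)]

if __name__ == "__main__":
    latencies_us = [float(line) for line in sys.stdin if line.strip()]
    for x, cdf, ccdf in ecdf(latencies_us):
        print("%0.1f %0.6f %0.6f" % (x, cdf, ccdf))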
Let me know if generalizing this support in iperf is useful.

Bob

On Mon, Oct 9, 2017 at 3:02 PM, Bob McMahon <bob.mcmahon@broadcom.com> wrote:

> Not sure how to determine when one-way latency is above round trip.
> Iperf traffic for latency uses UDP, where nothing is coming back. For TCP,
> the iperf client will report a sampled RTT per the network stack (on
> operating systems that support this.)
>
> One idea - have two traffic streams, one TCP and one UDP, and use a higher
> level script (e.g. via python) to poll data from each and perform the
> comparison? Though, not sure if this would give you what you're looking for.
>
> Bob
>
> On Mon, Oct 9, 2017 at 2:44 PM, Simon Barber <simon@superduper.net> wrote:
>
>> Very nice - I'm using iperf 3.2 and always have to figure packets per
>> second by combining packet size and bandwidth. This will be much easier.
>> Also, direct reporting of one-way latency variance above minimum round trip
>> would be very useful.
>>
>> Simon
>>
>> On Oct 9, 2017, at 2:04 PM, Bob McMahon <bob.mcmahon@broadcom.com> wrote:
>>
>> Hi,
>>
>> Not sure if this is helpful, but we've added end/end latency measurements
>> for UDP traffic in iperf 2.0.10. It does require the clocks to be synched.
>> I use a Spectracom TSync PCIe card with either an oven controlled
>> oscillator or a GPS disciplined one, then use precision time protocol to
>> distribute the clock over IP multicast. For Linux, the traffic threads are
>> set to realtime scheduling to minimize latency adds per thread scheduling.
>>
>> I'm also in the process of implementing a very simple isochronous option
>> where the iperf client (tx) accepts a frames per second command line value
>> (e.g. 60) as well as a log normal distribution for the input to somewhat
>> simulate variable bit rates. On the iperf receiver, I'm considering
>> implementing an underflow/overflow counter per the expected frames per
>> second.
>>
>> Latency does seem to be a significant metric. So is power consumption.
>>
>> Comments welcome.
>>
>> Bob
>>
>> On Mon, Oct 9, 2017 at 1:41 PM, <dpreed@reed.com> wrote:
>>
>>> It's worth setting a stretch latency goal that is in principle
>>> achievable.
>>>
>>> I get the sense that the wireless group obsesses over maximum channel
>>> utilization rather than excellent latency. This is where it's important
>>> to put latency as a primary goal, and utilization as the secondary goal,
>>> rather than vice versa.
>>>
>>> It's easy to get at this by observing that the minimum latency on the
>>> shared channel is achieved by round-robin scheduling of packets that are
>>> of sufficient size that per-packet overhead doesn't dominate.
>>>
>>> So only aggregate when there are few contenders for the channel, or the
>>> packets are quite small compared to the per-packet overhead. When there
>>> are more contenders, still aggregate small packets, but only those that
>>> are actually waiting. But large packets shouldn't be aggregated.
>>>
>>> Multicast should be avoided by higher level protocols for the most part,
>>> and the latency of multicast should be a non-issue. In wireless, it's
>>> kind of a dumb idea anyway, given that stations have widely varying
>>> propagation characteristics. Do just enough to support DHCP and so forth.
>>>
>>> It's so much fun for the hardware designers to throw in stuff that only
>>> helps in marketing benchmarks (like getting a few percent on throughput
>>> in lab conditions that never happen in the field) that it is tempting
>>> for OS driver writers to use those features (like deep queues and
>>> offload processing bells and whistles). But the real issue to be solved
>>> is the turn-taking "bloat" that comes from too much attempting to
>>> aggregate, to handle the "sole transmitter to dedicated receiver" case,
>>> etc.
>>>
>>> I use 10 GigE in my house. I don't use it because I want to do 10 Gig
>>> file transfers all day and measure them. I use it because (properly
>>> managed) it gives me *low latency*. That low latency is what matters,
>>> not throughput. My average load, if spread out across 24 hours, could be
>>> handled by 802.11b for the entire house.
>>>
>>> We are soon going to have 802.11ax in the home. That's approximately
>>> 10 Gb/sec, but wireless. No TV streaming can fill it. It's not for
>>> continuous isochronous traffic at all.
>>>
>>> What it is for is *low latency*. So if the adapters and the drivers
>>> won't give me that low latency, what good is 10 Gb/sec at all? This is
>>> true for 802.11ac as well.
>>>
>>> We aren't building dragsters fueled with nitro, to run down 1/4 mile of
>>> track but unable to steer.
>>>
>>> Instead, we want to be able to connect musical instruments in an
>>> electronic symphony, where timing is everything.
>>>
>>> On Monday, October 9, 2017 4:13pm, "Dave Taht" <dave.taht@gmail.com> said:
>>>
>>> > There were five ideas I'd wanted to pursue at some point. I'm not
>>> > presently on linux-wireless, nor do I have time to pay attention right
>>> > now - but I'm enjoying that thread passively.
>>> >
>>> > To get those ideas "out there" again:
>>> >
>>> > * adding a fixed-length fq'd queue for multicast.
>>> >
>>> > * Reducing retransmits at low rates
>>> >
>>> > See the recent paper:
>>> >
>>> > "Resolving Bufferbloat in TCP Communication over IEEE 802.11n WLAN by
>>> > Reducing MAC Retransmission Limit at Low Data Rate" (I'd paste a link
>>> > but for some reason that doesn't work well)
>>> >
>>> > Even with their simple bi-modal model it worked pretty well.
>>> >
>>> > It also reduces contention with "bad" stations more automagically.
>>> >
>>> > * Less buffering at the driver.
>>> >
>>> > Presently (ath9k) there are two to three aggregates stacked up at the
>>> > driver.
>>> >
>>> > With a good estimate for how long it will take to service one, forming
>>> > another within that deadline seems feasible, so you only need to have
>>> > one in the hardware itself.
>>> >
>>> > Simple example: you have data in the hardware projected to take a
>>> > minimum of 4ms to transmit. Don't form a new aggregate and submit it
>>> > to the hardware for 3.5ms.
>>> >
>>> > I know full well that a "good" estimate is hard, and things like
>>> > mu-mimo complicate things. Still, I'd like to get below 20ms of
>>> > latency within the driver, and this is one way to get there.
>>> >
>>> > * Reducing the size of a txop under contention
>>> >
>>> > If you have 5 stations getting blasted away at 5ms each, and one that
>>> > only wants 1ms worth of traffic, "soon", temporarily reducing the size
>>> > of the txop for everybody so you can service more stations faster
>>> > seems useful.
>>> >
>>> > * Merging ACs when sane to do so
>>> >
>>> > Sane aggregation in general works better than prioritizing does, as
>>> > shown in "Ending the Anomaly."
>>> >
>>> > --
>>> > Dave Täht
>>> > CEO, TekLibre, LLC
>>> > http://www.teklibre.com
>>> > Tel: 1-669-226-2619