* [Make-wifi-fast] less latency, more filling... for wifi
@ 2017-10-09 20:13 Dave Taht
2017-10-09 20:41 ` dpreed
0 siblings, 1 reply; 17+ messages in thread
From: Dave Taht @ 2017-10-09 20:13 UTC (permalink / raw)
To: make-wifi-fast, Johannes Berg
There were five ideas I'd wanted to pursue at some point. I'm not
presently on linux-wireless, nor do I have time to pay attention right
now - but I'm enjoying that thread passively.
To get those ideas "out there" again:
* Adding a fixed-length fq'd queue for multicast.
* Reducing retransmits at low rates
See the recent paper:
"Resolving Bufferbloat in TCP Communication over IEEE 802.11 n WLAN by
Reducing MAC Retransmission Limit at Low Data Rate" (I'd paste a link
but for some reason that doesn't work well)
Even with their simple bi-modal model it worked pretty well.
It also reduces contention with "bad" stations more automagically.
* Less buffering at the driver.
Presently (ath9k) there are two to three aggregates stacked up at the driver.
With a good estimate for how long it will take to service one, forming
another within that deadline seems feasible, so you only need to have
one in the hardware itself.
Simple example: you have data in the hardware projected to take a
minimum of 4ms to transmit. Don't form a new aggregate and submit it
to the hardware for 3.5ms (a rough sketch of this follows at the end of
this list of ideas).
I know full well that a "good" estimate is hard, and things like
mu-mimo complicate things. Still, I'd like to get below 20ms of
latency within the driver, and this is one way to get there.
* Reducing the size of a txop under contention
If you have 5 stations getting blasted away at 5ms each, and one that
only wants 1ms worth of traffic "soon", temporarily reducing the size
of the txop for everybody so you can service more stations faster
seems useful.
* Merging ACs (access categories) when sane to do so
Sane aggregation in general works better than prioritizing does, as
shown in "Ending the Anomaly".
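To make the driver-buffering idea above concrete, here is a rough sketch
in illustrative Python (not mac80211/ath9k code; the 0.5ms lead time, the
64KB aggregate cap and the airtime estimator are placeholder assumptions):

import time

# Hypothetical driver-side state: when the hardware is projected to finish
# transmitting what it already has, based on a service-time estimate.
hw_busy_until = 0.0   # monotonic timestamp, seconds
LEAD_TIME = 0.0005    # form the next aggregate only ~0.5ms before the HW drains

def estimate_airtime(aggregate_bytes, rate_bps):
    # Crude airtime estimate: payload bits / PHY rate (ignores preambles,
    # block acks and retries).
    return aggregate_bytes * 8 / rate_bps

def maybe_form_aggregate(queued_bytes, rate_bps):
    """Return a new aggregate only when the hardware is nearly drained."""
    global hw_busy_until
    now = time.monotonic()
    if now < hw_busy_until - LEAD_TIME:
        return None                      # HW still has work queued: hold back
    agg = min(queued_bytes, 64 * 1024)   # pretend we aggregate up to 64KB
    hw_busy_until = max(now, hw_busy_until) + estimate_airtime(agg, rate_bps)
    return agg

# Example: with ~4ms of data already handed to the hardware, the second
# call is refused until the deadline approaches.
print(maybe_form_aggregate(64 * 1024, 130_000_000))  # forms an aggregate (~4ms at 130 Mb/s)
print(maybe_form_aggregate(64 * 1024, 130_000_000))  # None: still well before the deadline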
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-09 20:13 [Make-wifi-fast] less latency, more filling... for wifi Dave Taht
@ 2017-10-09 20:41 ` dpreed
2017-10-09 21:04 ` Bob McMahon
0 siblings, 1 reply; 17+ messages in thread
From: dpreed @ 2017-10-09 20:41 UTC (permalink / raw)
To: Dave Taht; +Cc: make-wifi-fast, Johannes Berg
It's worth setting a stretch latency goal that is in principle achievable.
I get the sense that the wireless group obsesses over maximum channel utilization rather than excellent latency. This is where it's important to put latency as a primary goal, and utilization as the secondary goal, rather than vice versa.
It's easy to get at this by observing that the minimum latency on the shared channel is achieved by round-robin scheduling of packets that are of sufficient size that per packet overhead doesn't dominate.
So only aggregate when there are few contenders for the channel, or the packets are quite small compared to the per-packet overhead. When there are more contenders, still aggregate small packets, but only those that are actually waiting. But large packets shouldn't be aggregated.
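A minimal sketch of such an aggregation policy, in illustrative Python
only (the 256-byte "small packet" threshold and the "few contenders"
cutoff are assumptions, not values from any real driver):

SMALL_PACKET = 256      # bytes; "small" relative to per-packet overhead (assumed)
FEW_CONTENDERS = 2      # "few" stations contending for the channel (assumed)

def packets_to_aggregate(queue, n_contenders):
    """queue: list of (size_bytes, already_waiting) for one station, head first.
    Returns the packets to send in one burst under the policy above."""
    if n_contenders <= FEW_CONTENDERS:
        return list(queue)                       # little contention: aggregate freely
    burst = []
    for size, waiting in queue:
        if size <= SMALL_PACKET and waiting:     # under contention: only small packets
            burst.append((size, waiting))        # that are already queued get aggregated
        else:
            break                                # large packet: send it alone, round-robin on
    return burst or queue[:1]                    # always send at least the head-of-line packet

# Example: 5 contenders, mixed queue -> only the waiting small packets go out together.
print(packets_to_aggregate([(100, True), (120, True), (1500, True)], n_contenders=5))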
Multicast should be avoided by higher level protocols for the most part, and the latency of multicast should be a non-issue. In wireless, it's kind of a dumb idea anyway, given that stations have widely varying propagation characteristics. Do just enough to support DHCP and so forth.
It's so much fun for the hardware designers to throw in stuff that only helps in marketing benchmarks (like getting a few percent on throughput in lab conditions that never happen in the field) that it is tempting for OS driver writers to use those features (like deep queues and offload processing bells and whistles). But the real issue to be solved is the turn-taking "bloat" that comes from too much attempted aggregation, from handling the "sole transmitter to dedicated receiver" case, etc.
I use 10 GigE in my house. I don't use it because I want to do 10 Gig File Transfers all day and measure them. I use it because (properly managed) it gives me *low latency*. That low latency is what matters, not throughput. My average load, if spread out across 24 hours, could be handled by 802.11b for the entire house.
We are soon going to have 802.11ax in the home. That's approximately 10 Gb/sec, but wireless. No TV streaming can fill it. It's not for continuous isochronous traffic at all.
What it is for is *low latency*. So if the adapters and the drivers won't give me that low latency, what good is 10 Gb/sec at all? This is true for 802.11ac, as well.
We aren't building dragsters fueled with nitro, able to run down a 1/4 mile of track but unable to steer.
Instead, we want to be able to connect musical instruments in an electronic symphony, where timing is everything.
On Monday, October 9, 2017 4:13pm, "Dave Taht" <dave.taht@gmail.com> said:
> There were five ideas I'd wanted to pursue at some point. I'm not
> presently on linux-wireless, nor do I have time to pay attention right
> now - but I'm enjoying that thread passively.
>
> To get those ideas "out there" again:
>
> * adding a fixed length fq'd queue for multicast.
>
> * Reducing retransmits at low rates
>
> See the recent paper:
>
> "Resolving Bufferbloat in TCP Communication over IEEE 802.11 n WLAN by
> Reducing MAC Retransmission Limit at Low Data Rate" (I'd paste a link
> but for some reason that doesn't work well)
>
> Even with their simple bi-modal model it worked pretty well.
>
> It also reduces contention with "bad" stations more automagically.
>
> * Less buffering at the driver.
>
> Presently (ath9k) there are two-three aggregates stacked up at the driver.
>
> With a good estimate for how long it will take to service one, forming
> another within that deadline seems feasible, so you only need to have
> one in the hardware itself.
>
> Simple example: you have data in the hardware projected to take a
> minimum of 4ms to transmit. Don't form a new aggregate and submit it
> to the hardware for 3.5ms.
>
> I know full well that a "good" estimate is hard, and things like
> mu-mimo complicate things. Still, I'd like to get below 20ms of
> latency within the driver, and this is one way to get there.
>
> * Reducing the size of a txop under contention
>
> if you have 5 stations getting blasted away at 5ms each, and one that
> only wants 1ms worth of traffic, "soon", temporarily reducing the size
> of the txop for everybody so you can service more stations faster,
> seems useful.
>
> * Merging acs when sane to do so
>
> sane aggregation in general works better than prioritizing does, as
> shown in ending the anomaly.
>
> --
>
> Dave Täht
> CEO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-669-226-2619
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-09 20:41 ` dpreed
@ 2017-10-09 21:04 ` Bob McMahon
2017-10-09 21:44 ` Simon Barber
2017-10-11 21:30 ` Jesper Dangaard Brouer
0 siblings, 2 replies; 17+ messages in thread
From: Bob McMahon @ 2017-10-09 21:04 UTC (permalink / raw)
To: David Reed; +Cc: Dave Taht, make-wifi-fast, Johannes Berg
Hi,
Not sure if this is helpful, but we've added end-to-end latency measurements
for UDP traffic in iperf 2.0.10 <https://sourceforge.net/projects/iperf2/>.
It does require the clocks to be synchronized. I use a Spectracom TSync PCIe
card with either an oven-controlled oscillator or a GPS-disciplined one,
then use Precision Time Protocol (PTP) to distribute the clock over IP
multicast. For Linux, the traffic threads are set to realtime scheduling
to minimize the latency added by per-thread scheduling.
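The principle, reduced to a toy example (this is not iperf's wire format,
just a sender-stamped UDP payload and a receiver whose clock is assumed to
be synchronized by PTP/GPS as above):

import socket, struct, time

# Sender side: embed the wall-clock send time in the first 8 bytes of the payload.
def send_stamped(sock, addr, payload=b'\x00' * 1462):
    sock.sendto(struct.pack('!d', time.time()) + payload, addr)

# Receiver side: one-way latency = receive time - embedded send time.
# Only meaningful if both hosts' clocks are synchronized (e.g. via PTP/GPS).
def recv_latency(sock):
    data, _ = sock.recvfrom(2048)
    sent = struct.unpack('!d', data[:8])[0]
    return time.time() - sent   # seconds of one-way latency (plus residual clock offset)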
I'm also in the process of implementing a very simple isochronous option
where the iperf client (tx) accepts a frames-per-second command line value
(e.g. 60) as well as a log-normal distribution
<https://sourceforge.net/p/iperf2/code/ci/master/tree/src/pdfs.c> for the
input, to somewhat simulate variable bit rates. On the iperf receiver, I'm
considering implementing an underflow/overflow counter against the expected
frames per second.
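A sketch of that isochronous idea in plain Python, independent of iperf's
internals (the mu/sigma values and the target address are placeholders):

import random, socket, time

def isochronous_send(sock, addr, fps=60, duration=2.0, mu=9.0, sigma=0.5):
    """Send one 'frame' per 1/fps tick, with a log-normally distributed frame
    size (split into <=1470-byte datagrams) to roughly mimic a VBR source."""
    interval = 1.0 / fps
    next_tick = time.monotonic()
    end = next_tick + duration
    while next_tick < end:
        frame_bytes = int(random.lognormvariate(mu, sigma))   # ~8 KB median with mu=9.0
        while frame_bytes > 0:
            chunk = min(frame_bytes, 1470)
            sock.sendto(b'\x00' * chunk, addr)
            frame_bytes -= chunk
        next_tick += interval
        time.sleep(max(0.0, next_tick - time.monotonic()))

# Usage (assumed address):
# isochronous_send(socket.socket(socket.AF_INET, socket.SOCK_DGRAM),
#                  ('192.168.100.33', 5001))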
Latency does seem to be a significant metric. So is power consumption.
Comments welcome.
Bob
On Mon, Oct 9, 2017 at 1:41 PM, <dpreed@reed.com> wrote:
> It's worth setting a stretch latency goal that is in principle achievable.
>
>
>
> I get the sense that the wireless group obsesses over maximum channel
> utilization rather than excellent latency. This is where it's important to
> put latency as a primary goal, and utilization as the secondary goal,
> rather than vice versa.
>
>
>
> It's easy to get at this by observing that the minimum latency on the
> shared channel is achieved by round-robin scheduling of packets that are of
> sufficient size that per packet overhead doesn't dominate.
>
>
>
> So only aggregate when there are few contenders for the channel, or the
> packets are quite small compared to the per-packet overhead. When there are
> more contenders, still aggregate small packets, but only those that are
> actually waiting. But large packets shouldn't be aggregated.
>
>
>
> Multicast should be avoided by higher level protocols for the most part,
> and the latency of multicast should be a non-issue. In wireless, it's kind
> of a dumb idea anyway, given that stations have widely varying propagation
> characteristics. Do just enough to support DHCP and so forth.
>
>
>
> It's so much fun for the hardware designers to throw in stuff that only
> helps in marketing benchmarks (like getting a few percent on throughput in
> lab conditions that never happen in the field) that it is tempting for OS
> driver writers to use those features (like deep queues and offload
> processing bells and whistles). But the real issue to be solved is that
> turn-taking "bloat" that comes from too much attempt to aggregate, to
> handle the "sole transmitter to dedicated receiver case" etc.
>
>
>
> I use 10 GigE in my house. I don't use it because I want to do 10 Gig File
> Transfers all day and measure them. I use it because (properly managed) it
> gives me *low latency*. That low latency is what matters, not throughput.
> My average load, if spread out across 24 hours, could be handled by 802.11b
> for the entire house.
>
>
>
> We are soon going to have 802.11ax in the home. That's approximately 10
> Gb/sec, but wireless. No TV streaming can fill it. It's not for continuous
> isochronous traffic at all.
>
>
>
> What it is for is *low latency*. So if the adapters and the drivers won't
> give me that low latency, what good is 10 Gb/sec at all. This is true for
> 802.11ac, as well.
>
>
>
> We aren't building Dragsters fueled with nitro, to run down 1/4 mile of
> track but unable to steer.
>
>
>
> Instead, we want to be able to connect musical instruments in an
> electronic symphony, where timing is everything.
>
>
>
>
>
> On Monday, October 9, 2017 4:13pm, "Dave Taht" <dave.taht@gmail.com> said:
>
> > There were five ideas I'd wanted to pursue at some point. I'm not
> > presently on linux-wireless, nor do I have time to pay attention right
> > now - but I'm enjoying that thread passively.
> >
> > To get those ideas "out there" again:
> >
> > * adding a fixed length fq'd queue for multicast.
> >
> > * Reducing retransmits at low rates
> >
> > See the recent paper:
> >
> > "Resolving Bufferbloat in TCP Communication over IEEE 802.11 n WLAN by
> > Reducing MAC Retransmission Limit at Low Data Rate" (I'd paste a link
> > but for some reason that doesn't work well)
> >
> > Even with their simple bi-modal model it worked pretty well.
> >
> > It also reduces contention with "bad" stations more automagically.
> >
> > * Less buffering at the driver.
> >
> > Presently (ath9k) there are two-three aggregates stacked up at the
> driver.
> >
> > With a good estimate for how long it will take to service one, forming
> > another within that deadline seems feasible, so you only need to have
> > one in the hardware itself.
> >
> > Simple example: you have data in the hardware projected to take a
> > minimum of 4ms to transmit. Don't form a new aggregate and submit it
> > to the hardware for 3.5ms.
> >
> > I know full well that a "good" estimate is hard, and things like
> > mu-mimo complicate things. Still, I'd like to get below 20ms of
> > latency within the driver, and this is one way to get there.
> >
> > * Reducing the size of a txop under contention
> >
> > if you have 5 stations getting blasted away at 5ms each, and one that
> > only wants 1ms worth of traffic, "soon", temporarily reducing the size
> > of the txop for everybody so you can service more stations faster,
> > seems useful.
> >
> > * Merging acs when sane to do so
> >
> > sane aggregation in general works better than prioritizing does, as
> > shown in ending the anomaly.
> >
> > --
> >
> > Dave Täht
> > CEO, TekLibre, LLC
> > http://www.teklibre.com
> > Tel: 1-669-226-2619
> > _______________________________________________
> > Make-wifi-fast mailing list
> > Make-wifi-fast@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/make-wifi-fast
>
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-09 21:04 ` Bob McMahon
@ 2017-10-09 21:44 ` Simon Barber
2017-10-09 22:02 ` Bob McMahon
2017-10-16 21:26 ` Simon Barber
2017-10-11 21:30 ` Jesper Dangaard Brouer
1 sibling, 2 replies; 17+ messages in thread
From: Simon Barber @ 2017-10-09 21:44 UTC (permalink / raw)
To: Bob McMahon; +Cc: David Reed, make-wifi-fast, Johannes Berg
Very nice - I’m using iperf 3.2 and always have to figure packets per second by combining packet size and bandwidth. This will be much easier. Also, direct reporting of one-way latency variance above the minimum round trip would be very useful.
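For reference, the arithmetic that currently has to be done by hand
(assuming the UDP payload size, not on-wire bytes):

def pps(bandwidth_bps, payload_bytes=1470):
    # packets per second = offered bits/sec divided by bits per datagram payload
    return bandwidth_bps / (payload_bytes * 8)

print(round(pps(23.5e6)))   # ~1998 pps for 23.5 Mbit/s of 1470-byte datagrams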
Simon
> On Oct 9, 2017, at 2:04 PM, Bob McMahon <bob.mcmahon@broadcom.com> wrote:
>
> Hi,
>
> Not sure if this is helpful but we've added end/end latency measurements for UDP traffic in iperf 2.0.10 <https://sourceforge.net/projects/iperf2/>. It does require the clocks to be synched. I use a spectracom tsync pcie card with either an oven controlled oscillator or a GPS disciplined one, then use precision time protocol to distribute the clock over ip multicast. For Linux, the traffic threads are set to realtime scheduling to minimize latency adds per thread scheduling..
>
> I'm also in the process of implementing a very simple isochronous option where the iperf client (tx) accepts a frames per second command line value (e.g. 60) as well as a log normal distribution <https://sourceforge.net/p/iperf2/code/ci/master/tree/src/pdfs.c> for the input to somewhat simulate variable bit rates. On the iperf receiver considering implementing an underflow/overflow counter per the expected frames per second.
>
> Latency does seem to be a significant metric. Also is power consumption.
>
> Comments welcome.
>
> Bob
>
> On Mon, Oct 9, 2017 at 1:41 PM, <dpreed@reed.com <mailto:dpreed@reed.com>> wrote:
> It's worth setting a stretch latency goal that is in principle achievable.
>
> I get the sense that the wireless group obsesses over maximum channel utilization rather than excellent latency. This is where it's important to put latency as a primary goal, and utilization as the secondary goal, rather than vice versa.
>
> It's easy to get at this by observing that the minimum latency on the shared channel is achieved by round-robin scheduling of packets that are of sufficient size that per packet overhead doesn't dominate.
>
> So only aggregate when there are few contenders for the channel, or the packets are quite small compared to the per-packet overhead. When there are more contenders, still aggregate small packets, but only those that are actually waiting. But large packets shouldn't be aggregated.
>
> Multicast should be avoided by higher level protocols for the most part, and the latency of multicast should be a non-issue. In wireless, it's kind of a dumb idea anyway, given that stations have widely varying propagation characteristics. Do just enough to support DHCP and so forth.
>
> It's so much fun for the hardware designers to throw in stuff that only helps in marketing benchmarks (like getting a few percent on throughput in lab conditions that never happen in the field) that it is tempting for OS driver writers to use those features (like deep queues and offload processing bells and whistles). But the real issue to be solved is that turn-taking "bloat" that comes from too much attempt to aggregate, to handle the "sole transmitter to dedicated receiver case" etc.
>
> I use 10 GigE in my house. I don't use it because I want to do 10 Gig File Transfers all day and measure them. I use it because (properly managed) it gives me *low latency*. That low latency is what matters, not throughput. My average load, if spread out across 24 hours, could be handled by 802.11b for the entire house.
>
> We are soon going to have 802.11ax in the home. That's approximately 10 Gb/sec, but wireless. No TV streaming can fill it. It's not for continuous isochronous traffic at all.
>
> What it is for is *low latency*. So if the adapters and the drivers won't give me that low latency, what good is 10 Gb/sec at all. This is true for 802.11ac, as well.
>
> We aren't building Dragsters fueled with nitro, to run down 1/4 mile of track but unable to steer.
>
> Instead, we want to be able to connect musical instruments in an electronic symphony, where timing is everything.
>
>
>
> On Monday, October 9, 2017 4:13pm, "Dave Taht" <dave.taht@gmail.com <mailto:dave.taht@gmail.com>> said:
>
> > There were five ideas I'd wanted to pursue at some point. I'm not
> > presently on linux-wireless, nor do I have time to pay attention right
> > now - but I'm enjoying that thread passively.
> >
> > To get those ideas "out there" again:
> >
> > * adding a fixed length fq'd queue for multicast.
> >
> > * Reducing retransmits at low rates
> >
> > See the recent paper:
> >
> > "Resolving Bufferbloat in TCP Communication over IEEE 802.11 n WLAN by
> > Reducing MAC Retransmission Limit at Low Data Rate" (I'd paste a link
> > but for some reason that doesn't work well)
> >
> > Even with their simple bi-modal model it worked pretty well.
> >
> > It also reduces contention with "bad" stations more automagically.
> >
> > * Less buffering at the driver.
> >
> > Presently (ath9k) there are two-three aggregates stacked up at the driver.
> >
> > With a good estimate for how long it will take to service one, forming
> > another within that deadline seems feasible, so you only need to have
> > one in the hardware itself.
> >
> > Simple example: you have data in the hardware projected to take a
> > minimum of 4ms to transmit. Don't form a new aggregate and submit it
> > to the hardware for 3.5ms.
> >
> > I know full well that a "good" estimate is hard, and things like
> > mu-mimo complicate things. Still, I'd like to get below 20ms of
> > latency within the driver, and this is one way to get there.
> >
> > * Reducing the size of a txop under contention
> >
> > if you have 5 stations getting blasted away at 5ms each, and one that
> > only wants 1ms worth of traffic, "soon", temporarily reducing the size
> > of the txop for everybody so you can service more stations faster,
> > seems useful.
> >
> > * Merging acs when sane to do so
> >
> > sane aggregation in general works better than prioritizing does, as
> > shown in ending the anomaly.
> >
> > --
> >
> > Dave Täht
> > CEO, TekLibre, LLC
> > http://www.teklibre.com <http://www.teklibre.com/>
> > Tel: 1-669-226-2619
> > _______________________________________________
> > Make-wifi-fast mailing list
> > Make-wifi-fast@lists.bufferbloat.net <mailto:Make-wifi-fast@lists.bufferbloat.net>
> > https://lists.bufferbloat.net/listinfo/make-wifi-fast <https://lists.bufferbloat.net/listinfo/make-wifi-fast>
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net <mailto:Make-wifi-fast@lists.bufferbloat.net>
> https://lists.bufferbloat.net/listinfo/make-wifi-fast <https://lists.bufferbloat.net/listinfo/make-wifi-fast>
>
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-09 21:44 ` Simon Barber
@ 2017-10-09 22:02 ` Bob McMahon
2017-10-11 20:03 ` Bob McMahon
2017-10-16 21:26 ` Simon Barber
1 sibling, 1 reply; 17+ messages in thread
From: Bob McMahon @ 2017-10-09 22:02 UTC (permalink / raw)
To: Simon Barber; +Cc: David Reed, make-wifi-fast, Johannes Berg
Not sure how to determine when one-way latency is above the round trip. Iperf
traffic for latency uses UDP, where nothing is coming back. For TCP, the
iperf client will report a sampled RTT per the network stack (on operating
systems that support this).
One idea - have two traffic streams, one TCP and one UDP, and use a higher
level script (e.g. via python
<https://sourceforge.net/p/iperf2/code/ci/master/tree/flows/flows.py>) to
poll data from each and perform the comparison. Though I'm not sure if this
would give you what you're looking for.
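A rough sketch of that two-stream idea (the iperf flags mirror the ones
used elsewhere in this thread; the "parsing" is deliberately naive and just
dumps both interval reports side by side for post-processing):

import subprocess, threading

HOST = '192.168.100.33'   # placeholder target running "iperf -s" and "iperf -s -u -e"

def run(tag, cmd):
    # Stream each interval report line, tagged, so TCP RTT samples and UDP
    # one-way latency samples can be compared afterwards.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as p:
        for line in p.stdout:
            print(tag, line.rstrip())

tcp = threading.Thread(target=run, args=('[tcp]', ['iperf', '-c', HOST, '-e', '-i', '1', '-t', '10']))
udp = threading.Thread(target=run, args=('[udp]', ['iperf', '-c', HOST, '-u', '-e', '-i', '1', '-t', '10', '-b', '2000pps']))
tcp.start(); udp.start(); tcp.join(); udp.join()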
Bob
On Mon, Oct 9, 2017 at 2:44 PM, Simon Barber <simon@superduper.net> wrote:
> Very nice - I’m using iperf3.2 and always have to figure packets per
> second by combining packet size and bandwidth. This will be much easier.
> Also direct reporting of one way latency variance above minimum round trip
> would be very useful.
>
> Simon
>
> On Oct 9, 2017, at 2:04 PM, Bob McMahon <bob.mcmahon@broadcom.com> wrote:
>
> Hi,
>
> Not sure if this is helpful but we've added end/end latency measurements
> for UDP traffic in iperf 2.0.10 <https://sourceforge.net/projects/iperf2/>.
> It does require the clocks to be synched. I use a spectracom tsync pcie
> card with either an oven controlled oscillator or a GPS disciplined one,
> then use precision time protocol to distribute the clock over ip
> multicast. For Linux, the traffic threads are set to realtime scheduling
> to minimize latency adds per thread scheduling..
>
> I'm also in the process of implementing a very simple isochronous option
> where the iperf client (tx) accepts a frames per second command line value
> (e.g. 60) as well as a log normal distribution
> <https://sourceforge.net/p/iperf2/code/ci/master/tree/src/pdfs.c> for the
> input to somewhat simulate variable bit rates. On the iperf receiver
> considering implementing an underflow/overflow counter per the expected
> frames per second.
>
> Latency does seem to be a significant metric. Also is power consumption.
>
> Comments welcome.
>
> Bob
>
> On Mon, Oct 9, 2017 at 1:41 PM, <dpreed@reed.com> wrote:
>
>> It's worth setting a stretch latency goal that is in principle achievable.
>>
>>
>> I get the sense that the wireless group obsesses over maximum channel
>> utilization rather than excellent latency. This is where it's important to
>> put latency as a primary goal, and utilization as the secondary goal,
>> rather than vice versa.
>>
>>
>> It's easy to get at this by observing that the minimum latency on the
>> shared channel is achieved by round-robin scheduling of packets that are of
>> sufficient size that per packet overhead doesn't dominate.
>>
>>
>> So only aggregate when there are few contenders for the channel, or the
>> packets are quite small compared to the per-packet overhead. When there are
>> more contenders, still aggregate small packets, but only those that are
>> actually waiting. But large packets shouldn't be aggregated.
>>
>>
>> Multicast should be avoided by higher level protocols for the most part,
>> and the latency of multicast should be a non-issue. In wireless, it's kind
>> of a dumb idea anyway, given that stations have widely varying propagation
>> characteristics. Do just enough to support DHCP and so forth.
>>
>>
>> It's so much fun for the hardware designers to throw in stuff that only
>> helps in marketing benchmarks (like getting a few percent on throughput in
>> lab conditions that never happen in the field) that it is tempting for OS
>> driver writers to use those features (like deep queues and offload
>> processing bells and whistles). But the real issue to be solved is that
>> turn-taking "bloat" that comes from too much attempt to aggregate, to
>> handle the "sole transmitter to dedicated receiver case" etc.
>>
>>
>> I use 10 GigE in my house. I don't use it because I want to do 10 Gig
>> File Transfers all day and measure them. I use it because (properly
>> managed) it gives me *low latency*. That low latency is what matters, not
>> throughput. My average load, if spread out across 24 hours, could be
>> handled by 802.11b for the entire house.
>>
>>
>> We are soon going to have 802.11ax in the home. That's approximately 10
>> Gb/sec, but wireless. No TV streaming can fill it. It's not for continuous
>> isochronous traffic at all.
>>
>>
>> What it is for is *low latency*. So if the adapters and the drivers won't
>> give me that low latency, what good is 10 Gb/sec at all. This is true for
>> 802.11ac, as well.
>>
>>
>> We aren't building Dragsters fueled with nitro, to run down 1/4 mile of
>> track but unable to steer.
>>
>>
>> Instead, we want to be able to connect musical instruments in an
>> electronic symphony, where timing is everything.
>>
>>
>>
>>
>> On Monday, October 9, 2017 4:13pm, "Dave Taht" <dave.taht@gmail.com>
>> said:
>>
>> > There were five ideas I'd wanted to pursue at some point. I'm not
>> > presently on linux-wireless, nor do I have time to pay attention right
>> > now - but I'm enjoying that thread passively.
>> >
>> > To get those ideas "out there" again:
>> >
>> > * adding a fixed length fq'd queue for multicast.
>> >
>> > * Reducing retransmits at low rates
>> >
>> > See the recent paper:
>> >
>> > "Resolving Bufferbloat in TCP Communication over IEEE 802.11 n WLAN by
>> > Reducing MAC Retransmission Limit at Low Data Rate" (I'd paste a link
>> > but for some reason that doesn't work well)
>> >
>> > Even with their simple bi-modal model it worked pretty well.
>> >
>> > It also reduces contention with "bad" stations more automagically.
>> >
>> > * Less buffering at the driver.
>> >
>> > Presently (ath9k) there are two-three aggregates stacked up at the
>> driver.
>> >
>> > With a good estimate for how long it will take to service one, forming
>> > another within that deadline seems feasible, so you only need to have
>> > one in the hardware itself.
>> >
>> > Simple example: you have data in the hardware projected to take a
>> > minimum of 4ms to transmit. Don't form a new aggregate and submit it
>> > to the hardware for 3.5ms.
>> >
>> > I know full well that a "good" estimate is hard, and things like
>> > mu-mimo complicate things. Still, I'd like to get below 20ms of
>> > latency within the driver, and this is one way to get there.
>> >
>> > * Reducing the size of a txop under contention
>> >
>> > if you have 5 stations getting blasted away at 5ms each, and one that
>> > only wants 1ms worth of traffic, "soon", temporarily reducing the size
>> > of the txop for everybody so you can service more stations faster,
>> > seems useful.
>> >
>> > * Merging acs when sane to do so
>> >
>> > sane aggregation in general works better than prioritizing does, as
>> > shown in ending the anomaly.
>> >
>> > --
>> >
>> > Dave Täht
>> > CEO, TekLibre, LLC
>> > http://www.teklibre.com
>> > Tel: 1-669-226-2619
>> > _______________________________________________
>> > Make-wifi-fast mailing list
>> > Make-wifi-fast@lists.bufferbloat.net
>> > https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>
>> _______________________________________________
>> Make-wifi-fast mailing list
>> Make-wifi-fast@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>
>
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>
>
>
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-09 22:02 ` Bob McMahon
@ 2017-10-11 20:03 ` Bob McMahon
0 siblings, 0 replies; 17+ messages in thread
From: Bob McMahon @ 2017-10-11 20:03 UTC (permalink / raw)
To: Simon Barber; +Cc: David Reed, make-wifi-fast, Johannes Berg
FYI, we're considering adding support for "--udp-triggers" in iperf
2.0.10+. Setting this option will cause a "magic number" to be placed in
the UDP payload such that logic moving bytes through the system can be
triggered to append its own timestamps into the UDP payload, i.e. as the
payload moves through each subsystem. This can help one analyze the
latency path of a single packet, as an example. Note: the standard iperf
microsecond timestamps come from the application level (on tx) and from
SO_TIMESTAMPs on receive (assuming SO_TIMESTAMPs is supported, otherwise
it's a syscall() after the socket receive). Being able to instrument each
logic path's contribution to a single packet's latency can be helpful, at
least for driver/firmware/ucode engineers.
On the server side, we'll probably add a --histogram option so the latency
distributions can be displayed (per each -i interval) and higher-level
scripts can produce PDFs, CDFs and CCDFs for latencies.
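For the post-processing side, the CDF/CCDF part is straightforward once the
per-packet latencies are available (a generic sketch, nothing iperf-specific;
the sample values are just illustrative milliseconds):

def cdf_ccdf(latencies_ms):
    """Return (value, CDF, CCDF) triples from a list of latency samples."""
    xs = sorted(latencies_ms)
    n = len(xs)
    return [(x, (i + 1) / n, 1 - (i + 1) / n) for i, x in enumerate(xs)]

samples = [0.142, 0.157, 0.155, 0.146, 0.151, 0.148, 0.152, 0.144, 0.140, 0.154]
for value, cdf, ccdf in cdf_ccdf(samples):
    print(f"{value:.3f} ms  CDF={cdf:.2f}  CCDF={ccdf:.2f}")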
Let me know if generalizing this support in iperf is useful.
Bob
On Mon, Oct 9, 2017 at 3:02 PM, Bob McMahon <bob.mcmahon@broadcom.com>
wrote:
> Not sure how to determine when one way latency is above round trip.
> Iperf traffic for latency uses UDP where nothing is coming back. For TCP,
> the iperf client will report a sampled RTT per the network stack (on
> operating systems that support this.)
>
> One idea - have two traffic streams, one TCP and one UDP, and use higher
> level script (e.g. via python
> <https://sourceforge.net/p/iperf2/code/ci/master/tree/flows/flows.py>) to
> poll data from each and perform the compare? Though, not sure if this
> would give you what you're looking for.
>
> Bob
>
> On Mon, Oct 9, 2017 at 2:44 PM, Simon Barber <simon@superduper.net> wrote:
>
>> Very nice - I’m using iperf3.2 and always have to figure packets per
>> second by combining packet size and bandwidth. This will be much easier.
>> Also direct reporting of one way latency variance above minimum round trip
>> would be very useful.
>>
>> Simon
>>
>> On Oct 9, 2017, at 2:04 PM, Bob McMahon <bob.mcmahon@broadcom.com> wrote:
>>
>> Hi,
>>
>> Not sure if this is helpful but we've added end/end latency measurements
>> for UDP traffic in iperf 2.0.10
>> <https://sourceforge.net/projects/iperf2/>. It does require the clocks
>> to be synched. I use a spectracom tsync pcie card with either an oven
>> controlled oscillator or a GPS disciplined one, then use precision time
>> protocol to distribute the clock over ip multicast. For Linux, the traffic
>> threads are set to realtime scheduling to minimize latency adds per thread
>> scheduling..
>>
>> I'm also in the process of implementing a very simple isochronous option
>> where the iperf client (tx) accepts a frames per second command line value
>> (e.g. 60) as well as a log normal distribution
>> <https://sourceforge.net/p/iperf2/code/ci/master/tree/src/pdfs.c> for
>> the input to somewhat simulate variable bit rates. On the iperf receiver
>> considering implementing an underflow/overflow counter per the expected
>> frames per second.
>>
>> Latency does seem to be a significant metric. Also is power consumption.
>>
>> Comments welcome.
>>
>> Bob
>>
>> On Mon, Oct 9, 2017 at 1:41 PM, <dpreed@reed.com> wrote:
>>
>>> It's worth setting a stretch latency goal that is in principle
>>> achievable.
>>>
>>>
>>> I get the sense that the wireless group obsesses over maximum channel
>>> utilization rather than excellent latency. This is where it's important to
>>> put latency as a primary goal, and utilization as the secondary goal,
>>> rather than vice versa.
>>>
>>>
>>> It's easy to get at this by observing that the minimum latency on the
>>> shared channel is achieved by round-robin scheduling of packets that are of
>>> sufficient size that per packet overhead doesn't dominate.
>>>
>>>
>>> So only aggregate when there are few contenders for the channel, or the
>>> packets are quite small compared to the per-packet overhead. When there are
>>> more contenders, still aggregate small packets, but only those that are
>>> actually waiting. But large packets shouldn't be aggregated.
>>>
>>>
>>> Multicast should be avoided by higher level protocols for the most part,
>>> and the latency of multicast should be a non-issue. In wireless, it's kind
>>> of a dumb idea anyway, given that stations have widely varying propagation
>>> characteristics. Do just enough to support DHCP and so forth.
>>>
>>>
>>> It's so much fun for the hardware designers to throw in stuff that only
>>> helps in marketing benchmarks (like getting a few percent on throughput in
>>> lab conditions that never happen in the field) that it is tempting for OS
>>> driver writers to use those features (like deep queues and offload
>>> processing bells and whistles). But the real issue to be solved is that
>>> turn-taking "bloat" that comes from too much attempt to aggregate, to
>>> handle the "sole transmitter to dedicated receiver case" etc.
>>>
>>>
>>> I use 10 GigE in my house. I don't use it because I want to do 10 Gig
>>> File Transfers all day and measure them. I use it because (properly
>>> managed) it gives me *low latency*. That low latency is what matters, not
>>> throughput. My average load, if spread out across 24 hours, could be
>>> handled by 802.11b for the entire house.
>>>
>>>
>>> We are soon going to have 802.11ax in the home. That's approximately 10
>>> Gb/sec, but wireless. No TV streaming can fill it. It's not for continuous
>>> isochronous traffic at all.
>>>
>>>
>>> What it is for is *low latency*. So if the adapters and the drivers
>>> won't give me that low latency, what good is 10 Gb/sec at all. This is true
>>> for 802.11ac, as well.
>>>
>>>
>>> We aren't building Dragsters fueled with nitro, to run down 1/4 mile of
>>> track but unable to steer.
>>>
>>>
>>> Instead, we want to be able to connect musical instruments in an
>>> electronic symphony, where timing is everything.
>>>
>>>
>>>
>>>
>>> On Monday, October 9, 2017 4:13pm, "Dave Taht" <dave.taht@gmail.com>
>>> said:
>>>
>>> > There were five ideas I'd wanted to pursue at some point. I'm not
>>> > presently on linux-wireless, nor do I have time to pay attention right
>>> > now - but I'm enjoying that thread passively.
>>> >
>>> > To get those ideas "out there" again:
>>> >
>>> > * adding a fixed length fq'd queue for multicast.
>>> >
>>> > * Reducing retransmits at low rates
>>> >
>>> > See the recent paper:
>>> >
>>> > "Resolving Bufferbloat in TCP Communication over IEEE 802.11 n WLAN by
>>> > Reducing MAC Retransmission Limit at Low Data Rate" (I'd paste a link
>>> > but for some reason that doesn't work well)
>>> >
>>> > Even with their simple bi-modal model it worked pretty well.
>>> >
>>> > It also reduces contention with "bad" stations more automagically.
>>> >
>>> > * Less buffering at the driver.
>>> >
>>> > Presently (ath9k) there are two-three aggregates stacked up at the
>>> driver.
>>> >
>>> > With a good estimate for how long it will take to service one, forming
>>> > another within that deadline seems feasible, so you only need to have
>>> > one in the hardware itself.
>>> >
>>> > Simple example: you have data in the hardware projected to take a
>>> > minimum of 4ms to transmit. Don't form a new aggregate and submit it
>>> > to the hardware for 3.5ms.
>>> >
>>> > I know full well that a "good" estimate is hard, and things like
>>> > mu-mimo complicate things. Still, I'd like to get below 20ms of
>>> > latency within the driver, and this is one way to get there.
>>> >
>>> > * Reducing the size of a txop under contention
>>> >
>>> > if you have 5 stations getting blasted away at 5ms each, and one that
>>> > only wants 1ms worth of traffic, "soon", temporarily reducing the size
>>> > of the txop for everybody so you can service more stations faster,
>>> > seems useful.
>>> >
>>> > * Merging acs when sane to do so
>>> >
>>> > sane aggregation in general works better than prioritizing does, as
>>> > shown in ending the anomaly.
>>> >
>>> > --
>>> >
>>> > Dave Täht
>>> > CEO, TekLibre, LLC
>>> > http://www.teklibre.com
>>> > Tel: 1-669-226-2619
>>> > _______________________________________________
>>> > Make-wifi-fast mailing list
>>> > Make-wifi-fast@lists.bufferbloat.net
>>> > https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>>
>>> _______________________________________________
>>> Make-wifi-fast mailing list
>>> Make-wifi-fast@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>>
>>
>> _______________________________________________
>> Make-wifi-fast mailing list
>> Make-wifi-fast@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>
>>
>>
>
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-09 21:04 ` Bob McMahon
2017-10-09 21:44 ` Simon Barber
@ 2017-10-11 21:30 ` Jesper Dangaard Brouer
2017-10-12 8:32 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2017-10-11 21:30 UTC (permalink / raw)
To: Bob McMahon; +Cc: David Reed, make-wifi-fast, Johannes Berg, brouer
Hi Bob,
Just wanted to let you know that after you posted this, I've started
to use your iperf2 tool again, to verify some of my kernel code.
I particularly liked that I can send with a specific PPS rate, like:
iperf2-git -c 172.16.0.2 -e -u -b 20000pps -i 1
On Mon, 9 Oct 2017 14:04:35 -0700 Bob McMahon <bob.mcmahon@broadcom.com> wrote:
> Not sure if this is helpful but we've added end/end latency measurements
> for UDP traffic in iperf 2.0.10 <https://sourceforge.net/projects/iperf2/>.
I tried it out, but the clocks between my two machines are not
synchronized accurately enough (as you hint below).
> It does require the clocks to be synched. I use a spectracom tsync pcie
> card with either an oven controlled oscillator or a GPS disciplined one,
> then use precision time protocol to distribute the clock over ip
> multicast.
> For Linux, the traffic threads are set to realtime scheduling
> to minimize latency adds per thread scheduling..
If I use the --realtime option, I did run into issues where the server
thread would busy-poll at 100% after e.g. a client stopped a test prematurely,
and I had to kill it with -9.
Thanks for your work!
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-11 21:30 ` Jesper Dangaard Brouer
@ 2017-10-12 8:32 ` Toke Høiland-Jørgensen
2017-10-12 18:51 ` Bob McMahon
0 siblings, 1 reply; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2017-10-12 8:32 UTC (permalink / raw)
To: Jesper Dangaard Brouer, Bob McMahon; +Cc: make-wifi-fast, Johannes Berg
Jesper Dangaard Brouer <brouer@redhat.com> writes:
> If I use the --realtime option, I did run into issues where the server
> thread would busypoll 100% after e.g. a client stopped a test
> prematurely. And I had to kill it with -9.
Sort of related (but not specific to the new features), I've had to
add a restart of the iperf server to my test scripts. If I run more than
one (UDP) test against the same server instance, it would report back
the average throughput over all tests, including any idle period
in-between. Which is obviously not terribly useful... :)
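For anyone hitting the same thing, the workaround is just a fresh server per
test run, e.g. (a minimal sketch, assuming a local iperf binary and the
-s/-u/-e flags used elsewhere in this thread):

import subprocess, time

def fresh_server():
    # Start a new UDP server instance; terminate the old one before each test
    # run so totals never accumulate across tests.
    return subprocess.Popen(['iperf', '-s', '-u', '-e', '-i', '1'])

server = fresh_server()
try:
    time.sleep(1)                      # give it a moment to bind
    subprocess.run(['iperf', '-c', '127.0.0.1', '-u', '-e', '-t', '2', '-b', '2000pps'])
finally:
    server.terminate()
    server.wait()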
-Toke
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-12 8:32 ` Toke Høiland-Jørgensen
@ 2017-10-12 18:51 ` Bob McMahon
2017-10-13 9:28 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 17+ messages in thread
From: Bob McMahon @ 2017-10-12 18:51 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Jesper Dangaard Brouer, make-wifi-fast, Johannes Berg
Thanks Toke. Let me look into this. Is there packet loss during your
tests? Can you share the output of the client and server per the error
scenario?
With iperf 2 there is no TCP test exchange; rather, UDP test information is
derived from packets in flight. The server determines a UDP test is
finished by detecting a negative sequence number in the payload. In
theory, this should separate UDP tests. The server detects a new UDP
stream by receiving a packet from a new source socket. If the packet
carrying the negative sequence number is lost, then summing across "tests"
would be expected (even though not desired) per the current design and
implementation. We intentionally left this as is, as we didn't want to
change the startup behavior nor require that the network support TCP
connections in order to run a UDP test.
Since we know UDP is unreliable, we do control both client and server over
ssh pipes, and perform summing in flight per the interval reporting.
Operating system signals are used to kill the server. The iperf sum and
final reports are ignored. Unfortunately, I can't publish this package
with iperf 2 for both technical and licensing reasons. There is some skeleton
code in Python 3.5 with asyncio
<https://sourceforge.net/p/iperf2/code/ci/master/tree/flows/flows.py> that
may be of use. A next step here is to add support for pandas
<http://pandas.pydata.org/index.html>, and possibly some control chart
<https://en.wikipedia.org/wiki/Control_chart> techniques (both single and
multivariate
<http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc34.htm>) for both
regressions and outlier detection.
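As a sketch of the control-chart idea (a plain Shewhart-style check in
Python; the 3-sigma limit and the baseline window are the usual textbook
choices, not anything decided for iperf):

import statistics

def outliers(samples, baseline=30, k=3.0):
    """Flag latency samples falling outside mean +/- k*stdev of the first
    'baseline' samples (a simple Shewhart-style control chart)."""
    base = samples[:baseline]
    mu, sigma = statistics.mean(base), statistics.stdev(base)
    lo_lim, hi_lim = mu - k * sigma, mu + k * sigma
    return [(i, x) for i, x in enumerate(samples) if not lo_lim <= x <= hi_lim]

# Example: a latency spike well outside the baseline band gets flagged.
data = [0.15, 0.16, 0.15, 0.14, 0.15, 0.16, 0.15, 0.14, 0.15, 0.16] * 3 + [0.9]
print(outliers(data, baseline=30))   # -> [(30, 0.9)]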
On Thu, Oct 12, 2017 at 1:32 AM, Toke Høiland-Jørgensen <toke@toke.dk>
wrote:
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>
> > If I use the --realtime option, I did run into issues where the server
> > thread would busypoll 100% after e.g. a client stopped a test
> > prematurely. And I had to kill it with -9.
>
> Sort of related (but not specific to the new features), I've had to
> add a restart of the iperf server to my test scripts. If I run more than
> one (UDP) test against the same server instance, it would report back
> the average throughput over all tests, including any idle period
> in-between. Which is obviously not terribly useful... :)
>
> -Toke
>
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-12 18:51 ` Bob McMahon
@ 2017-10-13 9:28 ` Toke Høiland-Jørgensen
2017-10-13 18:47 ` Bob McMahon
0 siblings, 1 reply; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2017-10-13 9:28 UTC (permalink / raw)
To: Bob McMahon; +Cc: Jesper Dangaard Brouer, make-wifi-fast, Johannes Berg
Bob McMahon <bob.mcmahon@broadcom.com> writes:
> Thanks Toke. Let me look into this. Is there packet loss during your
> tests? Can you share the output of the client and server per the error
> scenario?
Yeah, there's definitely packet loss.
> With iperf 2 there is no TCP test exchange rather UDP test information
> is derived from packets in flight. The server determines a UDP test is
> finished by detecting a negative sequence number in the payload. In
> theory, this should separate UDP tests. The server detects a new UDP
> stream is by receiving a packet from a new source socket. If the
> packet carrying the negative sequence number is lost then summing
> across "tests" would be expected (even though not desired) per the
> current design and implementation. We intentionally left this as is as
> we didn't want to change the startup behavior nor require the network
> support TCP connections in order to run a UDP test.
Ah, so basically, if the last packet from the client is dropped, the
server is not going to notice that the test ended and just keep
counting? That would definitely explain the behaviour I'm seeing.
So if another test starts from a different source port, the server is
still going to count the same totals? That seems kinda odd :)
> Since we know UDP is unreliable, we do control both client and server over
> ssh pipes, and perform summing in flight per the interval reporting.
> Operating system signals are used to kill the server. The iperf sum and
> final reports are ignored. Unfortunately, I can't publish this package
> with iperf 2 for both technical and licensing reasons. There is some skeleton
> code in Python 3.5 with asyncio
> <https://sourceforge.net/p/iperf2/code/ci/master/tree/flows/flows.py> that
> may be of use. A next step here is to add support for pandas
> <http://pandas.pydata.org/index.html>, and possibly some control chart
> <https://en.wikipedia.org/wiki/Control_chart> techniques (both single and
> multivariate
> <http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc34.htm>) for both
> regressions and outlier detection.
No worries, I already have the setup scripts to handle restarting the
server, and I parse the output with Flent. Just wanted to point out this
behaviour as it was giving me some very odd results before I started
systematically restarting the server...
-Toke
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-13 9:28 ` Toke Høiland-Jørgensen
@ 2017-10-13 18:47 ` Bob McMahon
2017-10-13 19:41 ` Bob McMahon
0 siblings, 1 reply; 17+ messages in thread
From: Bob McMahon @ 2017-10-13 18:47 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Jesper Dangaard Brouer, make-wifi-fast, Johannes Berg
Hi Toke,
The other thing that will cause the server thread(s) and listener thread to
stop is -t when applied to the *server*, i.e. iperf -s -u -t 10 will cause
a 10 second timeout on the server/listener thread(s)' lifetime. Some people
don't want the Listener to stop, so when -D (daemon) is applied, the -t will
only terminate server traffic threads. Many people asked for this
because they wanted a way to time-bound these threads, specifically over
the life of many tests.
Yeah, summing is a bit of a mess. I've some prototype code I've been playing
with, but I'm still not sure what is going to be released.
For UDP, the source port must be unique per the quintuple (ip proto/src ip/
src port/ dst ip/ dst port). Since the UDP server is merely waiting for
packets, it doesn't have any knowledge about how to group them. So it groups
based upon time, i.e. when new traffic shows up it's put into an existing
active group for summing.
I'm not sure of a good way to fix this. I think the client would have to
modify the payload, and per a -P tell the server the UDP src ports that
belong in the same group. Then the server could assign groups based upon a
key in the payload.
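A sketch of what that could look like on the receive side (illustrative
only; the 4-byte group id at the front of the payload is a made-up layout,
not anything iperf actually sends today):

import struct
from collections import defaultdict

# group id -> {'flows': set of (src_ip, src_port), 'bytes': running total}
groups = defaultdict(lambda: {'flows': set(), 'bytes': 0})

def account(payload, src_addr):
    """Sum traffic per group key carried in the payload, regardless of src port."""
    group_id = struct.unpack('!I', payload[:4])[0]   # hypothetical: first 4 bytes = group key
    g = groups[group_id]
    g['flows'].add(src_addr)
    g['bytes'] += len(payload)

# Two flows with different source ports but the same key land in one sum:
account(struct.pack('!I', 7) + b'\x00' * 1466, ('192.168.100.67', 50062))
account(struct.pack('!I', 7) + b'\x00' * 1466, ('192.168.100.67', 57325))
print(groups[7]['bytes'], len(groups[7]['flows']))   # 2940 2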
Thoughts and comments welcome,
Bob
On Fri, Oct 13, 2017 at 2:28 AM, Toke Høiland-Jørgensen <toke@toke.dk>
wrote:
> Bob McMahon <bob.mcmahon@broadcom.com> writes:
>
> > Thanks Toke. Let me look into this. Is there packet loss during your
> > tests? Can you share the output of the client and server per the error
> > scenario?
>
> Yeah, there's definitely packet loss.
>
> > With iperf 2 there is no TCP test exchange rather UDP test information
> > is derived from packets in flight. The server determines a UDP test is
> > finished by detecting a negative sequence number in the payload. In
> > theory, this should separate UDP tests. The server detects a new UDP
> > stream is by receiving a packet from a new source socket. If the
> > packet carrying the negative sequence number is lost then summing
> > across "tests" would be expected (even though not desired) per the
> > current design and implementation. We intentionally left this as is as
> > we didn't want to change the startup behavior nor require the network
> > support TCP connections in order to run a UDP test.
>
> Ah, so basically, if the last packet from the client is dropped, the
> server is not going to notice that the test ended and just keep
> counting? That would definitely explain the behaviour I'm seeing.
>
> So if another test starts from a different source port, the server is
> still going to count the same totals? That seems kinda odd :)
>
> > Since we know UDP is unreliable, we do control both client and server
> over
> > ssh pipes, and perform summing in flight per the interval reporting.
> > Operating system signals are used to kill the server. The iperf sum
> and
> > final reports are ignored. Unfortunately, I can't publish this package
> > with iperf 2 for both technical and licensing reasons. There is some
> skeleton
> > code in Python 3.5 with asyncio
> > <https://sourceforge.net/p/iperf2/code/ci/master/tree/flows/flows.py>
> that
> > may be of use. A next step here is to add support for pandas
> > <http://pandas.pydata.org/index.html>, and possibly some control chart
> > <https://en.wikipedia.org/wiki/Control_chart> techniques (both single
> and
> > multivariate
> > <http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc34.htm>) for
> both
> > regressions and outlier detection.
>
> No worries, I already have the setup scripts to handle restarting the
> server, and I parse the output with Flent. Just wanted to point out this
> behaviour as it was giving me some very odd results before I started
> systematically restarting the server...
>
> -Toke
>
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-13 18:47 ` Bob McMahon
@ 2017-10-13 19:41 ` Bob McMahon
2017-10-14 1:46 ` Bob McMahon
0 siblings, 1 reply; 17+ messages in thread
From: Bob McMahon @ 2017-10-13 19:41 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Jesper Dangaard Brouer, make-wifi-fast, Johannes Berg
PS. Thanks for writing Flent and making it available. I'm a novice
with Flent but do plan to learn it.
Bob
On Fri, Oct 13, 2017 at 11:47 AM, Bob McMahon <bob.mcmahon@broadcom.com>
wrote:
> Hi Toke,
>
> The other thing that will cause the server thread(s) and listener thread
> to stop is -t when applied to the *server*, i.e. iperf -s -u -t 10 will
> cause a 10 second timeout for the server/listener thread(s) life. Some
> people don't want the Listener to stop so when -D (daemon) is applied, the
> -t will only terminate server traffic threads. Many people asked for
> this because they wanted a way to time bound these threads, specifically
> over the life of many tests.
>
> Yeah, summing is a bit of a mess. I've some proto code I've been playing
> with but still not sure what is going to be released.
>
> For UDP, the source port must be unique per the quintuple (ip proto/src
> ip/ src port/ dst ip/ dst port). Since the UDP server is merely waiting
> for packets it doesn't have any knowledge about how to group. So it groups
> based upon time, i.e. when a new traffic shows up it's put an existing
> active group for summing.
>
> I'm not sure a good way to fix this. I think the client would have to
> modify the payload, and per a -P tell the server the udp src ports that
> belong in the same group. Then the server could assign groups based upon a
> key in the payload.
>
> Thoughts and comments welcome,
> Bob
>
> On Fri, Oct 13, 2017 at 2:28 AM, Toke Høiland-Jørgensen <toke@toke.dk>
> wrote:
>
>> Bob McMahon <bob.mcmahon@broadcom.com> writes:
>>
>> > Thanks Toke. Let me look into this. Is there packet loss during your
>> > tests? Can you share the output of the client and server per the error
>> > scenario?
>>
>> Yeah, there's definitely packet loss.
>>
>> > With iperf 2 there is no TCP test exchange rather UDP test information
>> > is derived from packets in flight. The server determines a UDP test is
>> > finished by detecting a negative sequence number in the payload. In
>> > theory, this should separate UDP tests. The server detects a new UDP
>> > stream is by receiving a packet from a new source socket. If the
>> > packet carrying the negative sequence number is lost then summing
>> > across "tests" would be expected (even though not desired) per the
>> > current design and implementation. We intentionally left this as is as
>> > we didn't want to change the startup behavior nor require the network
>> > support TCP connections in order to run a UDP test.
>>
>> Ah, so basically, if the last packet from the client is dropped, the
>> server is not going to notice that the test ended and just keep
>> counting? That would definitely explain the behaviour I'm seeing.
>>
>> So if another test starts from a different source port, the server is
>> still going to count the same totals? That seems kinda odd :)
>>
>> > Since we know UDP is unreliable, we do control both client and server
>> over
>> > ssh pipes, and perform summing in flight per the interval reporting.
>> > Operating system signals are used to kill the server. The iperf sum
>> and
>> > final reports are ignored. Unfortunately, I can't publish this package
>> > with iperf 2 for both technical and licensing reasons. There is some
>> skeleton
>> > code in Python 3.5 with asyncio
>> > <https://sourceforge.net/p/iperf2/code/ci/master/tree/flows/flows.py>
>> that
>> > may be of use. A next step here is to add support for pandas
>> > <http://pandas.pydata.org/index.html>, and possibly some control chart
>> > <https://en.wikipedia.org/wiki/Control_chart> techniques (both single
>> and
>> > multivariate
>> > <http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc34.htm>) for
>> both
>> > regressions and outlier detection.
>>
>> No worries, I already have the setup scripts to handle restarting the
>> server, and I parse the output with Flent. Just wanted to point out this
>> behaviour as it was giving me some very odd results before I started
>> systematically restarting the server...
>>
>> -Toke
>>
>
>
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-13 19:41 ` Bob McMahon
@ 2017-10-14 1:46 ` Bob McMahon
0 siblings, 0 replies; 17+ messages in thread
From: Bob McMahon @ 2017-10-14 1:46 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Jesper Dangaard Brouer, make-wifi-fast, Johannes Berg
FYI, here's an example output using two Brix PCs over their wired Ethernet
connection, through a Cisco SG300 switch. Note: the ptpd stats (not shown)
display the clock corrections and suggest the clocks are within 10
microseconds of each other.
[rjmcmahon@rjm-fedora etc]$ iperf -c 192.168.100.33 -u -e -t 2 -b 2kpps
------------------------------------------------------------
Client connecting to 192.168.100.33, UDP port 5001 with pid 29952
Sending 1470 byte datagrams, IPG target: 500.00 us (kalman adjust)
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.100.67 port 50062 connected with 192.168.100.33 port 5001
[ ID] Interval Transfer Bandwidth PPS
[ 3] 0.00-2.00 sec 5.61 MBytes 23.5 Mbits/sec 1999 pps
[ 3] Sent 4001 datagrams
[ 3] Server Report:
[ 3] 0.00-2.00 sec 5.61 MBytes 23.5 Mbits/sec 0.012 ms 0/ 4001 (0%) -/-/-/- ms 2000 pps
[rjmcmahon@hera ~]$ iperf -s -u -e -i 0.1
------------------------------------------------------------
Server listening on UDP port 5001 with pid 5178
Receiving 1470 byte datagrams
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.100.33 port 5001 connected with 192.168.100.67 port 57325
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Latency avg/min/max/stdev PPS
[ 3] 0.00-0.10 sec 289 KBytes 23.6 Mbits/sec 0.013 ms 0/ 201 (0%) 0.142/ 0.018/ 0.192/ 0.025 ms 2005 pps
[ 3] 0.10-0.20 sec 287 KBytes 23.5 Mbits/sec 0.020 ms 0/ 200 (0%) 0.157/ 0.101/ 0.207/ 0.015 ms 1999 pps
[ 3] 0.20-0.30 sec 287 KBytes 23.5 Mbits/sec 0.014 ms 0/ 200 (0%) 0.155/ 0.071/ 0.212/ 0.018 ms 2002 pps
[ 3] 0.30-0.40 sec 287 KBytes 23.5 Mbits/sec 0.018 ms 0/ 200 (0%) 0.146/-0.007/ 0.187/ 0.018 ms 1999 pps
[ 3] 0.40-0.50 sec 287 KBytes 23.5 Mbits/sec 0.021 ms 0/ 200 (0%) 0.151/ 0.021/ 0.208/ 0.018 ms 2000 pps
[ 3] 0.50-0.60 sec 287 KBytes 23.5 Mbits/sec 0.016 ms 0/ 200 (0%) 0.148/ 0.043/ 0.192/ 0.018 ms 2000 pps
[ 3] 0.60-0.70 sec 287 KBytes 23.5 Mbits/sec 0.019 ms 0/ 200 (0%) 0.152/ 0.041/ 0.199/ 0.018 ms 2001 pps
[ 3] 0.70-0.80 sec 287 KBytes 23.5 Mbits/sec 0.016 ms 0/ 200 (0%) 0.144/ 0.071/ 0.206/ 0.017 ms 2001 pps
[ 3] 0.80-0.90 sec 287 KBytes 23.5 Mbits/sec 0.015 ms 0/ 200 (0%) 0.140/ 0.111/ 0.186/ 0.014 ms 1999 pps
[ 3] 0.90-1.00 sec 287 KBytes 23.5 Mbits/sec 0.022 ms 0/ 200 (0%) 0.154/ 0.111/ 0.222/ 0.019 ms 2000 pps
[ 3] 1.00-1.10 sec 287 KBytes 23.5 Mbits/sec 0.014 ms 0/ 200 (0%) 0.152/ 0.036/ 0.197/ 0.017 ms 2000 pps
[ 3] 1.10-1.20 sec 287 KBytes 23.5 Mbits/sec 0.015 ms 0/ 200 (0%) 0.153/-0.007/ 0.186/ 0.020 ms 2001 pps
[ 3] 1.20-1.30 sec 287 KBytes 23.5 Mbits/sec 0.013 ms 0/ 200 (0%) 0.149/ 0.035/ 0.207/ 0.018 ms 2000 pps
[ 3] 1.30-1.40 sec 287 KBytes 23.5 Mbits/sec 0.014 ms 0/ 200 (0%) 0.160/ 0.116/ 0.233/ 0.018 ms 2000 pps
[ 3] 1.40-1.50 sec 287 KBytes 23.5 Mbits/sec 0.014 ms 0/ 200 (0%) 0.159/ 0.122/ 0.207/ 0.015 ms 2000 pps
[ 3] 1.50-1.60 sec 287 KBytes 23.5 Mbits/sec 0.016 ms 0/ 200 (0%) 0.158/ 0.066/ 0.201/ 0.015 ms 1999 pps
[ 3] 1.60-1.70 sec 287 KBytes 23.5 Mbits/sec 0.017 ms 0/ 200 (0%) 0.162/ 0.076/ 0.203/ 0.016 ms 2000 pps
[ 3] 1.70-1.80 sec 287 KBytes 23.5 Mbits/sec 0.014 ms 0/ 200 (0%) 0.154/ 0.073/ 0.195/ 0.016 ms 2002 pps
[ 3] 1.80-1.90 sec 287 KBytes 23.5 Mbits/sec 0.015 ms 0/ 200 (0%) 0.154/ 0.113/ 0.213/ 0.017 ms 1999 pps
[ 3] 1.90-2.00 sec 287 KBytes 23.5 Mbits/sec 0.019 ms 0/ 200 (0%) 0.152/ 0.124/ 0.208/ 0.016 ms 1999 pps
[ 3] 0.00-2.00 sec 5.61 MBytes 23.5 Mbits/sec 0.019 ms 0/ 4001 (0%) 0.151/-0.007/ 0.233/ 0.018 ms 2000 pps
Here's a full line rate (1 Gb/s Ethernet) run:
[rjmcmahon@rjm-fedora etc]$ iperf -c 192.168.100.33 -u -e -t 2 -b 200kpss -i 1 -w 2M
------------------------------------------------------------
Client connecting to 192.168.100.33, UDP port 5001 with pid 30626
Sending 1470 byte datagrams, IPG target: 5.00 us (kalman adjust)
UDP buffer size: 416 KByte (WARNING: requested 2.00 MByte)
------------------------------------------------------------
[ 3] local 192.168.100.67 port 57349 connected with 192.168.100.33 port 5001
[ ID] Interval Transfer Bandwidth PPS
[ 3] 0.00-1.00 sec 114 MBytes 958 Mbits/sec 81421 pps
[ 3] 1.00-2.00 sec 114 MBytes 958 Mbits/sec 81380 pps
[ 3] 0.00-2.00 sec 228 MBytes 957 Mbits/sec 81399 pps
[ 3] Sent 162874 datagrams
[ 3] Server Report:
[ 3] 0.00-2.00 sec 228 MBytes 957 Mbits/sec 0.084 ms 0/162874 (0%) 1.641/ 0.253/ 2.466/ 0.114 ms 81383 pps
[ 3] local 192.168.100.33 port 5001 connected with 192.168.100.67 port 57349
[ 3] 0.00-0.10 sec 11.4 MBytes 960 Mbits/sec 0.016 ms 0/ 8166 (0%) 1.615/ 0.253/ 2.309/ 0.355 ms 81606 pps
[ 3] 0.10-0.20 sec 11.4 MBytes 956 Mbits/sec 0.016 ms 0/ 8133 (0%) 1.657/ 0.936/ 2.325/ 0.348 ms 81350 pps
[ 3] 0.20-0.30 sec 11.4 MBytes 957 Mbits/sec 0.021 ms 0/ 8139 (0%) 1.657/ 0.950/ 2.400/ 0.348 ms 81383 pps
[ 3] 0.30-0.40 sec 11.4 MBytes 958 Mbits/sec 0.016 ms 0/ 8144 (0%) 1.652/ 0.953/ 2.357/ 0.342 ms 81380 pps
[ 3] 0.40-0.50 sec 11.4 MBytes 956 Mbits/sec 0.069 ms 0/ 8131 (0%) 1.644/ 0.947/ 2.368/ 0.341 ms 81384 pps
[ 3] 0.50-0.60 sec 11.4 MBytes 957 Mbits/sec 0.072 ms 0/ 8138 (0%) 1.649/ 0.949/ 2.381/ 0.337 ms 81372 pps
[ 3] 0.60-0.70 sec 11.4 MBytes 957 Mbits/sec 0.025 ms 0/ 8139 (0%) 1.639/ 0.952/ 2.357/ 0.342 ms 81383 pps
[ 3] 0.70-0.80 sec 11.4 MBytes 957 Mbits/sec 0.014 ms 0/ 8135 (0%) 1.643/ 0.944/ 2.368/ 0.343 ms 81390 pps
[ 3] 0.80-0.90 sec 11.4 MBytes 957 Mbits/sec 0.016 ms 0/ 8142 (0%) 1.639/ 0.946/ 2.361/ 0.341 ms 81366 pps
[ 3] 0.90-1.00 sec 11.4 MBytes 957 Mbits/sec 0.015 ms 0/ 8142 (0%) 1.635/ 0.932/ 2.378/ 0.342 ms 81387 pps
[ 3] 1.00-1.10 sec 11.4 MBytes 957 Mbits/sec 0.015 ms 0/ 8138 (0%) 1.633/ 0.934/ 2.359/ 0.341 ms 81373 pps
[ 3] 1.10-1.20 sec 11.4 MBytes 957 Mbits/sec 0.010 ms 0/ 8135 (0%) 1.636/ 0.947/ 2.361/ 0.342 ms 81444 pps
[ 3] 1.20-1.30 sec 11.4 MBytes 957 Mbits/sec 0.091 ms 0/ 8140 (0%) 1.624/ 0.908/ 2.363/ 0.354 ms 81400 pps
[ 3] 1.30-1.40 sec 11.4 MBytes 956 Mbits/sec 0.016 ms 0/ 8133 (0%) 1.616/ 0.917/ 2.325/ 0.345 ms 81296 pps
[ 3] 1.40-1.50 sec 11.4 MBytes 957 Mbits/sec 0.012 ms 0/ 8138 (0%) 1.626/ 0.918/ 2.361/ 0.346 ms 81414 pps
[ 3] 1.50-1.60 sec 11.4 MBytes 957 Mbits/sec 0.015 ms 0/ 8136 (0%) 1.626/ 0.934/ 2.352/ 0.339 ms 81339 pps
[ 3] 1.60-1.70 sec 11.4 MBytes 957 Mbits/sec 0.015 ms 0/ 8141 (0%) 1.633/ 0.930/ 2.351/ 0.341 ms 81376 pps
[ 3] 1.70-1.80 sec 11.4 MBytes 956 Mbits/sec 0.017 ms 0/ 8133 (0%) 1.627/ 0.929/ 2.354/ 0.339 ms 81377 pps
[ 3] 1.80-1.90 sec 11.4 MBytes 957 Mbits/sec 0.088 ms 0/ 8139 (0%) 1.659/ 0.895/ 2.396/ 0.343 ms 81330 pps
[ 3] 1.90-2.00 sec 11.4 MBytes 956 Mbits/sec 0.013 ms 0/ 8128 (0%) 1.709/ 0.996/ 2.466/ 0.342 ms 81348 pps
[ 3] 0.00-2.00 sec 228 MBytes 957 Mbits/sec 0.085 ms 0/162874 (0%) 1.641/ 0.253/ 2.466/ 0.344 ms 81383 pps
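In case it's useful for post-processing these reports, here's a minimal parsing
sketch in Python. It's only an illustration written against the -e server output
format shown above; the regex and names are mine, not anything that ships with
iperf 2.

import re

# Pull per-interval latency stats out of "iperf -s -u -e" report lines like the
# ones above. Assumes each interval report sits on one line, as iperf prints it.
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+(?P<start>[\d.]+)-(?P<end>[\d.]+)\s+sec\s+"
    r".*?(?P<lost>\d+)/\s*(?P<total>\d+)\s+\((?P<pct>[\d.]+)%\)\s+"
    r"(?P<avg>-?[\d.]+)/\s*(?P<min>-?[\d.]+)/\s*(?P<max>-?[\d.]+)/\s*(?P<stdev>-?[\d.]+)\s+ms"
)

def parse_latency(report_text):
    """Yield one dict per interval: start/end, lost/total, avg/min/max/stdev (ms)."""
    for line in report_text.splitlines():
        m = INTERVAL_RE.search(line)
        if m:
            yield {k: float(v) for k, v in m.groupdict().items()}

# Example use against a captured server log:
#   rows = list(parse_latency(open("server.log").read()))
#   print(max(r["avg"] for r in rows))   # worst interval-average latency, in ms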
I'll be testing with 802.11ax chips soon but probably can't share those
numbers. Sorry about that.
Bob
On Fri, Oct 13, 2017 at 12:41 PM, Bob McMahon <bob.mcmahon@broadcom.com>
wrote:
> PS. Thanks for writing flent and making it available. I'm a novice
> w/flent but do plan to learn it.
>
> Bob
>
> On Fri, Oct 13, 2017 at 11:47 AM, Bob McMahon <bob.mcmahon@broadcom.com>
> wrote:
>
>> Hi Toke,
>>
>> The other thing that will cause the server thread(s) and listener thread
>> to stop is -t when applied to the *server*, i.e. iperf -s -u -t 10 will
>> cause a 10 second timeout on the server/listener thread(s)' lifetime. Some
>> people don't want the Listener to stop, so when -D (daemon) is applied, the
>> -t will only terminate server traffic threads. Many people asked for
>> this because they wanted a way to time-bound these threads, specifically
>> over the life of many tests.
>>
>> Yeah, summing is a bit of a mess. I've some proto code I've been playing
>> with but still not sure what is going to be released.
>>
>> For UDP, the source port must be unique per the quintuple (ip proto/src
>> ip/ src port/ dst ip/ dst port). Since the UDP server is merely waiting
>> for packets, it doesn't have any knowledge about how to group them. So it
>> groups based upon time, i.e. when new traffic shows up it's put into an
>> existing active group for summing.
>>
>> I'm not sure of a good way to fix this. I think the client would have to
>> modify the payload, and per a -P tell the server the udp src ports that
>> belong in the same group. Then the server could assign groups based upon a
>> key in the payload (a rough sketch of this follows below, after the quoted
>> thread).
>>
>> Thoughts and comments welcome,
>> Bob
>>
>> On Fri, Oct 13, 2017 at 2:28 AM, Toke Høiland-Jørgensen <toke@toke.dk>
>> wrote:
>>
>>> Bob McMahon <bob.mcmahon@broadcom.com> writes:
>>>
>>> > Thanks Toke. Let me look into this. Is there packet loss during your
>>> > tests? Can you share the output of the client and server per the error
>>> > scenario?
>>>
>>> Yeah, there's definitely packet loss.
>>>
>>> > With iperf 2 there is no TCP test exchange; rather, UDP test information
>>> > is derived from packets in flight. The server determines a UDP test is
>>> > finished by detecting a negative sequence number in the payload. In
>>> > theory, this should separate UDP tests. The server detects a new UDP
>>> > stream by receiving a packet from a new source socket. If the
>>> > packet carrying the negative sequence number is lost then summing
>>> > across "tests" would be expected (even though not desired) per the
>>> > current design and implementation. We intentionally left this as is as
>>> > we didn't want to change the startup behavior nor require the network to
>>> > support TCP connections in order to run a UDP test.
>>>
>>> Ah, so basically, if the last packet from the client is dropped, the
>>> server is not going to notice that the test ended and just keep
>>> counting? That would definitely explain the behaviour I'm seeing.
>>>
>>> So if another test starts from a different source port, the server is
>>> still going to count the same totals? That seems kinda odd :)
>>>
>>> > Since we know UDP is unreliable, we do control both client and server
>>> over
>>> > ssh pipes, and perform summing in flight per the interval reporting.
>>> > Operating system signals are used to kill the server. The iperf
>>> sum and
>>> > final reports are ignored. Unfortunately, I can't publish this
>>> package
>>> > with iperf 2 for both technical and licensing reasons. There is some
>>> skeleton
>>> > code in Python 3.5 with asyncio
>>> > <https://sourceforge.net/p/iperf2/code/ci/master/tree/flows/flows.py>
>>> that
>>> > may be of use. A next step here is to add support for pandas
>>> > <http://pandas.pydata.org/index.html>, and possibly some control chart
>>> > <https://en.wikipedia.org/wiki/Control_chart> techniques (both single
>>> and
>>> > multivariate
>>> > <http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc34.htm>) for
>>> both
>>> > regressions and outlier detection.
>>>
>>> No worries, I already have the setup scripts to handle restarting the
>>> server, and I parse the output with Flent. Just wanted to point out this
>>> behaviour as it was giving me some very odd results before I started
>>> systematically restarting the server...
>>>
>>> -Toke
>>>
>>
>>
>
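As a concrete illustration of the payload keying idea in the quoted thread
above, here's a rough Python sketch. The header layout, field names and sizes
are hypothetical, purely for illustration; this is not the actual iperf 2 UDP
payload format.

import struct

# Hypothetical on-wire header for the "group key in the payload" idea: the
# client stamps every datagram with a group id chosen per -P run, so the
# server can sum by key instead of by arrival time. A negative sequence
# number still marks the last packet of a stream.
HDR = struct.Struct("!iIIQ")  # seq (signed), group_id, flow_id, tx time (usec)

def make_payload(seq, group_id, flow_id, tx_us, size=1470):
    hdr = HDR.pack(seq, group_id, flow_id, tx_us)
    return hdr + b"\x00" * (size - len(hdr))

def classify(payload):
    """Return (group_id, end_of_test) for an incoming datagram."""
    seq, group_id, _flow_id, _tx_us = HDR.unpack_from(payload)
    return group_id, seq < 0

With something like this, the server's summing groups would also survive a lost
final datagram, since a changed group id by itself signals a new test.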
[-- Attachment #2: Type: text/html, Size: 14824 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-09 21:44 ` Simon Barber
2017-10-09 22:02 ` Bob McMahon
@ 2017-10-16 21:26 ` Simon Barber
2017-10-17 4:53 ` Bob McMahon
1 sibling, 1 reply; 17+ messages in thread
From: Simon Barber @ 2017-10-16 21:26 UTC (permalink / raw)
To: Bob McMahon; +Cc: make-wifi-fast, Johannes Berg
[-- Attachment #1: Type: text/plain, Size: 7643 bytes --]
What I mean is for the tool to directly measure minimum round trip, and then report one way delay above this separately in each direction. This can be done without external time synchronization.
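To make that concrete, here is a rough sketch (not an existing iperf feature) of how the arithmetic could work, assuming each packet carries the sender's transmit timestamp and the receiver records its own receive timestamp:

# The raw rx-tx difference in one direction includes an unknown clock offset,
# but subtracting the minimum difference observed during the run cancels that
# offset (assuming drift is negligible over the test), leaving "one way delay
# above minimum" per direction without any external time synchronization.
def delay_above_min(samples):
    """samples: (tx_time_on_sender_clock, rx_time_on_receiver_clock) pairs
    for one direction; returns per-packet delay above the directional minimum."""
    raw = [rx - tx for tx, rx in samples]   # offset + base path delay + queueing
    baseline = min(raw)                     # offset + base path delay
    return [d - baseline for d in raw]      # queueing/contention above minimum

Running the same computation on the reverse-direction samples gives the two directions separately, and the two directional baselines sum to roughly the minimum round trip, since the unknown offsets cancel.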
Simon
> On Oct 9, 2017, at 2:44 PM, Simon Barber <simon@superduper.net> wrote:
>
> Very nice - I’m using iperf3.2 and always have to figure packets per second by combining packet size and bandwidth. This will be much easier. Also direct reporting of one way latency variance above minimum round trip would be very useful.
>
> Simon
>
>> On Oct 9, 2017, at 2:04 PM, Bob McMahon <bob.mcmahon@broadcom.com <mailto:bob.mcmahon@broadcom.com>> wrote:
>>
>> Hi,
>>
>> Not sure if this is helpful but we've added end/end latency measurements for UDP traffic in iperf 2.0.10 <https://sourceforge.net/projects/iperf2/>. It does require the clocks to be synched. I use a spectracom tsync pcie card with either an oven controlled oscillator or a GPS disciplined one, then use precision time protocol to distribute the clock over ip multicast. For Linux, the traffic threads are set to realtime scheduling to minimize latency adds per thread scheduling..
>>
>> I'm also in the process of implementing a very simple isochronous option where the iperf client (tx) accepts a frames per second command line value (e.g. 60) as well as a log normal distribution <https://sourceforge.net/p/iperf2/code/ci/master/tree/src/pdfs.c> for the input to somewhat simulate variable bit rates. On the iperf receiver I'm considering implementing an underflow/overflow counter per the expected frames per second.
>>
>> Latency does seem to be a significant metric. Also is power consumption.
>>
>> Comments welcome.
>>
>> Bob
>>
>> On Mon, Oct 9, 2017 at 1:41 PM, <dpreed@reed.com <mailto:dpreed@reed.com>> wrote:
>> It's worth setting a stretch latency goal that is in principle achievable.
>>
>> I get the sense that the wireless group obsesses over maximum channel utilization rather than excellent latency. This is where it's important to put latency as a primary goal, and utilization as the secondary goal, rather than vice versa.
>>
>> It's easy to get at this by observing that the minimum latency on the shared channel is achieved by round-robin scheduling of packets that are of sufficient size that per packet overhead doesn't dominate.
>>
>> So only aggregate when there are few contenders for the channel, or the packets are quite small compared to the per-packet overhead. When there are more contenders, still aggregate small packets, but only those that are actually waiting. But large packets shouldn't be aggregated.
>>
>> Multicast should be avoided by higher level protocols for the most part, and the latency of multicast should be a non-issue. In wireless, it's kind of a dumb idea anyway, given that stations have widely varying propagation characteristics. Do just enough to support DHCP and so forth.
>>
>> It's so much fun for tha hardware designers to throw in stuff that only helps in marketing benchmarks (like getting a few percent on throughput in lab conditions that never happen in the field) that it is tempting for OS driver writers to use those features (like deep queues and offload processing bells and whistles). But the real issue to be solved is that turn-taking "bloat" that comes from too much attempt to aggregate, to handle the "sole transmitter to dedicated receiver case" etc.
>>
>> I use 10 GigE in my house. I don't use it because I want to do 10 Gig File Transfers all day and measure them. I use it because (properly managed) it gives me *low latency*. That low latency is what matters, not throughput. My average load, if spread out across 24 hours, could be handled by 802.11b for the entire house.
>>
>> We are soon going to have 802.11ax in the home. That's approximately 10 Gb/sec, but wireless. No TV streaming can fill it. It's not for continuous isochronous traffic at all.
>>
>> What it is for is *low latency*. So if the adapters and the drivers won't give me that low latency, what good is 10 Gb/sec at all. This is true for 802.11ac, as well.
>>
>> We aren't building Dragsters fueled with nitro, to run down 1/4 mile of track but unable to steer.
>>
>> Instead, we want to be able to connect musical instruments in an electronic symphony, where timing is everything.
>>
>>
>>
>> On Monday, October 9, 2017 4:13pm, "Dave Taht" <dave.taht@gmail.com <mailto:dave.taht@gmail.com>> said:
>>
>> > There were five ideas I'd wanted to pursue at some point. I''m not
>> > presently on linux-wireless, nor do I have time to pay attention right
>> > now - but I'm enjoying that thread passively.
>> >
>> > To get those ideas "out there" again:
>> >
>> > * adding a fixed length fq'd queue for multicast.
>> >
>> > * Reducing retransmits at low rates
>> >
>> > See the recent paper:
>> >
>> > "Resolving Bufferbloat in TCP Communication over IEEE 802.11 n WLAN by
>> > Reducing MAC Retransmission Limit at Low Data Rate" (I'd paste a link
>> > but for some reason that doesn't work well)
>> >
>> > Even with their simple bi-modal model it worked pretty well.
>> >
>> > It also reduces contention with "bad" stations more automagically.
>> >
>> > * Less buffering at the driver.
>> >
>> > Presently (ath9k) there are two-three aggregates stacked up at the driver.
>> >
>> > With a good estimate for how long it will take to service one, forming
>> > another within that deadline seems feasible, so you only need to have
>> > one in the hardware itself.
>> >
>> > Simple example: you have data in the hardware projected to take a
>> > minimum of 4ms to transmit. Don't form a new aggregate and submit it
>> > to the hardware for 3.5ms.
>> >
>> > I know full well that a "good" estimate is hard, and things like
>> > mu-mimo complicate things. Still, I'd like to get below 20ms of
>> > latency within the driver, and this is one way to get there.
>> >
>> > * Reducing the size of a txop under contention
>> >
>> > if you have 5 stations getting blasted away at 5ms each, and one that
>> > only wants 1ms worth of traffic, "soon", temporarily reducing the size
>> > of the txop for everybody so you can service more stations faster,
>> > seems useful.
>> >
>> > * Merging acs when sane to do so
>> >
>> > sane aggregation in general works better than prioritizing does, as
>> > shown in ending the anomaly.
>> >
>> > --
>> >
>> > Dave Täht
>> > CEO, TekLibre, LLC
>> > http://www.teklibre.com <http://www.teklibre.com/>
>> > Tel: 1-669-226-2619 <tel:(669)%20226-2619>
>> > _______________________________________________
>> > Make-wifi-fast mailing list
>> > Make-wifi-fast@lists.bufferbloat.net <mailto:Make-wifi-fast@lists.bufferbloat.net>
>> > https://lists.bufferbloat.net/listinfo/make-wifi-fast <https://lists.bufferbloat.net/listinfo/make-wifi-fast>
>> _______________________________________________
>> Make-wifi-fast mailing list
>> Make-wifi-fast@lists.bufferbloat.net <mailto:Make-wifi-fast@lists.bufferbloat.net>
>> https://lists.bufferbloat.net/listinfo/make-wifi-fast <https://lists.bufferbloat.net/listinfo/make-wifi-fast>
>>
>> _______________________________________________
>> Make-wifi-fast mailing list
>> Make-wifi-fast@lists.bufferbloat.net <mailto:Make-wifi-fast@lists.bufferbloat.net>
>> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast
[-- Attachment #2: Type: text/html, Size: 12557 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-16 21:26 ` Simon Barber
@ 2017-10-17 4:53 ` Bob McMahon
0 siblings, 0 replies; 17+ messages in thread
From: Bob McMahon @ 2017-10-17 4:53 UTC (permalink / raw)
To: Simon Barber; +Cc: make-wifi-fast, Johannes Berg
[-- Attachment #1: Type: text/plain, Size: 7726 bytes --]
I'm confused. Are you referring to TCP's RTT or some other round trip? If
something else, what? How is one way latency measured without clock
synchronization and a common clock domain?
Thanks,
Bob
On Mon, Oct 16, 2017 at 2:26 PM, Simon Barber <simon@superduper.net> wrote:
> What I mean is for the tool to directly measure minimum round trip, and
> then report one way delay above this separately in each direction. This can
> be done without external time synchronization.
>
> Simon
>
> On Oct 9, 2017, at 2:44 PM, Simon Barber <simon@superduper.net> wrote:
>
> Very nice - I’m using iperf3.2 and always have to figure packets per
> second by combining packet size and bandwidth. This will be much easier.
> Also direct reporting of one way latency variance above minimum round trip
> would be very useful.
>
> Simon
>
> On Oct 9, 2017, at 2:04 PM, Bob McMahon <bob.mcmahon@broadcom.com> wrote:
>
> Hi,
>
> Not sure if this is helpful but we've added end/end latency measurements
> for UDP traffic in iperf 2.0.10 <https://sourceforge.net/projects/iperf2/>.
> It does require the clocks to be synched. I use a spectracom tsync pcie
> card with either an oven controlled oscillator or a GPS disciplined one,
> then use precision time protocol to distribute the clock over ip
> multicast. For Linux, the traffic threads are set to realtime scheduling
> to minimize latency adds per thread scheduling..
>
> I'm also in the process of implementing a very simple isochronous option
> where the iperf client (tx) accepts a frames per second command line value
> (e.g. 60) as well as a log normal distribution
> <https://sourceforge.net/p/iperf2/code/ci/master/tree/src/pdfs.c> for the
> input to somewhat simulate variable bit rates. On the iperf receiver I'm
> considering implementing an underflow/overflow counter per the expected
> frames per second.
>
> Latency does seem to be a significant metric. Also is power consumption.
>
> Comments welcome.
>
> Bob
>
> On Mon, Oct 9, 2017 at 1:41 PM, <dpreed@reed.com> wrote:
>
>> It's worth setting a stretch latency goal that is in principle achievable.
>>
>>
>> I get the sense that the wireless group obsesses over maximum channel
>> utilization rather than excellent latency. This is where it's important to
>> put latency as a primary goal, and utilization as the secondary goal,
>> rather than vice versa.
>>
>>
>> It's easy to get at this by observing that the minimum latency on the
>> shared channel is achieved by round-robin scheduling of packets that are of
>> sufficient size that per packet overhead doesn't dominate.
>>
>>
>> So only aggregate when there are few contenders for the channel, or the
>> packets are quite small compared to the per-packet overhead. When there are
>> more contenders, still aggregate small packets, but only those that are
>> actually waiting. But large packets shouldn't be aggregated.
>>
>>
>> Multicast should be avoided by higher level protocols for the most part,
>> and the latency of multicast should be a non-issue. In wireless, it's kind
>> of a dumb idea anyway, given that stations have widely varying propagation
>> characteristics. Do just enough to support DHCP and so forth.
>>
>>
>> It's so much fun for tha hardware designers to throw in stuff that only
>> helps in marketing benchmarks (like getting a few percent on throughput in
>> lab conditions that never happen in the field) that it is tempting for OS
>> driver writers to use those features (like deep queues and offload
>> processing bells and whistles). But the real issue to be solved is that
>> turn-taking "bloat" that comes from too much attempt to aggregate, to
>> handle the "sole transmitter to dedicated receiver case" etc.
>>
>>
>> I use 10 GigE in my house. I don't use it because I want to do 10 Gig
>> File Transfers all day and measure them. I use it because (properly
>> managed) it gives me *low latency*. That low latency is what matters, not
>> throughput. My average load, if spread out across 24 hours, could be
>> handled by 802.11b for the entire house.
>>
>>
>> We are soon going to have 802.11ax in the home. That's approximately 10
>> Gb/sec, but wireless. No TV streaming can fill it. It's not for continuous
>> isochronous traffic at all.
>>
>>
>> What it is for is *low latency*. So if the adapters and the drivers won't
>> give me that low latency, what good is 10 Gb/sec at all. This is true for
>> 802.11ac, as well.
>>
>>
>> We aren't building Dragsters fueled with nitro, to run down 1/4 mile of
>> track but unable to steer.
>>
>>
>> Instead, we want to be able to connect musical instruments in an
>> electronic symphony, where timing is everything.
>>
>>
>>
>>
>> On Monday, October 9, 2017 4:13pm, "Dave Taht" <dave.taht@gmail.com>
>> said:
>>
>> > There were five ideas I'd wanted to pursue at some point. I''m not
>> > presently on linux-wireless, nor do I have time to pay attention right
>> > now - but I'm enjoying that thread passively.
>> >
>> > To get those ideas "out there" again:
>> >
>> > * adding a fixed length fq'd queue for multicast.
>> >
>> > * Reducing retransmits at low rates
>> >
>> > See the recent paper:
>> >
>> > "Resolving Bufferbloat in TCP Communication over IEEE 802.11 n WLAN by
>> > Reducing MAC Retransmission Limit at Low Data Rate" (I'd paste a link
>> > but for some reason that doesn't work well)
>> >
>> > Even with their simple bi-modal model it worked pretty well.
>> >
>> > It also reduces contention with "bad" stations more automagically.
>> >
>> > * Less buffering at the driver.
>> >
>> > Presently (ath9k) there are two-three aggregates stacked up at the
>> driver.
>> >
>> > With a good estimate for how long it will take to service one, forming
>> > another within that deadline seems feasible, so you only need to have
>> > one in the hardware itself.
>> >
>> > Simple example: you have data in the hardware projected to take a
>> > minimum of 4ms to transmit. Don't form a new aggregate and submit it
>> > to the hardware for 3.5ms.
>> >
>> > I know full well that a "good" estimate is hard, and things like
>> > mu-mimo complicate things. Still, I'd like to get below 20ms of
>> > latency within the driver, and this is one way to get there.
>> >
>> > * Reducing the size of a txop under contention
>> >
>> > if you have 5 stations getting blasted away at 5ms each, and one that
>> > only wants 1ms worth of traffic, "soon", temporarily reducing the size
>> > of the txop for everybody so you can service more stations faster,
>> > seems useful.
>> >
>> > * Merging acs when sane to do so
>> >
>> > sane aggregation in general works better than prioritizing does, as
>> > shown in ending the anomaly.
>> >
>> > --
>> >
>> > Dave Täht
>> > CEO, TekLibre, LLC
>> > http://www.teklibre.com
>> > Tel: 1-669-226-2619 <(669)%20226-2619>
>> > _______________________________________________
>> > Make-wifi-fast mailing list
>> > Make-wifi-fast@lists.bufferbloat.net
>> > https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>
>> _______________________________________________
>> Make-wifi-fast mailing list
>> Make-wifi-fast@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>
>
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>
>
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>
>
>
[-- Attachment #2: Type: text/html, Size: 11853 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Make-wifi-fast] less latency, more filling... for wifi
2017-10-16 18:28 ` Pete Heist
@ 2017-10-16 19:56 ` Dave Taht
0 siblings, 0 replies; 17+ messages in thread
From: Dave Taht @ 2017-10-16 19:56 UTC (permalink / raw)
To: Pete Heist; +Cc: make-wifi-fast
Pete Heist <peteheist@gmail.com> writes:
>> On Oct 9, 2017, at 1:13 PM, Dave Taht <dave.taht@gmail.com> wrote:
>>
>> * Less buffering at the driver.
>>
>> Presently (ath9k) there are two-three aggregates stacked up at the driver.
>>
>> With a good estimate for how long it will take to service one, forming
>> another within that deadline seems feasible, so you only need to have
>> one in the hardware itself.
>>
>> Simple example: you have data in the hardware projected to take a
>> minimum of 4ms to transmit. Don't form a new aggregate and submit it
>> to the hardware for 3.5ms.
>>
>> I know full well that a "good" estimate is hard, and things like
>> mu-mimo complicate things. Still, I'd like to get below 20ms of
>> latency within the driver, and this is one way to get there.
>
> For what it’s worth, I’d love to see this. One of the few arguments for doing
> soft rate limiting with point-to-point WiFi is that it’s still possible, when
> rates are stable, to achieve lower latency under load than with the new ath9k
> driver. It would be nice to see that argument disappear. Solid...
I had some hope that perhaps the powersave timing architecture could be
leveraged (somehow), within the new API (somehow), to starve the ath9k
driver this way (someday).
>
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Make-wifi-fast] less latency, more filling... for wifi
[not found] <mailman.778.1507581712.3609.make-wifi-fast@lists.bufferbloat.net>
@ 2017-10-16 18:28 ` Pete Heist
2017-10-16 19:56 ` Dave Taht
0 siblings, 1 reply; 17+ messages in thread
From: Pete Heist @ 2017-10-16 18:28 UTC (permalink / raw)
To: make-wifi-fast
> On Oct 9, 2017, at 1:13 PM, Dave Taht <dave.taht@gmail.com> wrote:
>
> * Less buffering at the driver.
>
> Presently (ath9k) there are two-three aggregates stacked up at the driver.
>
> With a good estimate for how long it will take to service one, forming
> another within that deadline seems feasible, so you only need to have
> one in the hardware itself.
>
> Simple example: you have data in the hardware projected to take a
> minimum of 4ms to transmit. Don't form a new aggregate and submit it
> to the hardware for 3.5ms.
>
> I know full well that a "good" estimate is hard, and things like
> mu-mimo complicate things. Still, I'd like to get below 20ms of
> latency within the driver, and this is one way to get there.
For what it’s worth, I’d love to see this. One of the few arguments for doing soft rate limiting with point-to-point WiFi is that it’s still possible, when rates are stable, to achieve lower latency under load than with the new ath9k driver. It would be nice to see that argument disappear. Solid...
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2017-10-17 4:53 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-09 20:13 [Make-wifi-fast] less latency, more filling... for wifi Dave Taht
2017-10-09 20:41 ` dpreed
2017-10-09 21:04 ` Bob McMahon
2017-10-09 21:44 ` Simon Barber
2017-10-09 22:02 ` Bob McMahon
2017-10-11 20:03 ` Bob McMahon
2017-10-16 21:26 ` Simon Barber
2017-10-17 4:53 ` Bob McMahon
2017-10-11 21:30 ` Jesper Dangaard Brouer
2017-10-12 8:32 ` Toke Høiland-Jørgensen
2017-10-12 18:51 ` Bob McMahon
2017-10-13 9:28 ` Toke Høiland-Jørgensen
2017-10-13 18:47 ` Bob McMahon
2017-10-13 19:41 ` Bob McMahon
2017-10-14 1:46 ` Bob McMahon
[not found] <mailman.778.1507581712.3609.make-wifi-fast@lists.bufferbloat.net>
2017-10-16 18:28 ` Pete Heist
2017-10-16 19:56 ` Dave Taht
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox