From: Dave Taht
To: Jim Gettys, Avery Pennarun, Andrew McGregor, Tim Shepard, Matt Mathis, Jesper Dangaard Brouer, Jonathan Morton, "cerowrt-devel@lists.bufferbloat.net"
Date: Sat, 31 Jan 2015 13:05:50 -0800
Subject: [Cerowrt-devel] Fwd: Throughput regression with `tcp: refine TSO autosizing`
List-Id: Development issues regarding the cerowrt test router project

I would like to have somehow assembled all the focused resources to make a go at fixing wifi, or at least having a f2f with a bunch of people in the late March timeframe. This message of mine to linux-wireless bounced for some reason, and I am off to log out for 10 days, so... see the relevant netdev thread for more details.

---------- Forwarded message ----------
From: Dave Taht
Date: Sat, Jan 31, 2015 at 12:29 PM
Subject: Re: Throughput regression with `tcp: refine TSO autosizing`
To: Arend van Spriel
Cc: linux-wireless, Michal Kazior <michal.kazior@tieto.com>, Eyal Perry, Network Development, Eric Dumazet

The wifi industry as a whole has vastly bigger problems than achieving 1500 Mbits in a faraday cage on a single flow.

I encourage you to try tests in netperf-wrapper that explicitly test for latency under load, and in particular the RTT_FAIR tests against 4 or more stations on a single wifi AP. You will find the results very depressing. Similarly, on your previous test series, a latency figure would have been nice to have. I just did a talk at NZNOG, where I tested the local wifi with less than a megabit of throughput and 3 seconds of latency, filmed here:

https://plus.google.com/u/0/107942175615993706558/posts/CY8ew8MPnMt

Do wish more folk were testing in busy real-world environments, like coffee shops, cities... really, anywhere outside a faraday cage!

I am not attending netconf - I was unable to raise funds to go, and the program committee wanted something "new", instead of the preso I gave the IEEE 802.11 working group back in September.
( http://snapon.lab.bufferbloat.net/~d/ieee802.11-sept-17-2014/11-14-1265-00-0wng-More-on-Bufferbloat.pdf )

I was very pleased with the results of that talk - the day after I gave it, the phrase "test for latency" showed up in a bunch of 802.11ax (the next generation after ac) documents. :) Still, we are stuck with the train wreck that is 802.11ac glommed on top of 802.11n, glommed on top of 802.11g, in terms of queue management, terrible uses of airtime, rate control, and other stuff. Aruba and Meraki, in particular, took a big interest in what I'd outlined in the preso above (we have a half dozen less well baked ideas - that's just the easy stuff that can be done to improve wifi). I gave a followup at Meraki, but I don't think that's online.

Felix (nbd) is on vacation right now, as am I. In fact I am going somewhere for a week totally lacking internet access.

Presently the plan, with what budget (none) and time (very little) we have, is to produce a pair of proof-of-concept implementations of per-TID queueing (see the relevant commit by nbd), leveraging the new minstrel stats, the new minstrel-blues stuff, and an aggregation-aware codel with a calculated target based on the most recently active stations, plus a bunch of the other stuff outlined above at IEEE.

It is my hope that this will start to provide accurate backpressure (or a sufficient lack thereof for TSQ), improving throughput while still retaining low latency. But it is a certainty that we will run into more cross-layer issues that will be a pita to resolve.

Can we put together a meetup around or during ELC in California in March?

I am really not terribly optimistic on anything other than the 2 chipsets we can hack on (ath9k, mt76). Negotiations to get Qualcomm to open up their ath10k firmware have thus far failed, nor has an ath10k-lite gotten anywhere. Perhaps Broadcom would be willing to open up their firmware sufficiently to build in a better API?

A bit more below.
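To give a rough idea of what I mean by a "calculated target": scale the codel drop target with the number of recently active stations, since each station needs at least one TXOP's worth of airtime before its queue can drain. Everything here - the function name, the TXOP estimate, the floor - is an illustrative placeholder, not actual mac80211 code:

```python
def codel_target_us(active_stations, est_txop_us=4000, floor_us=5000):
    """Codel drop target in microseconds: never below the classic
    5 ms floor, and growing with the station count so that one full
    service round across all active stations fits under the target.
    (Illustrative heuristic only, not mac80211 code.)"""
    return max(floor_us, active_stations * est_txop_us)

print(codel_target_us(1))   # one station: the classic 5 ms target holds
print(codel_target_us(8))   # a busy AP needs a much laxer target
```

The point of the sketch is that a fixed 5 ms target, sensible on a point-to-point ethernet link, is unachievable when eight stations each need airtime before any one queue can drain.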
On Jan 30, 2015 5:59 AM, "Arend van Spriel" wrote:
>
> On 01/30/15 14:19, Eric Dumazet wrote:
>>
>> On Fri, 2015-01-30 at 11:29 +0100, Arend van Spriel wrote:
>>
>>> Hi Eric,
>>>
>>> Your suggestions are still based on the fact that you consider wireless
>>> networking to be similar to ethernet, but as Michal indicated there are
>>> some fundamental differences starting with CSMA/CD versus CSMA/CA. Also
>>> the medium conditions are far from comparable.

The analogy I now use for it is that switched ethernet is generally your classic "dumbbell" topology. Wifi is more like a "taxi-stand" topology. If you think about how people queue up at a taxi stand (and sometimes agree to share a ride), the inter-arrival and departure times of a taxi stand make for a better mental model. Admittedly, I seem to spend a lot of time, waiting for taxis, thinking about wifi.

>>> There is no shielding so
>>> it needs to deal with interference and dynamically drops the link rate
>>> so transmission of packets can take several milliseconds. Then with 11n
>>> they came up with aggregation, which sends up to 64 packets in a single
>>> transmit over the air at worst case 6.5 Mbps (if I am not mistaken). The
>>> parameter value for tcp_limit_output_bytes of 131072 means that it
>>> allows queuing for about 1ms on a 1Gbps link, but I hope you can see
>>> this is not realistic for dealing with all variances of the wireless
>>> medium/standard. I suggested this as a topic for the wireless workshop in
>>> Ottawa [1], but I can not attend there. Still hope that there will be
>>> some discussions to get more awareness.

I have sometimes hoped that TSQ could be made more a function of the number of active flows exiting an interface, but Eric tells me that's impossible. This is possibly another case where TSQ could be made a callback function... but frankly I care not a whit about maximizing single-flow tcp throughput on wifi in a faraday cage.

>>
>> Ever heard about bufferbloat ?
>
> Sure.
> I am trying to get awareness about that in our wireless driver/firmware
> development teams. So bear with me.
>
>> Have you read my suggestions and tried them ?
>>
>> You can adjust the limit per flow to pretty much whatever you want. If you
>> need 64 packets, just do the math. If in 2018 you need 128 packets, do the
>> math again.
>>
>> I am very well aware that wireless wants aggregation, thank you.

I note that a lot of people testing this are getting it backwards. Usually it is the AP that is sending lots and lots of big packets, where the return path is predominantly acks from the station. I am not a huge fan of stretch acks, but certainly a little bit of thinning doesn't bother me on the return path there.

Going the other way, particularly in a wifi world that insists on treating every packet as sacred (which I don't agree with at all), thinning acks can help, but single-stream throughput is of interest only on benchmarks. FQing as much as possible all the flows destined for the station in each aggregate masks loss and reduces the need to protect everything so much.

> Sorry if I offended you. I was just giving these as examples, combined with
> the effective rate usable on the medium, to say that the bandwidth is more
> dynamic in wireless and as such needs dynamic adjustment of queue depth.
> Now this can be done by making the fraction size as used in your suggestion
> adaptive to these conditions.

Well... see above. Maybe this technique will do more of the right thing, but... go test.

>> 131072 bytes of queue on 40Gbit is not 1ms, but 26 usec of queueing, and
>> we get line rate nevertheless.
>
> I was saying it was about 1ms on *1Gbit*, as the wireless TCP rates are
> moving in that direction in 11ac.
>
>> We need this level of shallow queues (BQL, TSQ), to get very precise rtt
>> estimations so that TCP has good entropy for its pacing, even in the 50
>> usec rtt ranges.
>>
>> If we allowed 1ms of queueing, then a 40Gbit flow would queue 5 MBytes.
>> This was terrible, because it increased cwnd and all sender queues to
>> insane levels.
>
> Indeed, and that is what we would like to address in our wireless drivers.
> I will set up some experiments using the fraction sizing and post my
> findings. Again, sorry if I offended you.

You really, really, really need to test at rates below 50mbit, and with other stations active at the same time. It's not going to be a linear curve.

> Regards,
> Arend
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Dave Täht
http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks
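P.S. For anyone who wants to "do the math" as Eric suggests, the numbers in this thread work out as follows. This is a back-of-the-envelope serialization-time sketch only; real airtime adds preambles, block-acks, and contention:

```python
# Back-of-the-envelope numbers for the thread above.  Serialization
# time only; real wifi airtime adds preamble, block-ack and contention
# overhead on top of this.

def serialization_ms(nbytes, rate_bps):
    """Milliseconds to clock nbytes onto a link running at rate_bps."""
    return nbytes * 8 * 1000 / rate_bps

# Arend's worst case: a 64-frame 11n aggregate of 1500-byte packets
# at the 6.5 Mbps base rate occupies the air for roughly 118 ms.
aggregate = serialization_ms(64 * 1500, 6.5e6)

# Eric's point: tcp_limit_output_bytes = 131072 is about 1 ms of
# queue at 1 Gbps, but only about 26 usec at 40 Gbps.
at_1g = serialization_ms(131072, 1e9)
at_40g_us = serialization_ms(131072, 40e9) * 1000

print(f"{aggregate:.0f} ms, {at_1g:.2f} ms, {at_40g_us:.0f} usec")
# -> 118 ms, 1.05 ms, 26 usec
```

The same three-line calculation shows why a single byte limit cannot serve both worlds: a budget that is a rounding error at 40 Gbit is two orders of magnitude smaller than one worst-case aggregate at wifi's base rate.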