From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wi0-x22b.google.com (mail-wi0-x22b.google.com [IPv6:2a00:1450:400c:c05::22b])
    (using TLSv1 with cipher RC4-SHA (128/128 bits))
    (Client CN "smtp.gmail.com", Issuer "Google Internet Authority" (verified OK))
    by huchra.bufferbloat.net (Postfix) with ESMTPS id 8E4C921F18A
    for ; Thu, 22 Aug 2013 22:13:54 -0700 (PDT)
Received: by mail-wi0-f171.google.com with SMTP id hr7so1520270wib.16
    for ; Thu, 22 Aug 2013 22:13:52 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=gmail.com; s=20120113;
    h=mime-version:in-reply-to:references:date:message-id:subject:from:to
    :cc:content-type;
    bh=qn0tjL2VVRsL23lZ682OaRtbtVcUjjh2THo0rwhJRb4=;
    b=Iqw3mSW62GRMOF8ls5zAl6zSOt2bD1FdpMW4BlHTPCkru5hax92m6pma55nlAkOapy
    5AQTOC41O1RKHtx8UbBv53ASiVeQ8GOR7ln77vGZ7mVKRj51Plzp8nktB2fXlbkJn8OD
    aencpxy1Kr8SN4p8c0qEiiPniI3ZSJP/S5cjDigFoFTdwdB3uW+1Nez4YtqsiAUUbf8F
    Pc0syrdm+N5HVLugAyEq/IfnZSqG+X61QTSNE7Z9mwd3uUFBWYaR/zsD8ns/gRGdoTni
    vsO0Xjn3yxJldTBps9GbTS6dN/qOW7mAhjlvBN6aSiXrs0MCKM7qn2ZSbxErHcmBp8N4
    Sf3A==
MIME-Version: 1.0
X-Received: by 10.180.84.196 with SMTP id b4mr744456wiz.19.1377234832222;
    Thu, 22 Aug 2013 22:13:52 -0700 (PDT)
Received: by 10.217.48.67 with HTTP; Thu, 22 Aug 2013 22:13:52 -0700 (PDT)
In-Reply-To: <03951E31-8F11-4FB8-9558-29EAAE3DAE4D@gmx.de>
References: <56B261F1-2277-457C-9A38-FAB89818288F@gmx.de>
    <2148E2EF-A119-4499-BAC1-7E647C53F077@gmx.de>
    <03951E31-8F11-4FB8-9558-29EAAE3DAE4D@gmx.de>
Date: Thu, 22 Aug 2013 22:13:52 -0700
Message-ID:
From: Dave Taht
To: Sebastian Moeller
Content-Type: multipart/alternative; boundary=f46d04426e2af27f2304e49678de
Cc: Jesper Dangaard Brouer , "cerowrt-devel@lists.bufferbloat.net"
Subject: Re: [Cerowrt-devel] some kernel updates
X-BeenThere: cerowrt-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Development issues regarding the cerowrt test router project
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 23 Aug 2013 05:13:55 -0000

--f46d04426e2af27f2304e49678de
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable



On Thu, Aug 22, 2013 at 5:52 PM, Sebastian Moeller <moeller0@gmx.de> wrote:
> Hi List, hi Jesper,
>
> So I tested 3.10.9-1 to assess the status of the HTB atm link layer adjustments to see whether the recent changes resurrected this feature.
>         Unfortunately the htb_private link layer adjustments still is broken (RRUL ping RTT against Toke's netperf host in Germany of ~80ms, same as without link layer adjustments). On the bright side the tc_stab method still works as well as before (ping RTT around 40ms).
>         I would like to humbly propose to use the tc stab method in cerowrt to perform ATM link layer adjustments as default. To repeat myself, simply telling the kernel a lie about the packet size seems more robust than fudging HTB's rate tables. Especially since the kernel already fudges the packet size to account for the ethernet header and then some, so this path should receive more scrutiny by virtue of having more users?

It's my hope that the atm code works but is misconfigured. You can output the tc commands by overriding the TC variable with TC="echo tc" and paste them here.
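
For context, a minimal sketch of the size adjustment under discussion, in Python. The 53-byte cell and 48-byte payload are standard ATM figures; the `overhead` value in the usage lines is a made-up placeholder, since the real per-packet overhead depends on the encapsulation in use. It only illustrates why a shaper that counts raw packet bytes underestimates what an ATM link carries, and says nothing about how tc stab or HTB implement the adjustment internally.

```python
ATM_CELL_BYTES = 53     # bytes one ATM cell occupies on the wire
ATM_PAYLOAD_BYTES = 48  # payload bytes carried inside each cell


def atm_wire_bytes(packet_len: int, overhead: int = 0) -> int:
    """Bytes an ATM link spends on one packet.

    The packet plus its per-packet overhead is split into 48-byte
    payloads; a partly filled last cell still costs a full 53 bytes.
    """
    total = packet_len + overhead
    cells = -(-total // ATM_PAYLOAD_BYTES)  # ceiling division
    return cells * ATM_CELL_BYTES


if __name__ == "__main__":
    HYPOTHETICAL_OVERHEAD = 40  # placeholder, not a measured value
    for size in (64, 576, 1500):
        print(size, atm_wire_bytes(size, HYPOTHETICAL_OVERHEAD))
```

With the hypothetical 40 bytes of overhead, a 1500-byte packet needs 33 cells, i.e. 1749 bytes on the wire, roughly 17% more than its nominal size. That inflated figure is the "lie about the packet size" a size table hands to the shaper.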

> Now, I have been testing this using Dave's most recent cerowrt alpha version with a 3.10.9 kernel on mips hardware, I think this kernel should contain all htb fixes including commit 8a8e3d84b17 (net_sched: restore "linklayer atm" handling) but am not fully sure.

It does.

> @Dave is there an easy way to find which patches you applied to the kernels of the cerowrt (testing-)releases?

Normally I DO commit stuff that is in testing, but my big push this time around was to get everything important into mainline 3.10, as it will be the "stable" release for a good long time.

So I am still mostly working the x86 side at the moment. I WAS kind of hoping that everything I just landed would make it up to 3.10. But for your perusal:

http://snapon.lab.bufferbloat.net/~cero2/patches/3.10.9-1/ has most of the kernel patches I used in it. 3.10.9-2 has the ipv6subtrees patch ripped out due to another weird bug I'm looking at. (It also has support for ipv6 nat thx to the ever prolific stephen walker heeding the call for patches...). 100% totally untested, I have this weird bug to figure out how to fix next:

http://lists.alioth.debian.org/pipermail/babel-users/2013-August/001419.html

I fear it's a comparison gone south, maybe in bradley's optimizations for not kernel trapping, don't know.

3.10.9-2 also disables dnsmasq's dhcpv6 in favor of 6relayd. I HATE losing the close naming integration, but, had to try this....

If you guys want me to start committing and pushing patches again, I'll do it, but most of that stuff will end up in 3.10.10, I think, in a couple days. The rest might make 3.12. Pie has to survive scrutiny on the netdev list in particular.

> While I have your attention :) I also tested 3.10.9-1's pie and it is way better than 3.10.6-1's (RRUL ping RTTs around 110 ms instead of 3000ms) but still worse than fq_codel (ping RTTs around 40ms with proper atm link layer adjustments).

This is with simple.qos I imagine? simplest.qos should do better than that with pie. Judging from how its estimator works I think it will do badly with multiple queues. But testing will tell...

But, yea, this pie is actually usable, and the previous wasn't. Thank you for looking at it!

It is different from cisco's last pie drop in that it can do ecn, does local congestion notification, has a better use of net_random, it's mostly KernelStyle, and I forget what else.

There is still a major rounding error in the code, and I'd like cisco to fix the api so it uses identical syntax to codel. Right now you specify "target 8" to get "target 7", and the "ms" is implied. target 5 becomes target 3. The default target is a whopping 20 (rounded to 19), which is in part where your 70+ms of extra delay came from.

Multiple parties have the delusion that 20ms is "good enough".

Part of the remaining delay may also be rounding error. Cisco uses kernels with HZ=1000, cero uses HZ=250.....
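
To make the HZ point concrete, here is a rough Python sketch of timer granularity. The tick length (1000/HZ milliseconds) is plain arithmetic; rounding durations up to whole ticks is an assumed policy for illustration, and the 30ms period is a hypothetical value. This is not claimed to reproduce the 8-to-7 target rounding described above, which is attributed to a rounding error in the pie code itself.

```python
def tick_ms(hz: int) -> float:
    """Length of one kernel timer tick in milliseconds for a given HZ."""
    return 1000 / hz


def round_up_to_ticks_ms(duration_ms: int, hz: int) -> float:
    """Duration after rounding up to a whole number of ticks.

    Rounding up is an assumption made for this sketch; the point is
    only that a timer cannot be finer than one tick.
    """
    ticks = -((-duration_ms * hz) // 1000)  # ceiling division
    return ticks * 1000 / hz


if __name__ == "__main__":
    HYPOTHETICAL_PERIOD_MS = 30  # illustrative timer period
    for hz in (1000, 250):
        print(hz, tick_ms(hz), round_up_to_ticks_ms(HYPOTHETICAL_PERIOD_MS, hz))
```

At HZ=250 a tick is 4ms, so any interval that is not a multiple of 4ms gets stretched, while at HZ=1000 the same interval is represented exactly.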
Anyway, to get more comparable tests... you can fiddle with the two $QDISC lines in simple*.qos to add a target 8 to get closer to a codel 5ms config, but that would break a codel config, which treats a bare target 8 as target 8us.

I MIGHT, if I get energetic enough, fix the API, the time accounting, and a few other things in pie. The problem is that ns2_codel still seems more effective on most workloads and *fq_codel smokes absolutely everything. There are a few places where pie is a win over straight codel, notably on packet floods. And it may well be easier to retrofit into existing hardware fast path designs.

I worry about interactions between pie and other stuff. It seems inevitable at this point that some form of pie will be widely deployed, and I simply haven't tried enough traffic types and RTTs to draw a firm conclusion, period. Long RTTs are the last big place where codel and pie and fq_codel have to be seriously tested.

ns2_codel is looking pretty good now, at the shorter RTTs I've tried. A big problem I have is getting decent long RTT emulation out of netem (some preliminary code is up at github)

... and getting cero stable enough for others to actually use - next up is fixing the userspace problems.

... and trying to make a small dent in the wifi problem along the way (couple commits coming up)

... and find funding to get through the winter.

There's probably a few other things that are on that list but I forget. Oh, yea, since the aqm wg was voted on to be formed, I decided I could quit smoking.

> While I am not able to build kernels, it seems that I am able to quickly test whether link layer adjustments work or not. So I am happy to help where I can :)

Give pie target 8 and target 5 a shot, please? ns2_codel target 3ms and target 7ms, too. fq_codel, same....
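
A small Python sketch of the requested runs, mainly to keep the unit pitfall visible. Per the notes above, this pie build takes a bare number with milliseconds implied, while the codel variants take an explicit unit and read a bare number as microseconds. Reading "fq_codel, same" as the 3ms and 7ms targets is an interpretation, and the helper names are made up for illustration.

```python
# Qdiscs that, per this thread, take a bare number and imply "ms".
BARE_MS_QDISCS = {"pie"}

# (qdisc, target in ms) pairs requested in the message; the fq_codel
# entries assume "same" means the same targets as ns2_codel.
REQUESTED_RUNS = [
    ("pie", 8),
    ("pie", 5),
    ("ns2_codel", 3),
    ("ns2_codel", 7),
    ("fq_codel", 3),
    ("fq_codel", 7),
]


def target_option(qdisc: str, target_ms: int) -> str:
    """Spell the target the way the given qdisc expects it."""
    if qdisc in BARE_MS_QDISCS:
        return f"target {target_ms}"    # unit omitted, ms implied
    return f"target {target_ms}ms"      # explicit unit, else it means us


def qdisc_lines(runs=REQUESTED_RUNS):
    """One 'qdisc target ...' string per requested run."""
    return [f"{qdisc} {target_option(qdisc, ms)}" for qdisc, ms in runs]


if __name__ == "__main__":
    for line in qdisc_lines():
        print(line)
```

Keeping the unit handling in one place avoids pasting a pie-style "target 8" into a codel line, where it would silently become 8us.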

tc -s qdisc show dev ge00
tc -s qdisc show dev ifb0
would be useful info to have in general after each test.
TIA.
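
One way to capture that output after every run, sketched in Python. The two tc invocations are the ones given above; the function names, the label format, and the injectable `runner` (there so the code can be exercised on a machine without tc) are illustrative assumptions.

```python
import subprocess

INTERFACES = ("ge00", "ifb0")  # the two devices named above


def qdisc_stats_commands(interfaces=INTERFACES):
    """Argument vectors for 'tc -s qdisc show dev <iface>'."""
    return [["tc", "-s", "qdisc", "show", "dev", dev] for dev in interfaces]


def run_command(argv):
    """Run one command and return its standard output as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def collect_qdisc_stats(label, interfaces=INTERFACES, runner=run_command):
    """Return a text report: a label line, then each command and its output."""
    sections = [f"### {label}"]
    for argv in qdisc_stats_commands(interfaces):
        sections.append("$ " + " ".join(argv))
        sections.append(runner(argv).rstrip())
    return "\n".join(sections)


if __name__ == "__main__":
    # Needs tc and the two interfaces to exist; run on the router itself.
    print(collect_qdisc_stats("pie target 8"))
```

Labeling each report with the qdisc and target under test makes it easier to match the statistics to the corresponding RRUL plot later.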

There are also things like tcp_upload and tcp_download and tcp_bidirectional that are useful tests in the rrul suite.

Thank you for your efforts on these early alpha releases. I hope things will stabilize more soon, and I'll fold your aqm stuff into my next attempt this weekend.

This is some of the stuff I know that needs fixing in userspace:

* TODO readlink not found
* TODO netdev user missing
* TODO Wed Dec  5 17:14:46 2012 authpriv.error dnsmasq: found already running DHCP-server on interface 'se00' refusing to start, use 'option force 1' to override
* TODO [   18.480468] Mirror/redirect action on
[   18.539062] Failed to load ipt action
* upload and download are reversed in aqm
* BCP38
* Squash CS values
* Replace ntp
* Make ahcp client mode
* Drop more privs for polipo
* upnp
* priv separation
* Review FW rules
* dhcpv6 support
* uci-defaults/make-cert.sh uses a bad path for px5g
* Doesn't configure the web browser either




> Best
>         Sebastian




--
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
--f46d04426e2af27f2304e49678de--