[Cerowrt-devel] some kernel updates

Sebastian Moeller moeller0 at gmx.de
Fri Aug 23 05:16:52 EDT 2013

Hi Dave,

On Aug 23, 2013, at 07:13 , Dave Taht <dave.taht at gmail.com> wrote:

> On Thu, Aug 22, 2013 at 5:52 PM, Sebastian Moeller <moeller0 at gmx.de> wrote:
> Hi List, hi Jesper,
> So I tested 3.10.9-1 to assess the status of the HTB atm link layer adjustments to see whether the recent changes resurrected this feature.
>         Unfortunately the htb_private link layer adjustments are still broken (RRUL ping RTT against Toke's netperf host in Germany of ~80ms, same as without link layer adjustments). On the bright side the tc_stab method still works as well as before (ping RTT around 40ms).
>         I would like to humbly propose to use the tc stab method in cerowrt to perform ATM link layer adjustments as default. To repeat myself, simply telling the kernel a lie about the packet size seems more robust than fudging HTB's rate tables. Especially since the kernel already fudges the packet size to account for the ethernet header and then some, so this path should receive more scrutiny by virtue of having more users?
> It's my hope that the atm code works but is misconfigured. You can output the tc commands by overriding the TC variable with TC="echo tc" and paste here.
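
For illustration, here is a minimal dry-run sketch of the two ways of requesting ATM link layer adjustments discussed above. It relies on the TC="echo tc" trick, so nothing is changed on the router and the commands are only printed. The interface name, rate and overhead are taken from the output in this thread; the function names are hypothetical, and the exact commands the AQM scripts emit may differ.

```shell
#!/bin/sh
# Dry run: with TC="echo tc" every command is printed instead of executed.
TC="echo tc"
IFACE="ge00"     # WAN interface as it appears in this thread
RATE="2430"      # uplink rate in kbit/s as reported in this thread
OVERHEAD="40"    # per-packet overhead in bytes

# tc_stab variant: a size table on the root qdisc adjusts the packet
# size the shaper sees.
stab_setup() {
    $TC qdisc add dev "$IFACE" root handle 1: stab linklayer atm overhead "$OVERHEAD" htb default 12
    $TC class add dev "$IFACE" parent 1: classid 1:1 htb rate "${RATE}kbit" ceil "${RATE}kbit"
}

# htb_private variant: the link layer and overhead are passed to the
# HTB class itself.
htb_private_setup() {
    $TC qdisc add dev "$IFACE" root handle 1: htb default 12
    $TC class add dev "$IFACE" parent 1: classid 1:1 htb rate "${RATE}kbit" ceil "${RATE}kbit" linklayer atm overhead "$OVERHEAD"
}

echo "# tc_stab"
stab_setup
echo "# htb_private"
htb_private_setup
```

Pasting the printed lines makes it easy to see whether the linklayer and overhead options actually reach tc.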

	I will do this once I am back home. But I did check "tc -d qdisc" and "tc -d class show dev ge00" and got:

>  root@nacktmulle:~# tc -d class show dev ge00
> class htb 1:11 parent 1:1 leaf 110: prio 1 quantum 1500 rate 128000bit overhead 40 ceil 810000bit burst 2Kb/1 mpu 0b overhead 0b cburst 12953b/1 mpu 0b overhead 0b level 0 
> class htb 1:1 root rate 2430Kbit overhead 40 ceil 2430Kbit burst 2Kb/1 mpu 0b overhead 0b cburst 2Kb/1 mpu 0b overhead 0b level 7 
> class htb 1:10 parent 1:1 prio 0 quantum 1500 rate 2430Kbit overhead 40 ceil 2430Kbit burst 2Kb/1 mpu 0b overhead 0b cburst 2Kb/1 mpu 0b overhead 0b level 0 
> class htb 1:13 parent 1:1 leaf 130: prio 3 quantum 1500 rate 405000bit overhead 40 ceil 2366Kbit burst 2Kb/1 mpu 0b overhead 0b cburst 11958b/1 mpu 0b overhead 0b level 0 
> class htb 1:12 parent 1:1 leaf 120: prio 2 quantum 1500 rate 405000bit overhead 40 ceil 2366Kbit burst 2Kb/1 mpu 0b overhead 0b cburst 11958b/1 mpu 0b overhead 0b level 0 
> class fq_codel 110:20e parent 110: 
> class fq_codel 120:10 parent 120: 
> root@nacktmulle:~# tc -d qdisc
> qdisc fq_codel 0: dev se00 root refcnt 2 limit 1024p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn 
> qdisc htb 1: dev ge00 root refcnt 2 r2q 10 default 12 direct_packets_stat 0 ver 3.17
> qdisc fq_codel 110: dev ge00 parent 1:11 limit 600p flows 1024 quantum 300 target 5.0ms interval 100.0ms 
> qdisc fq_codel 120: dev ge00 parent 1:12 limit 600p flows 1024 quantum 300 target 5.0ms interval 100.0ms 
> qdisc fq_codel 130: dev ge00 parent 1:13 limit 600p flows 1024 quantum 300 target 5.0ms interval 100.0ms 
> qdisc ingress ffff: dev ge00 parent ffff:fff1 ---------------- 
> qdisc htb 1: dev ifb0 root refcnt 2 r2q 10 default 12 direct_packets_stat 0 ver 3.17
> qdisc fq_codel 110: dev ifb0 parent 1:11 limit 1000p flows 1024 quantum 500 target 5.0ms interval 100.0ms ecn 
> qdisc fq_codel 120: dev ifb0 parent 1:12 limit 1000p flows 1024 quantum 1500 target 5.0ms interval 100.0ms ecn 
> qdisc fq_codel 130: dev ifb0 parent 1:13 limit 1000p flows 1024 quantum 1500 target 5.0ms interval 100.0ms ecn 
> qdisc mq 0: dev sw00 root 
> qdisc mq 0: dev gw01 root 
> qdisc mq 0: dev gw00 root 
> qdisc mq 0: dev sw10 root 
> qdisc mq 0: dev gw11 root 
> qdisc mq 0: dev gw10 root 

	So at least the configured overhead of 40 bytes shows up using htb_private. Unlike tc_stab which reports the link layer in "tc -d qdisc" I never figured out whether htb ever reports the link layer option at all. Changing the overhead value in AQM changes the reported overhead in "tc -d class show dev ge00".
	That said I will collect the tc output and post it here...
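
As a rough sketch of why the overhead and link layer settings matter: on ATM every packet is carried in 53-byte cells that hold 48 bytes of payload each, and the last cell is padded. The helper below (a hypothetical name, not part of tc) shows the arithmetic a link layer adjustment has to reproduce, assuming the 40-byte overhead configured above.

```shell
#!/bin/sh
# Bytes on an ATM wire for one packet.
#   $1 = IP packet size in bytes, $2 = per-packet overhead in bytes
atm_wire_bytes() {
    payload=$(( $1 + $2 ))
    # Round up to whole 48-byte cell payloads; each cell is 53 bytes on the wire.
    cells=$(( (payload + 47) / 48 ))
    echo $(( cells * 53 ))
}

atm_wire_bytes 1500 40   # 1540 bytes -> 33 cells -> 1749
atm_wire_bytes 64 40     # 104 bytes  -> 3 cells  -> 159
```

A shaper that counts 1500 bytes for a packet that really occupies 1749 bytes will overcommit the link, and the error is proportionally larger for small packets.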

>         Now, I have been testing this using Dave's most recent cerowrt alpha version with a 3.10.9 kernel on mips hardware, I think this kernel should contain all htb fixes including commit 8a8e3d84b17 (net_sched: restore "linklayer atm" handling) but am not fully sure.
> It does. 

	You rock!

> @Dave is there an easy way to find which patches you applied to the kernels of the cerowrt (testing-)releases?
> Normally I DO commit stuff that is in testing, but my big push this time around was to get everything important into mainline 3.10, as it will be the "stable" release for a good long time. 

	Oh sorry, I know that I am testing your WIP branch here, and I think it is great that you share this with us so we can test early and often. I just realized that I had no way of knowing which patches made it into 3.10.9-1...

> So I am still mostly working the x86 side at the moment. I WAS kind of hoping that everything I just landed would make it up to 3.10. But for your perusal:
> http://snapon.lab.bufferbloat.net/~cero2/patches/3.10.9-1/ has most of the kernel patches I used in it.

	Thanks a lot!

> 3.10.9-2 has the ipv6subtrees patch ripped out due to another weird bug I'm looking at. (It also has support for ipv6 nat thx to the ever prolific stephen walker heeding the call for patches...). 100% totally untested, I have this weird bug to figure out how to fix next:
> http://lists.alioth.debian.org/pipermail/babel-users/2013-August/001419.html
> I fear it's a comparison gone south, maybe in bradley's optimizations for not kernel trapping, don't know.
> 3.10.9-2 also disables dnsmasq's dhcpv6 in favor of 6relayd. I HATE losing the close naming integration, but, had to try this….

	Getting IPv6 working is my next toy project, once the atm issue is gone for good :)

> If you guys want me to start committing and pushing patches again, I'll do it, but most of that stuff will end up in 3.10.10, I think, in a couple days.

	Oh, no, you are in the driver's seat here, you set the pace. I just got carried away by the thought that atm might be fixed and all that was needed was confirmation :)

> The rest might make 3.12. Pie has to survive scrutiny on the netdev list in particular.
> While I have your attention :) I also tested 3.10.9-1's pie and it is way better than 3.10.6-1's (RRUL ping RTTs around 110 ms instead of 3000ms) but still worse than fq_codel (ping RTTs around 40ms with proper atm link layer adjustments).
> This is with simple.qos I imagine? Simplest should do better than that with pie. Judging from how its estimator works I think it will do badly with multiple queues. But testing will tell...
> But, yea, this pie is actually usable, and the previous wasn't. Thank you for looking at it!
> It is different from cisco's last pie drop in that it can do ecn, does local congestion notification, has a better use of net_random, it's mostly KernelStyle, and I forget what else.
> There is still a major rounding error in the code, and I'd like cisco to fix the api so it uses identical syntax to codel. Right now you specify "target 8" to get "target 7", and the "ms" is implied. target 5 becomes target 3.

	Is there a method to this madness?

> The default target is a whopping 20 (rounded to 19), which is in part where your 70+ms of extra delay came from. 

	So like 20 up, 20 down, totaling 40ms just from a bad target value...

> Multiple parties have the delusion that 20ms is "good enough".

	Hmm, I would have thought that cisco with its IP telephony products of all companies would think that increasing the latency by a factor of 3 to 4 over the unloaded condition would be "sub-optimal".

> Part of the remaining delay may also be rounding error. Cisco uses kernels with HZ=1000, cero uses HZ=250…..
> Anyway, to get more comparable tests... you can fiddle with the two $QDISC lines in simple*.qos to add a target 8 to get closer to a codel 5ms config, but that would break a codel config which treats target 8 as target 8us.

	Ah, I can do this tonight and run a test on pie to see whether the RTT comes down by 40ms - 2*7ms = 26ms...
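
The estimate above as a small sketch, assuming the target delay is paid once in each direction ("20 up 20 down") and using the effective values quoted in this thread. The function name is hypothetical.

```shell
#!/bin/sh
# Expected RTT reduction in ms when the effective target drops from
# $1 to $2, assuming the target is paid once upstream and once downstream.
rtt_reduction_ms() {
    echo $(( 2 * $1 - 2 * $2 ))
}

rtt_reduction_ms 20 7   # nominal default 20, "target 8" acting as 7 -> 26
rtt_reduction_ms 19 7   # default as actually rounded (19)          -> 24
```

So a drop of roughly 24 to 26ms in the RRUL ping RTT would be consistent with the target being the main cause of the extra delay.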

> I MIGHT, if I get energetic enough, fix the API, the time accounting, and a few other things in pie, the problem is, that ns2_codel seems still more effective on most workloads and *fq_codel smokes absolutely everything.

	I agree, fq_codel looks like the winner (well efq_codel and nfq_codel are indiscernible from fq_codel in my RRUL tests, but they too are fq_codel for the most part I guess)

> There are a few places where pie is a win over straight codel, notably on packet floods.

	I am not set up in any way to test this.

> And it may well be easier to retrofit into existing hardware fast path designs. 

	Well, it seems superior to no AQM, so it looks like a decent stopgap measure until fq_codel can migrate to all routers :)

> I worry about interactions between pie and other stuff. It seems inevitable at this point that some form of pie will be widely deployed, and I simply haven't tried enough traffic types and RTTs to draw a firm conclusion, period. Long RTTs are the last big place where codel and pie and fq_codel have to be seriously tested. 

	What do you consider to be a long RTT? From home I have a best case ping RTT to snapon of 180ms, so if this is sufficient I might be able to help. Would starting netperf on my router help you in testing? My bandwidth of 2430Kbit/s up and 15494Kbit/s down might be a bit measly. I will be taking my family on holiday next week, so there could be another remote test site if you want.

> ns2_codel is looking pretty good now, at the shorter RTTs I've tried. A big problem I have is getting decent long RTT emulation out of netem (some preliminary code is up at github) 

	Or just testing over real long paths?

> ... and getting cero stable enough for others to actually use - next up is fixing the userspace problems. 

	I think it actually is pretty useable even in its current CI state.

> ... and trying to make a small dent in the wifi problem along the way (couple commits coming up)
> ... and find funding to get through the winter.
> There's probably a few other things that are on that list but I forget. Oh, yea, since the aqm wg was voted on to be formed, I decided I could quit smoking.


> While I am not able to build kernels, it seems that I am able to quickly test whether link layer adjustments work or not. So I am happy to help where I can :)
> Give pie target 8 and target 5 a shot, please? ns2_codel target 3ms and target 7ms, too. fq_codel, same….

	Aye, will do.

> tc -s qdisc show dev ge00
> tc -s qdisc show dev ifb0
> would be useful info to have in general after each test.

	Agreed, that is basically what I do, but so far never saved the results...
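
A minimal sketch of how the statistics could be saved after each run. The output directory, label scheme and function name are assumptions of mine; TC="echo tc" keeps it a dry run, and TC=tc on the router would capture the real counters.

```shell
#!/bin/sh
# Dry run by default; set TC=tc on the router to record real statistics.
TC="echo tc"
OUTDIR="${TMPDIR:-/tmp}/qdisc-stats"   # hypothetical output location

# Save "tc -s qdisc show" for both shaped devices into one labelled file
# and print the name of that file.
save_qdisc_stats() {
    label="$1"
    mkdir -p "$OUTDIR" || return 1
    outfile="$OUTDIR/${label}_$(date +%Y%m%d-%H%M%S).txt"
    for dev in ge00 ifb0; do
        echo "### $dev"
        $TC -s qdisc show dev "$dev"
    done > "$outfile"
    echo "$outfile"
}

save_qdisc_stats "pie_target8_rrul"
```

Calling it with a label that names the qdisc, the target and the test keeps the results easy to compare later.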

> TIA.
> There are also things like tcp_upload and tcp_download and tcp_bidirectional that are useful tests in the rrul suite.

	I might get around to testing those, but for the only small niche where I can offer testing, RRUL seems to work quite well.

> Thank you for your efforts on these early alpha releases. I hope things will stabilize more soon, and I'll fold your aqm stuff into my next attempt this weekend.

	Thanks a lot.

> This is some of the stuff I know that needs fixing in userspace:
> * TODO readlink not found
> * TODO netdev user missing
> * TODO Wed Dec  5 17:14:46 2012 authpriv.error dnsmasq: found already running DHCP-server on interface 'se00' refusing to start, use 'option force 1' to override
> * TODO [   18.480468] Mirror/redirect action on
> [   18.539062] Failed to load ipt action
> * upload and download are reversed in aqm

I think that is fixed, at least the rate I set in download is applied to the htb attached to ifb0 and the upload to ge00, which seems quite correct. Or are you concerned about the initial values that show up in the AQM GUI? If the latter I can try to set the defaults in model/cbi/aqm.lua…

> * BCP38
> * Squash CS values
> * Replace ntp
> * Make ahcp client mode
> * Drop more privs for polipo
> * upnp
> * priv separation
> * Review FW rules
> * dhcpv6 support
> * uci-defaults/make-cert.sh uses a bad path for px5g
> * Doesn't configure the web browser either

	I would love to see the open connect client package to be included (https://dev.openwrt.org/browser/packages/net/openconnect/Makefile), but I might be the only one.

Thanks a lot & Best Regards

> Best
>         Sebastian
> -- 
> Dave Täht
> Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
