[Cerowrt-devel] development snapshot of cerowrt-3.3.8-21 released

Dave Taht dave.taht at gmail.com
Mon Aug 27 19:15:13 EDT 2012


I spent the last two weeks hunting down memory-related
issues and trying to fold in some development work I already
had going.

As best as I can tell, the core memory issues are killed dead; whether
this is due to freeing up tons of memory or to the various other changes
remains to be determined.

But:

Under *no circumstances* install this release on your default router.

I'm putting this out primarily because I'm seeing an odd behavior
on the iwl card I have, but not on the ath9ks (yet! still testing).
If someone out there has a third type of wireless card, or anything
non-iwl or non-Linux, please beat this up.

Radical changes:

+ bind replaced with dnsmasq (bind available as an option)
+ support for AAAA naming and RA announcements in dnsmasq
+ Implementation of experimental codel code from kathie's ns2 work
+ fq_codel engages codel sooner
+ fq_codel has a CS1 deprioritization hack
+ codel and fq_codel shrink skbs under overload
+ debloat has reduced defaults for packet limits
+ debloat uses qlens of 2,4,12,12 (up from 2,3,3,3)
+ qos-scripts has reduced defaults
+ strongswan available again

- I haven't looked at the hurricane ipv6 issue
- nor at dlna or upnp
- didn't fix ath9k to use smaller allocations
- no tcp small queues

Big bugs remain.

1) htb does weird things at all bandwidths, and with all qdiscs,
not just codel/fq_codel.

It may well have been doing this for a while (like, months), which would explain
a lot. hfsc is also being weird. (hfsc is used by qos-scripts, htb by
simple_qos)


2) wifi vs the x86 iwl card.

This is the error that I get in /var/log/messages, and the only way to get
connectivity back and clear it is to reboot the *x86* box.

[67046.216150] iwlwifi 0000:03:00.0: fail to flush all tx fifo queues
[67048.224185] iwlwifi 0000:03:00.0: fail to flush all tx fifo queues
[67056.868185] iwlwifi 0000:03:00.0: fail to flush all tx fifo queues

This is how I trigger the error, within about 30 seconds (the IP
below is the router's):

netperf -Y CS5,CS5 -l 120 -H 172.20.42.65 -t TCP_STREAM &
netperf -Y EF,EF -l 120 -H 172.20.42.65 -t TCP_STREAM &
netperf -Y CS1,CS1 -l 120 -H 172.20.42.65 -t TCP_STREAM &
netperf -Y CS0,CS0 -l 120 -H 172.20.42.65 -t TCP_STREAM &
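
For anyone who wants to repeat this, the four commands above could be
wrapped in a small script along these lines. This is only a sketch: the
ROUTER variable and the function names are placeholders I made up, and
the error string is the one from the log lines quoted earlier.

```shell
#!/bin/sh
# Sketch only: ROUTER and both function names are illustrative
# placeholders, not part of the original report.
ROUTER=${ROUTER:-172.20.42.65}

# Start one 120-second TCP_STREAM flow per DSCP marking, in the
# background, using the same netperf options as the commands above.
start_flows() {
    for mark in CS5 EF CS1 CS0; do
        netperf -Y "$mark,$mark" -l 120 -H "$ROUTER" -t TCP_STREAM &
    done
}

# Count iwlwifi flush failures in kernel log text read from stdin.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
count_flush_errors() {
    grep -c 'fail to flush all tx fifo queues' || true
}
```

Typical use would be to call start_flows, wait about 30 seconds, and
then run "dmesg | count_flush_errors" on the x86 box.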

The above saturates the EF and VI queues, and for some reason
starves the BE, BK queues (on iwl)...
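
As a reference point, here is a sketch of how those four markings would
be sorted into the wifi queues, assuming the Linux wireless stack of
this era takes the top three DSCP bits as the 802.1d priority and maps
priorities 1-2 to BK, 0 and 3 to BE, 4-5 to VI and 6-7 to VO. The
function names are made up for illustration, and a given driver or
kernel may classify differently.

```shell
#!/bin/sh
# Sketch under a stated assumption: 802.1d priority = top 3 DSCP bits.
# dscp_value and dscp_to_ac are hypothetical helper names.

# Numeric DSCP value for the code points used in the test above.
dscp_value() {
    case "$1" in
        CS0) echo 0 ;;
        CS1) echo 8 ;;
        CS5) echo 40 ;;
        EF)  echo 46 ;;
        *)   return 1 ;;
    esac
}

# Map a code point name to a WMM access category (BE, BK, VI, VO).
dscp_to_ac() {
    dscp=$(dscp_value "$1") || return 1
    up=$((dscp >> 3))    # DSCP is 6 bits; keep the top 3
    case "$up" in
        0|3) echo BE ;;
        1|2) echo BK ;;
        4|5) echo VI ;;
        6|7) echo VO ;;
    esac
}
```

Under that assumption CS5 and EF would share the VI queue, CS1 would go
to BK and CS0 to BE.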

Packet traces strongly indicate that it's the iwl that's hosed: in
this tcpdump, it is receiving packets from cero but is no longer able
to transmit them.

15:26:15.889147 ARP, Request who-has ida.home.lan tell 172.20.11.97, length 28
15:26:15.889185 ARP, Reply ida.home.lan is-at 00:26:c6:42:76:e2 (oui Unknown), length 28

(and I'm not running codel/fq_codel on the x86 box, either, on this test)

I will refine this bug report more over time and get it to the
linux-wireless mailing list. I just need to set up more boxes.

The same codel-related patch set for linux-3.6-rc3 x86 is now up as
"codel2-ns2", where htb, the codel patches, etc. are *just fine* over
ethernet. htb on x86 with that version is also just fine. I'm pretty
happy with this patch set; it feels like an improvement (at least on
x86) over codel and fq_codel from before.

http://snapon.lab.bufferbloat.net/~cero1/deb/

But on cero, htb has got extraordinary delays that shouldn't be there.

and 3.3.8-21 is at (have I given you enough warning yet?)

http://snapon.lab.bufferbloat.net/~cero1/3.3/3.3.8-21/

Up next for me is backing off to a way earlier version of cero, and
incrementally adding back in stuff. But first up is a dip in the pool,
and beating my head against a tree. Or vice versa.

And if I'm lucky some x86 boxes will arrive soon and I can go build those
instead of going nutso on this.

-- 
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
with fq_codel!"


