[Cerowrt-devel] cerowrt-3.10.44-6 report

Dave Taht dave.taht at gmail.com
Wed Jul 9 17:44:24 EDT 2014


I have been pounding several cerowrt boxes utterly flat for 13 days now.

root@davedesk:~# uptime
 20:54:29 up 13 days, 13:28,  load average: 0.04, 0.04, 0.04 (well,
formerly flat prior to this email)

Aside from one kernel trap (see bug #442), it's stayed up on wifi,
reliably, for tens of thousands of tests... for me.

I have - along the way - collected gigabytes of useless packet
captures, crashed every serial dongle I own, the 802.11ac ap I'm
working on, windows multiple times, and linux on a pair of laptops,
and reduced multiple beaglebones with the edimax 802.11ac to
gibbering, crashed hysteria, unrecoverable even with a usb serial
connection and needing a reflash.

But never cerowrt. So I'm happy about that, and depressed about the
sad state of wifi on *everything else*. (The rtl8812au driver has to
be seen to be believed.) I've wasted some time trying to sort through
that, and am glad that the ath10k driver has been getting some serious
love of late....

But I've had several reports that cerowrt 3.10.44-6 fails for others
with bug 442-like symptoms. It does seem that the near-constants in
all reports are "osx" and "poor signal strength", so over the weekend
I finally got a macbook with 802.11ac to drive more tests. This brings
up the number of stations I have to 1 windows, 1 mac, 2 linux laptops,
6 beaglebones, and a bunch of APs (that I have been connecting to each
other in adhoc and sta mode) to try and blow up EVERYTHING.

One useful bit of fallout from all this has been being able to test
multi-station wifi performance, which is predictably horrible, but
given all the other stuff I've been testing simultaneously, the data
is hard to sort through or publish - the goal here is to crash stuff
and do forensics, not science.

I am encouraged by some of the bugs Denton Gentry has been fixing over
on the ath10k mailing list, and wonder if some of the same things
happen on ath9k and other drivers. I'm still pretty convinced the 442
problem is generic to the darn ath9k, but being unable to duplicate the
problem is a problem.

His methods are extreme! And he keeps finding interesting, subtle
problems with hugely bad side-effects, like this one:

http://lists.infradead.org/pipermail/ath10k/2014-July/002606.html

Which I'm sure exist in device drivers everywhere.

So, next up for me is to keep testing 802.11ac and n on the device
I'm getting paid to beat up - while exercising cero as hard as
possible (I'm using it to drive the ac box as one example). I will be
adding impairments this time to the non-mac boxes, and if things keep
working, add impairments to the mac box, while capturing as much as
possible.

A decision to make is whether or not to refresh 3.10.44-6 with openwrt
head. If I don't do it, it will be another 14 days before I stop
testing and can refresh, and ietf will be upon me.

I really, really, really, really wanted a stable cerowrt release, and
then to move on. I'd hoped that 3.10.44-6 would have been it. I've
thought about putting out a bug bounty for it, if that would find
someone with the wherewithal to nail this !@#! thing to the floor.

In the interim, I'd like to make clear to everyone that I regard bug
442 as the only thing holding up a general stable release, and there
have been a couple updates to it.

http://www.bufferbloat.net/issues/442

Anything you can do to beat up your boxes, while capturing
traffic and the failure event(s), will help.

Of HUGE interest is getting raw captures from a wifi monitoring
interface on a regular basis from someone (anyone!) experiencing this
problem, and thus capturing exactly when and why it happens. Not all
wifi chips support wifi monitor mode, but if you install
aircrack-ng, enabling it is straightforward:

sudo airmon-ng start wlan0

and then wireshark can see the interface and capture/decode raw wifi
packets, as can tcpdump -I.
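
For anyone leaving a monitor running for days, disk space becomes the
limit. A minimal sketch of one way to bound it, using tcpdump's
size-based rotation: the script below only prints a command line, it
does not start a capture. The interface name mon0 is an assumption
(what airmon-ng names the monitor interface varies by version), and
MON_IF, CAP_DIR, FILE_MB and KEEP_FILES are illustrative names that
do not come from this post.

```shell
#!/bin/sh
# Sketch only: print a tcpdump command that keeps a bounded ring of capture files.
# MON_IF, CAP_DIR, FILE_MB and KEEP_FILES are illustrative names, not from the post.
MON_IF="${MON_IF:-mon0}"                  # assumed monitor interface name
CAP_DIR="${CAP_DIR:-/tmp/wifi-captures}"  # assumed capture directory
FILE_MB="${FILE_MB:-100}"                 # rotate after roughly this many million bytes
KEEP_FILES="${KEEP_FILES:-20}"            # number of files kept in the ring

capture_cmd() {
    # -C rotates the output file by size; combined with -W, tcpdump
    # overwrites the oldest file once KEEP_FILES files exist.
    echo "tcpdump -i $MON_IF -C $FILE_MB -W $KEEP_FILES -w $CAP_DIR/wifi.pcap"
}

capture_cmd
```

With the defaults this caps the captures at roughly 2 GB. Create the
directory first and run the printed command as root; some
distributions build tcpdump to drop privileges after opening the
interface, in which case the directory also has to be writable by
that user.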

I just nuked a bunch of captures and tests to get some disk space back
and am setting up the full monty again, while coaxing this mac to have
a decent compiler and setup. Actually, I think I'm going to go get a
2TB disk for the monitor box.

I incidentally just stumbled across a 1998-2000 history of wireless
development I started to write 4 years ago, and, well, I'd forgotten
the pain of the 7 months of initial development, and the years of
trouble we still had with it after. I really don't enjoy the low level
driver stuff.

http://www.teklibre.com/~d/elwr/wireless_2.html (_1, _3, _4, etc, it
gets up to 9)

According to this,

http://www.teklibre.com/~d/elwr/emails.html

My first documented encounter with the need for aqm and packet
scheduling on wireless was:

Mon, 19 Oct 1998 19:18:09



-- 
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article


