From: Dave Taht
To: "cerowrt-devel@lists.bufferbloat.net"
Date: Wed, 9 Jul 2014 14:44:24 -0700
Subject: [Cerowrt-devel] cerowrt-3.10.44-6 report

I have been pounding several cerowrt boxes utterly flat for 13 days now.

root@davedesk:~# uptime
 20:54:29 up 13 days, 13:28, load average: 0.04, 0.04, 0.04

(well, formerly flat prior to this email)

Aside from seeing one kernel trap (see bug #442), it's stayed up on wifi, reliably, for tens of thousands of tests... for me.

I have - along the way - collected gigabytes of useless packet captures, crashed every serial dongle I own, the 802.11ac AP I'm working on, windows multiple times, and linux on a pair of laptops, and reduced multiple beaglebones with the edimax 802.11ac to gibbering, crashed hysteria, unrecoverable even with a usb serial connection and needing a reflash. But never cerowrt.

So I'm happy about that, and depressed about the sad state of wifi on *everything else*. (The rtl8812au driver has to be seen to be believed.) I've wasted some time trying to sort through that, and I'm glad that the ath10k driver has been getting some serious love of late....

But I've had several reports that cerowrt 3.10.44-6 fails for others with bug 442-like symptoms. It does seem that the near-constants in all reports are "osx" and "poor signal strength", so over the weekend I finally got a macbook with 802.11ac to drive more tests. This brings the number of stations I have up to 1 windows, 1 mac, 2 linux laptops, 6 beaglebones, and a bunch of APs (that I have been connecting to each other in adhoc and sta mode) to try and blow up EVERYTHING.

One useful bit of fallout from all this has been being able to test multi-station wifi performance, which is predictably horrible, but given all the other stuff I've been testing simultaneously, the data is hard to sort through or publish - the goal here is to crash stuff and do forensics, not science.

I'm still pretty convinced the 442 problem is generic to the darn ath9k, but being unable to duplicate the problem is a problem.

I am encouraged by some of the bugs Denton Gentry has been fixing over on the ath10k mailing list, and wonder if some of the same things happen on ath9k and other drivers.
His methods are extreme, and he keeps finding interesting, subtle problems with hugely bad side-effects, like this one:

http://lists.infradead.org/pipermail/ath10k/2014-July/002606.html

Problems like that, I'm sure, exist in device drivers everywhere.

So, next up for me is to keep testing 802.11ac and n on the device I'm getting paid to beat up, while exercising cero as hard as possible (I'm using it to drive the ac box, as one example). I will be adding impairments this time to the non-mac boxes, and if things keep working, add impairments to the mac box, while capturing as much as possible.

A decision to make is whether or not to refresh 3.10.44-6 with openwrt head. If I don't do it, it will be another 14 days before I stop testing and can refresh, and ietf will be upon me.

I really, really, really, really wanted a stable cerowrt release, and then to move on. I'd hoped that 3.10.44-6 would have been it. I've thought about putting out a bug bounty for it, if that would find someone with the wherewithal to nail this !@#! thing to the floor.

In the interim, I'd like to make clear to everyone that I regard bug 442 as the only thing holding up a general stable release, and there have been a couple of updates to it:

http://www.bufferbloat.net/issues/442

Anything you can do to beat up your boxes, while capturing traffic and the failure event(s), will help.

Of HUGE interest is getting raw captures from a wifi monitoring interface on a regular basis from someone (anyone!) experiencing this problem, and thus capturing exactly when and why it happens. Not all wifi chips support wifi monitor mode, but if you install aircrack-ng, enabling it is straightforward:

    sudo airmon-ng start wlan0

and then wireshark can see the interface and capture/decode raw wifi packets, as can tcpdump -I.

I just nuked a bunch of captures and tests to get some disk space back and am setting up the full monty again, while coaxing this mac to have a decent compiler and setup.
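
For anyone setting up the monitor-mode captures requested above, here is a minimal sketch of a rotating capture, so that disk use stays manageable and the file covering a failure is easy to find. The interface name (mon0), the output directory, the one-hour rotation and the "bug442-" file prefix are all assumptions for illustration; the monitor interface that airmon-ng creates may be named differently on your system.

```shell
#!/bin/sh
# Sketch only: interface name, directory, rotation period and file prefix
# are illustrative assumptions, not values taken from the bug report.

# Print a tcpdump command line for a rotating raw-wifi capture.
#   $1 = monitor-mode interface (e.g. the one airmon-ng created)
#   $2 = output directory
#   $3 = seconds per capture file (tcpdump -G)
capture_cmd() {
    # -s 0 asks for full packets; -G rotates the savefile every $3 seconds,
    # and the strftime pattern in -w gives each file a timestamped name.
    printf 'tcpdump -i %s -s 0 -G %s -w %s/bug442-%%Y%%m%%d-%%H%%M%%S.pcap\n' \
        "$1" "$3" "$2"
}

# Typical use (needs root and a chip that supports monitor mode):
#   sudo airmon-ng start wlan0
#   sudo $(capture_cmd mon0 /var/tmp/caps 3600)
```

The function only prints the command, so it can be inspected before anything is run as root. When a failure happens, note the time, and the capture file whose timestamp precedes it is the one worth attaching to the bug.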

Actually, I think I'm going to go get a 2TB disk for the monitor box.

I incidentally just stumbled across a 1998-2000 history of wireless development I started to write 4 years ago, and, well, I'd forgotten the pain of the 7 months of initial development, and the years of trouble we still had with it after. I really don't enjoy the low level driver stuff.

http://www.teklibre.com/~d/elwr/wireless_2.html

(_1, _3, _4, etc, it gets up to 9)

According to this,

http://www.teklibre.com/~d/elwr/emails.html

my first documented encounter with the need for aqm and packet scheduling on wireless was:

Mon, 19 Oct 1998 19:18:09

-- 
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article