From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ie0-f182.google.com (mail-ie0-f182.google.com [209.85.223.182]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority" (verified OK)) by huchra.bufferbloat.net (Postfix) with ESMTPS id 0D10821F0F2 for ; Sun, 13 Jan 2013 10:03:24 -0800 (PST) Received: by mail-ie0-f182.google.com with SMTP id s9so4142892iec.41 for ; Sun, 13 Jan 2013 10:03:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=wSXjjjuGfSsxnOEOhZHIH2UEQcFnI4o2gDSzLjeghOk=; b=eD4zvM+wiHYvjfkHWr+X4H7Za+sTCtxTXrBUAzacpe93W01KuN4xjOsJnmmB5pr3CJ zwsO+rc9PAsO9I7LYkE3HD017PfLhm2HP1FKkEVEtHOkd23JTdHZiB5qa88T5iqEIBcq 6ykXHNvyp2I4rUxXnuIvCviamb8h5djJfdudfZclcaQ1qQPoJL2V2TUAhpE388nJvH6n 3p5EDIpzqolC5EBpbrepKNBzodP55IKWVhACbHXYJelULdD24vElyfAtlx6yf0BRK8xx +EBn3di+Nbtgm0WcuInOOIuMiuOEIb+kokd/LlUuWsdRHQqSxHNc1gACrMTbQtnSXQnO kwAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:x-gm-message-state; bh=wSXjjjuGfSsxnOEOhZHIH2UEQcFnI4o2gDSzLjeghOk=; b=V0JIdy7H033sghutUyMrtQO9NHw+RGTWWnpubI/IYRynS1AT7vicFdu5M13Ir1kKe8 AFjDpBtU4Q4RjZWKTsL7sTeZQmLOlw3T6e/jI8VPBMXGCU8S8zI4/YwfcypXaX/RgHNn XMZ8Gd9/cDrC9j93/hJ1LT2BXF/7TtNUsHwy15LJQvkd9UxMDX4zbg6qpjIJQMpbydYL PfcYxp0gfXocNE8i7whIETRAxHIxGXjvEmKh8RluycTNqoIW86Xp3Y0E0SIAV9SlFnbY qvwwwhSvz/5gzVO4idox3/rxCpKVJHxhxSFzB8SiAjItW3skUx8bmvXMpWEYEAlrcSDK 6usQ== MIME-Version: 1.0 Received: by 10.50.11.130 with SMTP id q2mr4747230igb.99.1358100204082; Sun, 13 Jan 2013 10:03:24 -0800 (PST) Received: by 10.50.161.227 with HTTP; Sun, 13 Jan 2013 10:03:23 -0800 (PST) In-Reply-To: References: Date: Sun, 13 Jan 2013 10:03:23 -0800 Message-ID: From: Eric Dumazet To: Ketan Kulkarni Content-Type: 
multipart/alternative; boundary=e89a8f5032643bd55204d32f58b6 X-Gm-Message-State: ALoCoQns7h7ecm2jCJWZKRQycUMvFslcYoIhFllpNxu4LanmT0vvhDGWUjzGE2J3Sonz6r3p578r5Y83lJdJ2oUvppiqPJHfgJXcyoj67ih4BC20xkqp3aK4Jgk4lIoMdP9xNt2HwYfeGduvemuUbXah2dEbysvFXWu0OkgmZjM+j+A93Zd2p+BSl1mRBTL208BvBxssNNVIk3JluldNLBZBTHdmhsRuHQ== Cc: Jerry Chu , Yuchung Cheng , cerowrt-devel Subject: Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1 X-BeenThere: cerowrt-devel@lists.bufferbloat.net X-Mailman-Version: 2.1.13 Precedence: list List-Id: Development issues regarding the cerowrt test router project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Jan 2013 18:03:25 -0000 --e89a8f5032643bd55204d32f58b6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
I suspect a bug in the spin_is_locked() implementation on your arch, as the socket lock should be held at this point.
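
One way the assertion could misfire even with the lock held: on a kernel built without SMP support, spin_is_locked() may be compiled down to a constant 0, because there is no lock word to inspect. That is an assumption about this particular build, not something the thread confirms. A minimal userspace sketch of how the BUG_ON condition would then behave (every name below is a hypothetical stand-in, not kernel code):

```c
#include <stdbool.h>

/* Hypothetical stand-in for the socket spinlock; not the kernel type. */
struct model_lock {
    bool held;
};

/* SMP-style query: reports the real lock state. */
bool model_is_locked_smp(const struct model_lock *l)
{
    return l->held;
}

/* Assumed UP-style query: nothing to inspect, always reports "unlocked". */
bool model_is_locked_up(const struct model_lock *l)
{
    (void)l;
    return false;
}

/* Mirrors BUG_ON(!spin_is_locked(...) && !sock_owned_by_user(sk)):
 * returns true when the assertion would fire. */
bool fastopen_bug_would_fire(bool reported_locked, bool owned_by_user)
{
    return !reported_locked && !owned_by_user;
}
```

With the lock held from softirq context (so owned_by_user is false), the SMP-style query keeps the assertion quiet, while the UP-style query trips it regardless of the real lock state.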



On Sun, Jan 13, 2013 at 9:01 AM, Ketan Kulkarni <ketkulka@gmail.com> wrote:

I got a chance to capture the backtrace from the serial port. I haven't done the kgdb session yet.
To reiterate, the crash occurs on the TFO server on the MIPS platform.

The call trace looks like this
[ 1024.530000] Call Trace:
[ 1024.530000] [<801fc7f4>] reqsk_fastopen_remove+0x30/0x17c
[ 1024.530000] [<8024a36c>] tcp_rcv_state_process+0x7b4/0xc28
[ 1024.530000] [<802516ec>] tcp_v4_do_rcv+0x21c/0x274
[ 1024.530000] [<80253c74>] tcp_v4_rcv+0x5b4/0x974
[ 1024.530000] [<802320f0>] ip_local_deliver_finish+0x168/0x29c
[ 1024.530000] [<80207100>] __netif_receive_skb+0x63c/0x6c0
[ 1024.530000] [<c060b2e8>] ieee80211_deliver_skb+0x1b8/0x220 [mac80211]
[ 1024.530000] [<c060cc70>] ieee80211_rx_handlers.part.12+0x1654/0x23e0 [mac80211]
[ 1024.530000] [<c060e468>] ieee80211_prepare_and_rx_handle+0xa6c/0xaf0 [mac80211]
[ 1024.530000] [<c060ecfc>] ieee80211_rx+0x810/0x8d8 [mac80211]
[ 1024.530000] [<c078651c>] ath_rx_tasklet+0xf4c/0x10a4 [ath9k]
[ 1024.530000] [<c078437c>] ath9k_tasklet+0x104/0x174 [ath9k]
[ 1024.530000] [<800793b8>] tasklet_action+0x78/0xc8
[ 1024.530000] [<80078c08>] __do_softirq+0xb0/0x184
[ 1024.530000] [<80078d8c>] do_softirq+0x48/0x68
[ 1024.530000] [<80078fa8>] irq_exit+0x4c/0x7c
[ 1024.530000] [<8006330c>] ret_from_irq+0x0/0x4
[ 1024.530000]
[ 1024.530000] Code: 8e510208 30d300ff 2c420001 <00028036> 0c01e2a7 ac80048c 8e220008 2442ffff ae220008
[ 1024.940000] ---[ end trace a47ff22dd20a96c1 ]---
[ 1024.950000] Kernel panic - not syncing: Fatal exception in interrupt

I suspect this is the line responsible for this crash

void reqsk_fastopen_remove(struct sock *sk, struct request_sock *req, bool reset)
{
        struct sock *lsk = tcp_rsk(req)->listener;
        struct fastopen_queue *fastopenq = inet_csk(lsk)->icsk_accept_queue.fastopenq;

>>>>>   BUG_ON(!spin_is_locked(&sk->sk_lock.slock) && !sock_owned_by_user(sk));

        tcp_sk(sk)->fastopen_rsk = NULL;
        spin_lock_bh(&fastopenq->lock);
        fastopenq->qlen--;
        tcp_rsk(req)->listener = NULL;

Please see more details here
http://www.bufferbloat.net/issues/418#change-1706

Thanks,
Ketan

On Jan 6, 2013 12:43 AM, "Ketan Kulkarni" <ketkulka@gmail.com> wrote:
>
> Disabling ECN on the cero box has no effect.
> The box crashed with ECN disabled.
> Also tried enabling ECN on x86 and it didn't crash in either case. The
> tcpdump on cero lo is updated at -
> https://www.bufferbloat.net/issues/418#change-1703
> It is exactly similar to the previously attached "lo_capture.txt" but
> with ECN disabled.
>
> I might try getting serial cable on Sunday to get the crash details.
> Till then probably I cannot provide the crash logs as logread/dmesg
> does not print anything.
>
> Thanks,
> Ketan
>
> On Sat, Jan 5, 2013 at 8:32 AM, Ketan Kulkarni <ketkulka@gmail.com> wrote:
> > Without TFO all worked fine.
> > The problem is when tfo server is on cero box.
> > I will try both enabling ECN on the laptop and disabling ECN on cero with TFO on. Will
> > report the behavior seen.
> >
> > Thanks,
> > Ketan.
> >
> > On Jan 5, 2013 7:50 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
> >>
> >> On Fri, Jan 4, 2013 at 5:59 PM, Ketan Kulkarni <ketkulka@gmail.com> wrote:
> >> > Well, I was trying polipo server on cero box and httping from laptop. On
> >> > both the boxes I set 3 in tcp_fastopen.
> >> >
> >> > The panic is seen only when server is on cero box.
> >> > If I run server on my laptop and httping from cero all TFO connections
> >> > are
> >> > successful.
> >> > So I doubt the only problem is SYN+DATA.
> >> Just to confirm: you meant the problem is SYN/data processing on the
> >> server side?
> >>
> >> Maybe we hit some ECN / TFO bug. Some crash log would be great. Thanks
> >> for trying TFO!
> >>
> >> >
> >> > Unfortunately I don't have the serial cable right now, and logread or
> >> > dmesg
> >> > didn't print any logs before the cero router restarted.
> >> >
> >> > Attached is the tcpdump capture on lo when client and server both run on
> >> > cero box.
> >> > HTH!
> >> >
> >> > If you (or anyone) can suggest more diagnostics, I will be glad to
> >> > provide.
> >> >
> >> > On Jan 5, 2013 2:49 AM, "Jerry Chu" <hkchu@google.com> wrote:
> >> >>
> >> >> +ycheng
> >> >>
> >> >>
> >> >> On Fri, Jan 4, 2013 at 1:11 PM, Dave Taht <dave.taht@gmail.com> wrote:
> >> >>>
> >> >>> Hmm. I would lean towards there being an issue with the new (freshly
> >> >>> ported forward to 3.7.1) unaligned checksum code for mips based on
> >> >>> what you say here. Or an offload...
> >> >>>
> >> >>> As for the 239.x multicast issue, hmm... separate issue entirely.
> >> >>> Probably...
> >> >>>
> >> >>> And then there's TFO. I note that in order to use it properly you need
> >> >>> to turn it on in proc. Last I remember that was
> >> >>>
> >> >>> echo 3 > /proc/sys/net/ipv4/tcp_fastopen
> >> >>
> >> >>
> >> >> Correct - to enable the normal use of TFO for both client and server.
> >> >> There are other flags for advanced usage:
> >> >> /* Bit Flags for sysctl_tcp_fastopen */
> >> >> #define TFO_CLIENT_ENABLE       1
> >> >> #define TFO_SERVER_ENABLE       2
> >> >> #define TFO_CLIENT_NO_COOKIE    4 /* Send data-in-SYN w/o cookie */
> >> >>
> >> >> /* Process SYN data but skip cookie validation */
> >> >> #define TFO_SERVER_COOKIE_NOT_CHKED     0x100
> >> >> /* Accept SYN data w/o any cookie option */
> >> >> #define TFO_SERVER_COOKIE_NOT_REQD      0x200
> >> >>
> >> >> /* Force enable TFO on all listeners, i.e., not requiring the
> >> >>  * TCP_FASTOPEN socket option. SOCKOPT1/2 determine how to set
> >> >>  * max_qlen.
> >> >>  */
> >> >> #define TFO_SERVER_WO_SOCKOPT1  0x400
> >> >> #define TFO_SERVER_WO_SOCKOPT2  0x800
> >> >> /* Always create TFO child sockets on a TFO listener even when
> >> >>  * cookie/data not present. (For testing purpose!)
> >> >>  */
> >> >> #define TFO_SERVER_ALWAYS       0x1000
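
To make the bit layout above concrete, here is a small sketch that turns a tcp_fastopen sysctl value into the names of the flags it sets. The flag values are copied from the defines quoted above; the helper name tfo_describe and its interface are made up for illustration and do not exist in the kernel.

```c
#include <stddef.h>
#include <stdio.h>

/* Flag values as quoted in the thread; the table itself is illustrative. */
struct tfo_flag {
    unsigned int bit;
    const char *name;
};

static const struct tfo_flag tfo_flags[] = {
    { 0x1,    "TFO_CLIENT_ENABLE" },
    { 0x2,    "TFO_SERVER_ENABLE" },
    { 0x4,    "TFO_CLIENT_NO_COOKIE" },
    { 0x100,  "TFO_SERVER_COOKIE_NOT_CHKED" },
    { 0x200,  "TFO_SERVER_COOKIE_NOT_REQD" },
    { 0x400,  "TFO_SERVER_WO_SOCKOPT1" },
    { 0x800,  "TFO_SERVER_WO_SOCKOPT2" },
    { 0x1000, "TFO_SERVER_ALWAYS" },
};

/* Hypothetical helper: writes the names of the set flags into out,
 * separated by '|', and returns how many known flags are set
 * (0 if outlen is 0). Output is truncated if out is too small. */
int tfo_describe(unsigned int value, char *out, size_t outlen)
{
    size_t used = 0;
    size_t i;
    int count = 0;

    if (outlen == 0)
        return 0;
    out[0] = '\0';

    for (i = 0; i < sizeof(tfo_flags) / sizeof(tfo_flags[0]); i++) {
        int n;

        if (!(value & tfo_flags[i].bit))
            continue;
        count++;
        if (used + 1 >= outlen)
            continue;               /* buffer full: keep counting only */
        n = snprintf(out + used, outlen - used, "%s%s",
                     used ? "|" : "", tfo_flags[i].name);
        if (n < 0)
            continue;
        if ((size_t)n >= outlen - used)
            used = outlen - 1;      /* output was truncated */
        else
            used += (size_t)n;
    }
    return count;
}
```

For the value 3 used in this thread, the helper reports two flags, TFO_CLIENT_ENABLE and TFO_SERVER_ENABLE, which matches "normal use of TFO for both client and server".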
> >> >>
> >> >>>
> >> >>> However that's an old memory and there is this tcp_fastopen_key file I
> >> >>> don't know anything about yet (this is such bleeding edge stuff!)
> >> >>>
> >> >>> ... and with tcp_fastopen disabled things should still work right...
> >> >>> so I'm thinking something else is busted in the stack.
> >> >>>
> >> >>> I've also observed a dns slowdown in what I've been testing but hadn't
> >> >>> dug into packet dumps. (and was assuming, until now, it was due to me
> >> >>> fiddling with ULAs inside the network) Thanks for digging this deep!
> >> >>>
> >> >>> I never said this first attempt at 3.7 for cero was going to be
> >> >>> perfect, but we've entered a new age of subtle problems here.
> >> >>>
> >> >>> I strongly suggest nobody else try this dev build as a default gw, and
> >> >>> that the TFO folk ignore the noise for now.
> >> >>
> >> >>
> >> >> SG.
> >> >>
> >> >> Jerry
> >> >>
> >> >>>
> >> >>>
> >> >>> I just got a 3.7.1 box built on x86_64 so as to a/b some captures.
> >> >>> Regrettably I'm short on time through the weekend...
> >> >>>
> >> >>> On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak
> >> >>> <maciej@soltysiak.com>
> >> >>> wrote:
> >> >>> > I am seeing something strange here, with polipo related to TFO but
> >> >>> > also
> >> >>> > DNS.
> >> >>> > When I just took 3.7.1-1 and set my windows 7 laptop to use
> >> >>> > gw.home.lan:8123
> >> >>> > as http proxy it didn't work. What I observed was:
> >> >>> > a) after quite a while polipo's response to browser was 504 Host
> >> >>> > www.osnews.com lookup failed: Timeout
> >> >>> > b) this error in ssh console: Host osnews.com lookup failed: Timeout
> >> >>> > (131072)
> >> >>> > c) Disabling TFO by adding option useTCPFastOpen 'false' to config
> >> >>> > 'polipo'
> >> >>> > 'general' works around the problem
> >> >>> > d) Alternatively, you can keep TFO enabled in polipo but change
> >> >>> > option
> >> >>> > 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
> >> >>> > This is very weird, because TFO is TCP and the DNS queries fired off
> >> >>> > by
> >> >>> > polipo are UDP:
> >> >>> > root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
> >> >>> > 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF],
> >> >>> > proto
> >> >>> > UDP (17), length 60)
> >> >>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!]
> >> >>> > 55396+ A?
> >> >>> > www.osnews.com. (32)
> >> >>> > 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
> >> >>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> >> >>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> >> >>> > 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
> >> >>> > 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF],
> >> >>> > proto
> >> >>> > UDP (17), length 60)
> >> >>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!]
> >> >>> > 55396+
> >> >>> > AAAA? www.osnews.com. (32)
> >> >>> > 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
> >> >>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> >> >>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> >> >>> > 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
> >> >>> > 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
> >> >>> > proto
> >> >>> > UDP
> >> >>> > (17), length 123)
> >> >>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!]
> >> >>> > 55396
> >> >>> > q:
> >> >>> > A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns:
> >> >>> > osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
> >> >>> > ns1.swelter.net. (95)
> >> >>> > 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
> >> >>> > 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
> >> >>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> >> >>> > 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
> >> >>> > 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
> >> >>> > 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
> >> >>> > 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
> >> >>> > 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
> >> >>> > 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
> >> >>> > proto
> >> >>> > UDP
> >> >>> > (17), length 135)
> >> >>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!]
> >> >>> > 55396
> >> >>> > q:
> >> >>> > AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
> >> >>> > 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
> >> >>> > osnews.com. [29m3s] NS ns2.swelter.net. (107)
> >> >>> > 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
> >> >>> > 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
> >> >>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> >> >>> > 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
> >> >>> > 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
> >> >>> > 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
> >> >>> > 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
> >> >>> > 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
> >> >>> > 0x0080: 0603 6e73 32c0 4c ..ns2.L
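
For readers who want to check the hex dumps by hand, the sketch below decodes the fixed DNS header and the first question of a message, which is enough to recover the query ID, name, and type that tcpdump prints. The struct and function names are hypothetical, and the parser deliberately ignores name compression, which the question section of these queries does not use.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative result type; not taken from any library. */
struct dns_question {
    uint16_t id;        /* transaction ID from the header */
    uint16_t qtype;     /* 1 = A, 28 (0x1c) = AAAA */
    char name[256];     /* dotted name, NUL-terminated */
};

/* Parse the 12-byte header and the first question of a DNS message.
 * Returns 0 on success, -1 if the message is malformed or too short. */
int dns_parse_question(const uint8_t *msg, size_t len, struct dns_question *q)
{
    size_t pos = 12;    /* the question starts right after the header */
    size_t out = 0;

    if (len < 12)
        return -1;
    q->id = (uint16_t)(msg[0] << 8 | msg[1]);
    if ((msg[4] << 8 | msg[5]) < 1)     /* QDCOUNT must be at least 1 */
        return -1;

    /* Each label is a length byte followed by that many characters. */
    while (pos < len && msg[pos] != 0) {
        size_t label = msg[pos++];

        if (label > 63 || pos + label > len ||
            out + label + 1 >= sizeof(q->name))
            return -1;
        if (out)
            q->name[out++] = '.';
        for (size_t i = 0; i < label; i++)
            q->name[out++] = (char)msg[pos++];
    }
    q->name[out] = '\0';

    /* Need the terminating zero byte plus QTYPE and QCLASS. */
    if (pos + 5 > len)
        return -1;
    pos++;
    q->qtype = (uint16_t)(msg[pos] << 8 | msg[pos + 1]);
    return 0;
}
```

Fed the 32 payload bytes of the first packet (starting at d864), this yields ID 55396, name www.osnews.com, and type 1, in line with the "55396+ A? www.osnews.com." summary in the capture.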
> >> >>> > This is the only DNS traffic I saw during the attempts. The tcpdumps
> >> >>> > have
> >> >>> > udp bad checksum but when I disabled TFO in polipo, the UDP were
> >> >>> > still
> >> >>> > bad
> >> >>> > checksum but they worked.
> >> >>> > Really weird.
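
The "bad udp cksum" lines can be checked directly from the dump. The sketch below computes the standard UDP checksum (ones'-complement sum over the IPv4 pseudo-header and the datagram) and, separately, the sum of the pseudo-header alone. The function names are made up for this example. For the first packet, the pseudo-header sum comes out as 0xfe3b, which is exactly the value sitting in the checksum field, and the full checksum comes out as 0xd17f, the value tcpdump says it expected. That pattern is what one would see if the checksum had been left for later completion (offload-style) and never finished on loopback; this reading is an inference from the numbers, not something the thread states.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the big-endian 16-bit words of buf to a running sum. */
static uint32_t sum16(const uint8_t *buf, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)buf[i] << 8 | buf[i + 1];
    if (len & 1)                        /* pad a trailing odd byte with zero */
        sum += (uint32_t)buf[len - 1] << 8;
    return sum;
}

/* Fold the carries back in to get a 16-bit ones'-complement sum. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones'-complement sum of the IPv4 pseudo-header only (not inverted). */
uint16_t udp_pseudo_sum(const uint8_t src[4], const uint8_t dst[4],
                        uint16_t udp_len)
{
    uint32_t sum = 0;

    sum = sum16(src, 4, sum);
    sum = sum16(dst, 4, sum);
    sum += 17;                          /* protocol number for UDP */
    sum += udp_len;
    return fold(sum);
}

/* Full UDP checksum over pseudo-header, UDP header and payload.
 * The checksum field itself (bytes 6..7) is skipped, i.e. treated as zero.
 * Returns 0 if the datagram is shorter than a UDP header. */
uint16_t udp_checksum(const uint8_t src[4], const uint8_t dst[4],
                      const uint8_t *udp, size_t udp_len)
{
    uint32_t sum;
    uint16_t cs;

    if (udp_len < 8)
        return 0;
    sum = udp_pseudo_sum(src, dst, (uint16_t)udp_len);
    sum = sum16(udp, 6, sum);
    sum = sum16(udp + 8, udp_len - 8, sum);
    cs = (uint16_t)~fold(sum);
    return cs ? cs : 0xffff;            /* 0 is reserved for "no checksum" */
}
```

Swapping the query type from 0x0001 to 0x001c in the same datagram gives 0xd164, matching the second packet in the capture.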
> >> >>> > p.s. UPNP still works for port forwarding negotiation as it did in
> >> >>> > 3.6.11-4
> >> >>> > I still couldn't get the UPNP/SSDP broadcasts (udp to
> >> >>> > 239.255.255.250)
> >> >>> > to
> >> >>> > be forwarded between se00 and sw00/sw10. Last time it worked was
> >> >>> > ~3.3.8.
> >> >>> > I'm starting not to question why it doesn't work, I'm starting to
> >> >>> > wonder why
> >> >>> > it did work then ;-)
> >> >>> > Regards,
> >> >>> > Maciej
> >> >>> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com>
> >> >>> >> wrote:
> >> >>> >> > Sorry, could you give us a copy of the panic stack trace ?
> >> >>> >>
> >> >>> >> I will get a serial console up on a wndr3800 by sunday. (sorry,
> >> >>> >> just
> >> >>> >> landed in california, am in disarray)
> >> >>> >>
> >> >>> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
> >> >>> >>
> >> >>> >> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
> >> >>> >>
> >> >>> >> --
> >> >>> >> Dave Täht
> >> >>> >>
> >> >>> >> Fixing bufferbloat with cerowrt:
> >> >>> >> http://www.teklibre.com/cerowrt/subscribe.html
> >> >>> >> _______________________________________________
> >> >>> >> Cerowrt-devel mailing list
> >> >>> >> Cerowrt-devel@lists.bufferbloat.net
> >> >>> >> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> >> >>> >
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Dave Täht
> >> >>>
> >> >>> Fixing bufferbloat with cerowrt:
> >> >>> http://www.teklibre.com/cerowrt/subscribe.html
> >> >>
> >> >>
> >> >


--e89a8f5032643bd55204d32f58b6--