From: Yuchung Cheng
Date: Fri, 4 Jan 2013 18:20:29 -0800
To: Ketan Kulkarni
Cc: Jerry Chu, Eric Dumazet, cerowrt-devel
Subject: Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

On Fri, Jan 4, 2013 at 5:59 PM, Ketan Kulkarni wrote:
> Well, I was trying polipo server on cero box and httping from laptop. On
> both the boxes I set 3 in tcp_fastopen.
>
> The panic is seen only when server is on cero box.
> If I run server on my laptop and httping from cero all TFO connections are
> successful.
> So I doubt its the only problem is SYN+DATA.

Just to confirm: you meant the problem is SYN/data processing on the
server side? Maybe we hit some ECN / TFO bug. Some crash log would be great.

Thanks for trying TFO!

>
> Unfortunately I don't have the serial cable right now, and logread or dmesg
> didn't print any logs before the cero router restarted.
>
> Attached is the tcpdump capture on lo when client and server both run on
> cero box.
> HTH!
>
> If you (or anyone) can suggest more diagnostics, I will be glad to provide.
>
> On Jan 5, 2013 2:49 AM, "Jerry Chu" wrote:
>>
>> +ycheng
>>
>>
>> On Fri, Jan 4, 2013 at 1:11 PM, Dave Taht wrote:
>>>
>>> Hmm. I would lean towards there being an issue with the new (freshly
>>> ported forward to 3.7.1) unaligned checksum code for mips based on
>>> what you say here. Or an offload...
>>>
>>> As for the 239.x multicast issue, hmm... separate issue entirely.
>>> Probably...
>>>
>>> And then there's TFO. I note that in order to use it properly you need
>>> to turn it on in proc. Last I remember that was
>>>
>>> echo 3 > /proc/sys/net/ipv4/tcp_fastopen
>>
>>
>> Correct - to enable the normal use of TFO for both client and server.
>> There are other flags for advanced usage:
>> /* Bit Flags for sysctl_tcp_fastopen */
>> #define TFO_CLIENT_ENABLE 1
>> #define TFO_SERVER_ENABLE 2
>> #define TFO_CLIENT_NO_COOKIE 4 /* Send data-in-SYN w/o cookie */
>>
>> /* Process SYN data but skip cookie validation */
>> #define TFO_SERVER_COOKIE_NOT_CHKED 0x100
>> /* Accept SYN data w/o any cookie option */
>> #define TFO_SERVER_COOKIE_NOT_REQD 0x200
>>
>> /* Force enable TFO on all listeners, i.e., not requiring the
>> * TCP_FASTOPEN socket option. SOCKOPT1/2 determine how to set max_qlen.
>> */
>> #define TFO_SERVER_WO_SOCKOPT1 0x400
>> #define TFO_SERVER_WO_SOCKOPT2 0x800
>> /* Always create TFO child sockets on a TFO listener even when
>> * cookie/data not present. (For testing purpose!)
>> */
>> #define TFO_SERVER_ALWAYS 0x1000
>>
>>>
>>> However that's an old memory and there is this tcp_fastopen_key file I
>>> don't know anything about yet (this is such bleeding edge stuff!)
>>>
>>> ... and with tcp_fastopen disabled things should still work right...
>>> so I'm thinking something else is busted in the stack.
>>>
>>> I've also observed a dns slowdown in what I've been testing but hadn't
>>> dug into packet dumps. (and was assuming, until now, it was due to me
>>> fiddling with ULAs inside the network) Thanks for digging this deep!
>>>
>>> I never said this first attempt at 3.7 for cero was going to be
>>> perfect, but we've entered a new age of subtle problems here.
>>>
>>> I strongly suggest nobody else try this dev build as a default gw, and
>>> that the TFO folk ignore the noise for now.
>>
>>
>> SG.
>>
>> Jerry
>>
>>>
>>>
>>> I just got a 3.7.1 box built on x86_64 so as to a/b some captures.
>>> Regrettably I'm short on time through the weekend...
>>>
>>> On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak
>>> wrote:
>>> > I am seeing something strange here, with polipo related to TFO but also
>>> > DNS.
>>> > When I just took 3.7.1-1 and set my windows 7 laptop to use
>>> > gw.home.lan:8123
>>> > as http proxy it didn't work. What I observed was:
>>> > A) after quite a while polipo's response to browser was 504 Host
>>> > www.osnews.com lookup failed: Timeout
>>> > b) this error in ssh console: Host osnews.com lookup failed: Timeout
>>> > (131072)
>>> > c) Disabling TFO by adding option useTCPFastOpen 'false' to config
>>> > 'polipo'
>>> > 'general' works around the problem
>>> > d) Alternatively, you can keep TFO enabled in polipo but change option
>>> > 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
>>> > This is very weird, because TFO is TCP and the DNS queries fired off by
>>> > polipo are UDP:
>>> > root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
>>> > 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF],
>>> > proto
>>> > UDP (17), length 60)
>>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!]
>>> > 55396+ A?
>>> > www.osnews.com. (32)
>>> > 0x0000:  4500 003c c3d1 4000 4011 78dd 7f00 0001  E..<..@.@.x.....
>>> > 0x0010:  7f00 0001 b8c8 0035 0028 fe3b d864 0100  .......5.(.;.d..
>>> > 0x0020:  0001 0000 0000 0000 0377 7777 066f 736e  .........www.osn
>>> > 0x0030:  6577 7303 636f 6d00 0001 0001            ews.com.....
>>> > 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF],
>>> > proto
>>> > UDP (17), length 60)
>>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!]
>>> > 55396+
>>> > AAAA? www.osnews.com. (32)
>>> > 0x0000:  4500 003c c3d2 4000 4011 78dc 7f00 0001  E..<..@.@.x.....
>>> > 0x0010:  7f00 0001 b8c8 0035 0028 fe3b d864 0100  .......5.(.;.d..
>>> > 0x0020:  0001 0000 0000 0000 0377 7777 066f 736e  .........www.osn
>>> > 0x0030:  6577 7303 636f 6d00 001c 0001            ews.com.....
>>> > 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
>>> > UDP
>>> > (17), length 123)
>>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396
>>> > q:
>>> > A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns:
>>> > osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
>>> > ns1.swelter.net. (95)
>>> > 0x0000:  4500 007b 0000 4000 4011 3c70 7f00 0001  E..{..@.@.<p....
>>> > 0x0010:  7f00 0001 0035 b8c8 0067 fe7a d864 8180  .....5...g.z.d..
>>> > 0x0020:  0001 0001 0002 0000 0377 7777 066f 736e  .........www.osn
>>> > 0x0030:  6577 7303 636f 6d00 0001 0001 c00c 0001  ews.com.........
>>> > 0x0040:  0001 0000 06cf 0004 4a56 1f9f c010 0002  ........JV......
>>> > 0x0050:  0001 0000 06cf 0011 036e 7332 0773 7765  .........ns2.swe
>>> > 0x0060:  6c74 6572 036e 6574 00c0 1000 0200 0100  lter.net........
>>> > 0x0070:  0006 cf00 0603 6e73 31c0 40              ......ns1.@
>>> > 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
>>> > UDP
>>> > (17), length 135)
>>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396
>>> > q:
>>> > AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
>>> > 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
>>> > osnews.com. [29m3s] NS ns2.swelter.net. (107)
>>> > 0x0000:  4500 0087 0000 4000 4011 3c64 7f00 0001  E.....@.@.<d....
>>> > 0x0010:  7f00 0001 0035 b8c8 0073 fe86 d864 8180  .....5...s...d..
>>> > 0x0020:  0001 0001 0002 0000 0377 7777 066f 736e  .........www.osn
>>> > 0x0030:  6577 7303 636f 6d00 001c 0001 c00c 001c  ews.com.........
>>> > 0x0040:  0001 0000 0cd4 0010 2607 f0d0 1002 0062  ........&......b
>>> > 0x0050:  0000 0000 0000 0003 c010 0002 0001 0000  ................
>>> > 0x0060:  06cf 0011 036e 7331 0773 7765 6c74 6572  .....ns1.swelter
>>> > 0x0070:  036e 6574 00c0 1000 0200 0100 0006 cf00  .net............
>>> > 0x0080:  0603 6e73 32c0 4c                        ..ns2.L
>>> > This is the only DNS traffic I saw during the attempts. The tcpdumps
>>> > have
>>> > udp bad checksum but when I disabled TFO in polipo, the UDP where still
>>> > bad
>>> > checksum but they worked.
>>> > Really weird.
>>> > p.s. UPNP still works for port forwarding negotiation as it did in
>>> > 3.6.11-4
>>> > I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250)
>>> > to
>>> > being forwarded between se00 and sw00/sw10. Last time it worked was
>>> > ~3.3.8.
>>> > I'm starting not to question why it doesn't work, I'm starting to
>>> > wonder why
>>> > it did work then ;-)
>>> > Regards,
>>> > Maciej
>>> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht wrote:
>>> >>
>>> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet
>>> >> wrote:
>>> >> > Sorry, could you give us a copy of the panic stack trace ?
>>> >>
>>> >> I will get a serial console up on a wndr3800 by sunday. (sorry, just
>>> >> landed in california, am in disarray)
>>> >>
>>> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
>>> >>
>>> >> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>>> >>
>>> >> --
>>> >> Dave Täht
>>> >>
>>> >> Fixing bufferbloat with cerowrt:
>>> >> http://www.teklibre.com/cerowrt/subscribe.html
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Dave Täht
>>>
>>> Fixing bufferbloat with cerowrt:
>>> http://www.teklibre.com/cerowrt/subscribe.html
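
For anyone checking what a given tcp_fastopen value actually turns on, here
is a rough Python sketch that maps a sysctl value to the flag names Jerry
listed above. The bit values are copied from the quoted defines; the helper
names decode_tfo and read_tfo_sysctl are made up for illustration and are
not part of any kernel or distro tooling.

```python
# Bit flags for the tcp_fastopen sysctl, copied from the defines quoted
# earlier in this thread. Everything else here is an illustrative assumption.
TFO_FLAGS = {
    0x1: "TFO_CLIENT_ENABLE",
    0x2: "TFO_SERVER_ENABLE",
    0x4: "TFO_CLIENT_NO_COOKIE",
    0x100: "TFO_SERVER_COOKIE_NOT_CHKED",
    0x200: "TFO_SERVER_COOKIE_NOT_REQD",
    0x400: "TFO_SERVER_WO_SOCKOPT1",
    0x800: "TFO_SERVER_WO_SOCKOPT2",
    0x1000: "TFO_SERVER_ALWAYS",
}


def decode_tfo(value):
    """Return (names of known flags set in value, leftover unknown bits)."""
    names = [name for bit, name in sorted(TFO_FLAGS.items()) if value & bit]
    unknown = value
    for bit in TFO_FLAGS:
        unknown &= ~bit  # clear every bit we know about
    return names, unknown


def read_tfo_sysctl(path="/proc/sys/net/ipv4/tcp_fastopen"):
    """Read the current sysctl value (hypothetical helper, Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


# "echo 3 > /proc/sys/net/ipv4/tcp_fastopen" sets client and server bits.
print(decode_tfo(3))
```

On the router itself, decode_tfo(read_tfo_sysctl()) should report the flags
currently in effect, assuming Python is available there; otherwise paste the
value from cat into decode_tfo on another machine.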