[Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

Ketan Kulkarni ketkulka at gmail.com
Fri Jan 4 20:59:53 EST 2013


Well, I was running the polipo server on the cero box and httping from my
laptop. On both boxes I set tcp_fastopen to 3.

The panic is seen only when the server is on the cero box.
If I run the server on my laptop and httping from cero, all TFO connections
are successful.
So I suspect the only problem is SYN+DATA arriving at a server on the cero box.

Unfortunately I don't have the serial cable right now, and neither logread
nor dmesg printed any logs before the cero router restarted.

Attached is the tcpdump capture on lo, taken with the client and server both
running on the cero box.
HTH!

If you (or anyone) can suggest more diagnostics, I will be glad to provide them.
On Jan 5, 2013 2:49 AM, "Jerry Chu" <hkchu at google.com> wrote:

> +ycheng
>
>
> On Fri, Jan 4, 2013 at 1:11 PM, Dave Taht <dave.taht at gmail.com> wrote:
>
>> Hmm. I would lean towards there being an issue with the new (freshly
>> ported forward to 3.7.1) unaligned checksum code for mips based on
>> what you say here. Or an offload...
>>
>> As for the 239.x multicast issue, hmm... separate issue entirely.
>> Probably...
>>
>> And then there's TFO. I note that in order to use it properly you need
>> to turn it on in proc. Last I remember that was
>>
>> echo 3 > /proc/sys/net/ipv4/tcp_fastopen
>>
>
> Correct - to enable the normal use of TFO for both client and server.
> There are other flags for advanced usage:
>  /* Bit Flags for sysctl_tcp_fastopen */
> #define TFO_CLIENT_ENABLE       1
> #define TFO_SERVER_ENABLE       2
> #define TFO_CLIENT_NO_COOKIE    4 /* Send data-in-SYN w/o cookie */
>
> /* Process SYN data but skip cookie validation */
> #define TFO_SERVER_COOKIE_NOT_CHKED     0x100
> /* Accept SYN data w/o any cookie option */
> #define TFO_SERVER_COOKIE_NOT_REQD      0x200
>
> /* Force enable TFO on all listeners, i.e., not requiring the
>  * TCP_FASTOPEN socket option. SOCKOPT1/2 determine how to set max_qlen.
>  */
> #define TFO_SERVER_WO_SOCKOPT1  0x400
> #define TFO_SERVER_WO_SOCKOPT2  0x800
> /* Always create TFO child sockets on a TFO listener even when
>  * cookie/data not present. (For testing purpose!)
>  */
> #define TFO_SERVER_ALWAYS       0x1000
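
The flag values above combine as a bitmask, so the "3" written to
/proc/sys/net/ipv4/tcp_fastopen is TFO_CLIENT_ENABLE | TFO_SERVER_ENABLE.
As a rough illustration only, the Python sketch below decodes such a value
into the flag names listed above. The helper name decode_tcp_fastopen is made
up for this example; it is not part of any kernel or library interface.

```python
# Bit values copied from the sysctl_tcp_fastopen flags quoted above.
TFO_FLAGS = {
    0x1: "TFO_CLIENT_ENABLE",
    0x2: "TFO_SERVER_ENABLE",
    0x4: "TFO_CLIENT_NO_COOKIE",
    0x100: "TFO_SERVER_COOKIE_NOT_CHKED",
    0x200: "TFO_SERVER_COOKIE_NOT_REQD",
    0x400: "TFO_SERVER_WO_SOCKOPT1",
    0x800: "TFO_SERVER_WO_SOCKOPT2",
    0x1000: "TFO_SERVER_ALWAYS",
}


def decode_tcp_fastopen(value):
    """Split a tcp_fastopen sysctl value into (flag names, unknown bits)."""
    names = [name for bit, name in sorted(TFO_FLAGS.items()) if value & bit]
    known_mask = sum(TFO_FLAGS)      # OR of all bits listed above
    return names, value & ~known_mask


names, unknown = decode_tcp_fastopen(3)
print(names, hex(unknown))
```

On a live box the input would be the integer read from
/proc/sys/net/ipv4/tcp_fastopen. Any bits not in the list above come back in
the second element rather than being silently dropped.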
>
>
>> However that's an old memory and there is this tcp_fastopen_key file I
>> don't know anything about yet (this is such bleeding edge stuff!)
>>
>> ... and with tcp_fastopen disabled things should still work right...
>> so I'm thinking something else is busted in the stack.
>>
>> I've also observed a dns slowdown in what I've been testing but hadn't
>> dug into packet dumps. (and was assuming, until now, it was due to me
>> fiddling with ULAs inside the network) Thanks for digging this deep!
>>
>> I never said this first attempt at 3.7 for cero was going to be
>> perfect, but we've entered a new age of subtle problems here.
>>
>> I strongly suggest nobody else try this dev build as a default gw, and
>> that the TFO folk ignore the noise for now.
>>
>
> SG.
>
> Jerry
>
>
>>
>> I just got a 3.7.1 box built on x86_64 so as to a/b some captures.
>> Regrettably I'm short on time through the weekend...
>>
>> On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak <maciej at soltysiak.com> wrote:
>> > I am seeing something strange here with polipo, related to TFO but also
>> > to DNS.
>> > When I took 3.7.1-1 and set my Windows 7 laptop to use gw.home.lan:8123
>> > as its HTTP proxy, it didn't work. What I observed was:
>> > a) After quite a while, polipo's response to the browser was: 504 Host
>> > www.osnews.com lookup failed: Timeout
>> > b) This error in the ssh console: Host osnews.com lookup failed: Timeout
>> > (131072)
>> > c) Disabling TFO by adding option useTCPFastOpen 'false' to config
>> > 'polipo' 'general' works around the problem
>> > d) Alternatively, you can keep TFO enabled in polipo but change option
>> > 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
>> > This is very weird, because TFO is TCP and the DNS queries fired off by
>> > polipo are UDP:
>> > root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
>> > 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF], proto UDP (17), length 60)
>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!] 55396+ A? www.osnews.com. (32)
>> > 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>> > 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
>> > 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF], proto UDP (17), length 60)
>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!] 55396+ AAAA? www.osnews.com. (32)
>> > 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>> > 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
>> > 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 123)
>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396 q: A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns: osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS ns1.swelter.net. (95)
>> > 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
>> > 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>> > 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
>> > 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
>> > 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
>> > 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
>> > 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
>> > 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 135)
>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396 q: AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net., osnews.com. [29m3s] NS ns2.swelter.net. (107)
>> > 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
>> > 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>> > 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
>> > 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
>> > 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
>> > 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
>> > 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
>> > 0x0080: 0603 6e73 32c0 4c ..ns2.L
>> > This is the only DNS traffic I saw during the attempts. The tcpdumps
>> > show bad UDP checksums, but when I disabled TFO in polipo the UDP
>> > packets still had bad checksums and yet the lookups worked.
>> > Really weird.
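
One possible reading of the "bad udp cksum" lines: the value stored in each
packet (0xfe3b, 0xfe7a, 0xfe86) equals the one's-complement sum of the UDP
pseudo-header alone, which is what a sender leaves in the field when the final
checksum is deferred to offload. That would make the warnings harmless on lo
and unrelated to the lookup timeouts. The Python sketch below recomputes both
values for the first query packet from the hex dump above. The function names
are illustrative, and offload being the cause is an assumption, not something
confirmed in this thread.

```python
import struct

# First DNS query from the capture above (IPv4 header + UDP + DNS), as hex.
PACKET_HEX = """
4500 003c c3d1 4000 4011 78dd 7f00 0001
7f00 0001 b8c8 0035 0028 fe3b d864 0100
0001 0000 0000 0000 0377 7777 066f 736e
6577 7303 636f 6d00 0001 0001
"""


def ones_complement_sum(data):
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksums(packet):
    """Return (stored, pseudo_header_sum, expected) for an IPv4/UDP packet."""
    ihl = (packet[0] & 0x0F) * 4             # IPv4 header length in bytes
    udp = packet[ihl:]
    # Pseudo-header: source, destination, zero, protocol 17, UDP length.
    pseudo = packet[12:20] + b"\x00\x11" + udp[4:6]
    stored = struct.unpack("!H", udp[6:8])[0]
    zeroed = udp[:6] + b"\x00\x00" + udp[8:]  # checksum field set to zero
    expected = ~ones_complement_sum(pseudo + zeroed) & 0xFFFF
    return stored, ones_complement_sum(pseudo), expected


packet = bytes.fromhex("".join(PACKET_HEX.split()))
stored, pseudo_sum, expected = udp_checksums(packet)
print(hex(stored), hex(pseudo_sum), hex(expected))
```

For this packet the stored value and the pseudo-header sum are both 0xfe3b,
and the recomputed checksum is 0xd17f, the value tcpdump says it expected.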
>> > p.s. UPnP still works for port forwarding negotiation as it did in
>> > 3.6.11-4. I still couldn't get the UPnP/SSDP multicasts (UDP to
>> > 239.255.255.250) forwarded between se00 and sw00/sw10. The last time it
>> > worked was ~3.3.8. I'm starting not to question why it doesn't work;
>> > I'm starting to wonder why it did work then ;-)
>> > Regards,
>> > Maciej
>> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht at gmail.com> wrote:
>> >>
>> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet at google.com> wrote:
>> >> > Sorry, could you give us a copy of the panic stack trace?
>> >>
>> >> I will get a serial console up on a wndr3800 by Sunday. (Sorry, just
>> >> landed in California, am in disarray.)
>> >>
>> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
>> >>
>> >> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>> >>
>> >> --
>> >> Dave Täht
>> >>
>> >> Fixing bufferbloat with cerowrt:
>> >> http://www.teklibre.com/cerowrt/subscribe.html
>> >> _______________________________________________
>> >> Cerowrt-devel mailing list
>> >> Cerowrt-devel at lists.bufferbloat.net
>> >> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>> >
>> >
>>
>>
>>
>> --
>> Dave Täht
>>
>> Fixing bufferbloat with cerowrt:
>> http://www.teklibre.com/cerowrt/subscribe.html
>>
>
>
-------------- next part --------------
root@OpenWrt:~# httping -F -g http://127.0.0.1:8123
PING 127.0.0.1:8123 (http://127.0.0.1:8123):
16:36:36.033466 IP localhost.39443 > localhost.8123: Flags [SEW], seq 2946288341, win 43690, options [mss 65495,sackOK,TS val 4294964893 ecr 0,nop,wscale 6,Unknown Option 254f989], length 0 --> SYN + Cookie request and no Data.
connected to 127.0.0.1:8123 (183 bytes), seq=0 time=3.15 ms 
16:36:36.033584 IP localhost.8123 > localhost.39443: Flags [S.E], seq 654941876, ack 2946288342, win 43690, options [mss 65495,sackOK,TS val 4294964893 ecr 4294964893,nop,wscale 6,Unknown Option 254f989df087214939732ef], length 0  --> SYN+ACK+Cookie
16:36:36.033638 IP localhost.39443 > localhost.8123: Flags [.], ack 1, win 683, options [nop,nop,TS val 4294964893 ecr 4294964893], length 0
16:36:36.034971 IP localhost.39443 > localhost.8123: Flags [P.], seq 1:65, ack 1, win 683, options [nop,nop,TS val 4294964894 ecr 4294964893], length 64 --> HTTP Request HEAD
16:36:36.035112 IP localhost.8123 > localhost.39443: Flags [.], ack 65, win 683, options [nop,nop,TS val 4294964894 ecr 4294964894], length 0
16:36:36.035808 IP localhost.8123 > localhost.39443: Flags [P.], seq 1:184, ack 65, win 683, options [nop,nop,TS val 4294964894 ecr 4294964894], length 183  --> HTTP Response
16:36:36.035965 IP localhost.8123 > localhost.39443: Flags [F.], seq 184, ack 65, win 683, options [nop,nop,TS val 4294964894 ecr 4294964894], length 0
16:36:36.036082 IP localhost.39443 > localhost.8123: Flags [.], ack 184, win 700, options [nop,nop,TS val 4294964894 ecr 4294964894], length 0
16:36:36.036350 IP localhost.39443 > localhost.8123: Flags [F.], seq 65, ack 185, win 700, options [nop,nop,TS val 4294964894 ecr 4294964894], length 0
16:36:36.036462 IP localhost.8123 > localhost.39443: Flags [.], ack 66, win 683, options [nop,nop,TS val 4294964894 ecr 4294964894], length 0  --> Connection Completes

### No packets seen later - Probably SYN+Data Crashed the box ###
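
In this capture the TFO option shows up as "Unknown Option 254f989...":
experimental TCP option kind 254, then the 16-bit magic number 0xF989 used by
early TFO implementations, then the cookie bytes (none in the client's first
SYN, which makes it a cookie request). As a rough aid for reading such
captures, the Python sketch below pulls the cookie out of a tcpdump line. It
assumes tcpdump prints the option exactly as it does here, and the function
name is illustrative.

```python
import re

# tcpdump printed the option as "Unknown Option 254" immediately followed by
# the option payload in hex; f989 is the experimental TFO magic number.
TFO_OPTION = re.compile(r"Unknown Option 254f989((?:[0-9a-f]{2})*)")


def tfo_cookie(line):
    """Return the TFO cookie bytes found in a tcpdump line.

    Returns None when the line carries no TFO option, and an empty bytes
    object when the option is present without a cookie (a cookie request).
    """
    match = TFO_OPTION.search(line)
    if match is None:
        return None
    return bytes.fromhex(match.group(1))


# Option strings copied from the SYN, SYN+ACK, and first ACK in the capture above.
syn = "options [mss 65495,sackOK,TS val 4294964893 ecr 0,nop,wscale 6,Unknown Option 254f989], length 0"
synack = "options [mss 65495,sackOK,TS val 4294964893 ecr 4294964893,nop,wscale 6,Unknown Option 254f989df087214939732ef], length 0"
ack = "options [nop,nop,TS val 4294964893 ecr 4294964893], length 0"

for label, line in (("SYN", syn), ("SYN+ACK", synack), ("ACK", ack)):
    print(label, tfo_cookie(line))
```

Here the SYN yields an empty cookie, the SYN+ACK yields the 8-byte cookie
df087214939732ef, and the plain ACK yields None.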

