[Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

Yuchung Cheng ycheng at google.com
Fri Jan 4 21:20:29 EST 2013


On Fri, Jan 4, 2013 at 5:59 PM, Ketan Kulkarni <ketkulka at gmail.com> wrote:
> Well, I was trying polipo server on cero box and httping from laptop. On
> both the boxes I set 3 in tcp_fastopen.
>
> The panic is seen only when server is on cero box.
> If I run server on my laptop and httping from cero all TFO connections are
> successful.
> So I suspect the only problem is SYN+DATA.
Just to confirm: you meant the problem is SYN/data processing on the
server side?

Maybe we hit some ECN / TFO bug. Some crash log would be great. Thanks
for trying TFO!

>
> Unfortunately I don't have the serial cable right now, and logread or dmesg
> didn't print any logs before the cero router restarted.
>
> Attached is the tcpdump capture on lo when client and server both run on
> cero box.
> HTH!
>
> If you (or anyone) can suggest more diagnostics, I will be glad to provide.
>
> On Jan 5, 2013 2:49 AM, "Jerry Chu" <hkchu at google.com> wrote:
>>
>> +ycheng
>>
>>
>> On Fri, Jan 4, 2013 at 1:11 PM, Dave Taht <dave.taht at gmail.com> wrote:
>>>
>>> Hmm. I would lean towards there being an issue with the new (freshly
>>> ported forward to 3.7.1) unaligned checksum code for mips based on
>>> what you say here. Or an offload...
>>>
>>> As for the 239.x multicast issue, hmm... separate issue entirely.
>>> Probably...
>>>
>>> And then there's TFO. I note that in order to use it properly you need
>>> to turn it on in proc. Last I remember that was
>>>
>>> echo 3 > /proc/sys/net/ipv4/tcp_fastopen
>>
>>
>> Correct - to enable the normal use of TFO for both client and server.
>> There are other flags for advanced usage:
>>  /* Bit Flags for sysctl_tcp_fastopen */
>> #define TFO_CLIENT_ENABLE       1
>> #define TFO_SERVER_ENABLE       2
>> #define TFO_CLIENT_NO_COOKIE    4 /* Send data-in-SYN w/o cookie */
>>
>> /* Process SYN data but skip cookie validation */
>> #define TFO_SERVER_COOKIE_NOT_CHKED     0x100
>> /* Accept SYN data w/o any cookie option */
>> #define TFO_SERVER_COOKIE_NOT_REQD      0x200
>>
>> /* Force enable TFO on all listeners, i.e., not requiring the
>>  * TCP_FASTOPEN socket option. SOCKOPT1/2 determine how to set max_qlen.
>>  */
>> #define TFO_SERVER_WO_SOCKOPT1  0x400
>> #define TFO_SERVER_WO_SOCKOPT2  0x800
>> /* Always create TFO child sockets on a TFO listener even when
>>  * cookie/data not present. (For testing purpose!)
>>  */
>> #define TFO_SERVER_ALWAYS       0x1000
>>
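As a quick aid for reading a tcp_fastopen value, here is a minimal sketch that splits it into the bit flags listed above. The flag values are copied from the quoted defines; the helper name decode_tfo is made up for illustration and is not part of any kernel or library API.

```python
# Bit flags copied from the sysctl_tcp_fastopen defines quoted above.
TFO_FLAGS = {
    0x1: "TFO_CLIENT_ENABLE",
    0x2: "TFO_SERVER_ENABLE",
    0x4: "TFO_CLIENT_NO_COOKIE",
    0x100: "TFO_SERVER_COOKIE_NOT_CHKED",
    0x200: "TFO_SERVER_COOKIE_NOT_REQD",
    0x400: "TFO_SERVER_WO_SOCKOPT1",
    0x800: "TFO_SERVER_WO_SOCKOPT2",
    0x1000: "TFO_SERVER_ALWAYS",
}


def decode_tfo(value):
    """Hypothetical helper: return (names of set flags, unknown leftover bits)."""
    names = []
    known_mask = 0
    for bit, name in sorted(TFO_FLAGS.items()):
        known_mask |= bit
        if value & bit:
            names.append(name)
    # Any bits not covered by the table are reported rather than dropped.
    return names, value & ~known_mask


if __name__ == "__main__":
    # 3 is the value written to /proc/sys/net/ipv4/tcp_fastopen in this thread.
    print(decode_tfo(3))
```

With the value 3 used in this thread, only the client and server enable bits come back set, with no leftover bits.
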
>>>
>>> However that's an old memory and there is this tcp_fastopen_key file I
>>> don't know anything about yet (this is such bleeding edge stuff!)
>>>
>>> ... and with tcp_fastopen disabled things should still work right...
>>> so I'm thinking something else is busted in the stack.
>>>
>>> I've also observed a dns slowdown in what I've been testing but hadn't
>>> dug into packet dumps. (and was assuming, until now, it was due to me
>>> fiddling with ULAs inside the network) Thanks for digging this deep!
>>>
>>> I never said this first attempt at 3.7 for cero was going to be
>>> perfect, but we've entered a new age of subtle problems here.
>>>
>>> I strongly suggest nobody else try this dev build as a default gw, and
>>> that the TFO folk ignore the noise for now.
>>
>>
>> SG.
>>
>> Jerry
>>
>>>
>>>
>>> I just got a 3.7.1 box built on x86_64 so as to a/b some captures.
>>> Regrettably I'm short on time through the weekend...
>>>
>>> On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak <maciej at soltysiak.com>
>>> wrote:
>>> > I am seeing something strange here, with polipo related to TFO but also
>>> > DNS.
>>> > When I just took 3.7.1-1 and set my windows 7 laptop to use
>>> > gw.home.lan:8123
>>> > as http proxy it didn't work. What I observed was:
>>> > a) after quite a while polipo's response to browser was 504 Host
>>> > www.osnews.com lookup failed: Timeout
>>> > b) this error in ssh console: Host osnews.com lookup failed: Timeout
>>> > (131072)
>>> > c) Disabling TFO by adding option useTCPFastOpen 'false' to config
>>> > 'polipo'
>>> > 'general' works around the problem
>>> > d) Alternatively, you can keep TFO enabled in polipo but change option
>>> > 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
>>> > This is very weird, because TFO is TCP and the DNS queries fired off by
>>> > polipo are UDP:
>>> > root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
>>> > 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF],
>>> > proto
>>> > UDP (17), length 60)
>>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!]
>>> > 55396+ A?
>>> > www.osnews.com. (32)
>>> > 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
>>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>>> > 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
>>> > 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF],
>>> > proto
>>> > UDP (17), length 60)
>>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!]
>>> > 55396+
>>> > AAAA? www.osnews.com. (32)
>>> > 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
>>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>>> > 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
>>> > 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
>>> > UDP
>>> > (17), length 123)
>>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396
>>> > q:
>>> > A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns:
>>> > osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
>>> > ns1.swelter.net. (95)
>>> > 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
>>> > 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
>>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>>> > 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
>>> > 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
>>> > 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
>>> > 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
>>> > 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
>>> > 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
>>> > UDP
>>> > (17), length 135)
>>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396
>>> > q:
>>> > AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
>>> > 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
>>> > osnews.com. [29m3s] NS ns2.swelter.net. (107)
>>> > 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
>>> > 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
>>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>>> > 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
>>> > 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
>>> > 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
>>> > 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
>>> > 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
>>> > 0x0080: 0603 6e73 32c0 4c ..ns2.L
>>> > This is the only DNS traffic I saw during the attempts. The tcpdumps
>>> > have
>>> > udp bad checksum but when I disabled TFO in polipo, the UDP were still
>>> > bad
>>> > checksum but they worked.
>>> > Really weird.
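One way to sanity-check the "bad udp cksum" lines, sketched under assumptions: the bytes below are the first packet from the dump above, and the function names are made up for illustration. The sketch recomputes the UDP checksum over the pseudo-header, UDP header and payload (RFC 768) and compares it with the value in the packet. For this packet the on-wire field equals the pseudo-header sum alone. That pattern is what one would expect if the full checksum was never filled in before the capture, which would also fit the bad checksums being harmless with TFO off, but that reading is a guess and not a diagnosis.

```python
import struct

# First packet from the capture above: IPv4 header, UDP header, DNS query.
PACKET = bytes.fromhex(
    "4500 003c c3d1 4000 4011 78dd 7f00 0001"
    "7f00 0001 b8c8 0035 0028 fe3b d864 0100"
    "0001 0000 0000 0000 0377 7777 066f 736e"
    "6577 7303 636f 6d00 0001 0001"
)


def ones_complement_sum(data):
    """16-bit one's-complement sum of big-endian words (zero-padded if odd)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksums(packet):
    """Return (on-wire checksum, pseudo-header sum, full expected checksum)."""
    ihl = (packet[0] & 0x0F) * 4              # IPv4 header length in bytes
    udp = packet[ihl:]
    udp_len = struct.unpack("!H", udp[4:6])[0]
    on_wire = struct.unpack("!H", udp[6:8])[0]
    # Pseudo-header: source address, destination address, zero, protocol 17, UDP length.
    pseudo = packet[12:20] + struct.pack("!BBH", 0, 17, udp_len)
    # The checksum field is treated as zero while computing the expected value.
    zeroed = udp[:6] + b"\x00\x00" + udp[8:udp_len]
    full = (~ones_complement_sum(pseudo + zeroed) & 0xFFFF) or 0xFFFF
    return on_wire, ones_complement_sum(pseudo), full


if __name__ == "__main__":
    print([hex(v) for v in udp_checksums(PACKET)])
```

For this packet the three values are 0xfe3b, 0xfe3b and 0xd17f, which agrees with tcpdump's "0xfe3b -> 0xd17f".
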
>>> > p.s. UPNP still works for port forwarding negotiation as it did in
>>> > 3.6.11-4
>>> > I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250)
>>> > to
>>> > be forwarded between se00 and sw00/sw10. Last time it worked was
>>> > ~3.3.8.
>>> > I'm starting not to question why it doesn't work, I'm starting to
>>> > wonder why
>>> > it did work then ;-)
>>> > Regards,
>>> > Maciej
>>> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht at gmail.com> wrote:
>>> >>
>>> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet at google.com>
>>> >> wrote:
>>> >> > Sorry, could you give us a copy of the panic stack trace ?
>>> >>
>>> >> I will get a serial console up on a wndr3800 by sunday. (sorry, just
>>> >> landed in california, am in disarray)
>>> >>
>>> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
>>> >>
>>> >> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>>> >>
>>> >> --
>>> >> Dave Täht
>>> >>
>>> >> Fixing bufferbloat with cerowrt:
>>> >> http://www.teklibre.com/cerowrt/subscribe.html
>>> >> _______________________________________________
>>> >> Cerowrt-devel mailing list
>>> >> Cerowrt-devel at lists.bufferbloat.net
>>> >> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Dave Täht
>>>
>>> Fixing bufferbloat with cerowrt:
>>> http://www.teklibre.com/cerowrt/subscribe.html
>>
>>
>

