[Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

Eric Dumazet edumazet at google.com
Fri Jan 4 19:16:56 PST 2013


/* TCP Fast Open Cookie as stored in memory */
struct tcp_fastopen_cookie {
        s8      len;
        u8      val[TCP_FASTOPEN_COOKIE_MAX];
};

I wonder if 's8' really does what we want on all arches.

We want to store a negative 8bit number, not an unsigned one...
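
To make the concern concrete, here is a minimal userspace sketch, not the kernel code. Plain `char` is signed on some ABIs and unsigned on others, so only an explicitly signed 8-bit type is guaranteed to keep a negative length. The names `s8_sketch`, `cookie_sketch` and `COOKIE_MAX_SKETCH` are hypothetical stand-ins for illustration, and treating -1 as the "no cookie" marker is an assumption here.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define COOKIE_MAX_SKETCH 16   /* placeholder size, an assumption */

typedef int8_t s8_sketch;      /* hypothetical stand-in for the kernel's s8 */

struct cookie_sketch {
        s8_sketch     len;     /* explicitly signed, so it can hold -1 */
        unsigned char val[COOKIE_MAX_SKETCH];
};

/* Returns 1 if plain char is signed on this ABI, 0 if it is unsigned. */
int plain_char_is_signed(void)
{
        return CHAR_MIN < 0;
}

/* Returns 1 if a negative length stored in the field reads back as -1. */
int negative_len_survives(void)
{
        struct cookie_sketch c = { .len = -1 };

        return c.len < 0 && c.len == -1;
}

/* Aborts if the explicitly signed field ever loses its sign. */
void check_sketch(void)
{
        assert(negative_len_survives());
}
```

If `plain_char_is_signed()` returns 0 on a given target, a field declared as plain `char` would read back 255 instead of -1 there, which is the failure mode worth ruling out.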



On Fri, Jan 4, 2013 at 7:02 PM, Ketan Kulkarni <ketkulka at gmail.com> wrote:

> Without TFO all worked fine.
> The problem occurs only when the TFO server is on the cero box.
> I will try both enabling ECN on the laptop and disabling ECN on cero, with
> TFO on.
> Will report the behavior seen.
>
> Thanks,
> Ketan.
> On Jan 5, 2013 7:50 AM, "Yuchung Cheng" <ycheng at google.com> wrote:
>
>> On Fri, Jan 4, 2013 at 5:59 PM, Ketan Kulkarni <ketkulka at gmail.com> wrote:
>> > Well, I was trying polipo server on cero box and httping from laptop. On
>> > both the boxes I set 3 in tcp_fastopen.
>> >
>> > The panic is seen only when server is on cero box.
>> > If I run server on my laptop and httping from cero all TFO connections are
>> > successful.
>> > So I doubt the only problem is SYN+DATA.
>> Just to confirm: you meant the problem is SYN/data processing on the
>> server side?
>>
>> Maybe we hit some ECN / TFO bug. Some crash log would be great. Thanks
>> for trying TFO!
>>
>> >
>> > Unfortunately I don't have the serial cable right now, and logread or dmesg
>> > didn't print any logs before the cero router restarted.
>> >
>> > Attached is the tcpdump capture on lo when client and server both run on
>> > cero box.
>> > HTH!
>> >
>> > If you (or anyone) can suggest more diagnostics, I will be glad to provide.
>> >
>> > On Jan 5, 2013 2:49 AM, "Jerry Chu" <hkchu at google.com> wrote:
>> >>
>> >> +ycheng
>> >>
>> >>
>> >> On Fri, Jan 4, 2013 at 1:11 PM, Dave Taht <dave.taht at gmail.com> wrote:
>> >>>
>> >>> Hmm. I would lean towards there being an issue with the new (freshly
>> >>> ported forward to 3.7.1) unaligned checksum code for mips based on
>> >>> what you say here. Or an offload...
>> >>>
>> >>> As for the 239.x multicast issue, hmm... separate issue entirely.
>> >>> Probably...
>> >>>
>> >>> And then there's TFO. I note that in order to use it properly you need
>> >>> to turn it on in proc. Last I remember that was
>> >>>
>> >>> echo 3 > /proc/sys/net/ipv4/tcp_fastopen
>> >>
>> >>
>> >> Correct - to enable the normal use of TFO for both client and server.
>> >> There are other flags for advanced usage:
>> >>  /* Bit Flags for sysctl_tcp_fastopen */
>> >> #define TFO_CLIENT_ENABLE       1
>> >> #define TFO_SERVER_ENABLE       2
>> >> #define TFO_CLIENT_NO_COOKIE    4 /* Send data-in-SYN w/o cookie */
>> >>
>> >> /* Process SYN data but skip cookie validation */
>> >> #define TFO_SERVER_COOKIE_NOT_CHKED     0x100
>> >> /* Accept SYN data w/o any cookie option */
>> >> #define TFO_SERVER_COOKIE_NOT_REQD      0x200
>> >>
>> >> /* Force enable TFO on all listeners, i.e., not requiring the
>> >>  * TCP_FASTOPEN socket option. SOCKOPT1/2 determine how to set max_qlen.
>> >>  */
>> >> #define TFO_SERVER_WO_SOCKOPT1  0x400
>> >> #define TFO_SERVER_WO_SOCKOPT2  0x800
>> >> /* Always create TFO child sockets on a TFO listener even when
>> >>  * cookie/data not present. (For testing purpose!)
>> >>  */
>> >> #define TFO_SERVER_ALWAYS       0x1000
>> >>
>> >>>
>> >>> However that's an old memory and there is this tcp_fastopen_key file I
>> >>> don't know anything about yet (this is such bleeding edge stuff!)
>> >>>
>> >>> ... and with tcp_fastopen disabled things should still work right...
>> >>> so I'm thinking something else is busted in the stack.
>> >>>
>> >>> I've also observed a dns slowdown in what I've been testing but hadn't
>> >>> dug into packet dumps. (and was assuming, until now, it was due to me
>> >>> fiddling with ULAs inside the network) Thanks for digging this deep!
>> >>>
>> >>> I never said this first attempt at 3.7 for cero was going to be
>> >>> perfect, but we've entered a new age of subtle problems here.
>> >>>
>> >>> I strongly suggest nobody else try this dev build as a default gw, and
>> >>> that the TFO folk ignore the noise for now.
>> >>
>> >>
>> >> SG.
>> >>
>> >> Jerry
>> >>
>> >>>
>> >>>
>> >>> I just got a 3.7.1 box built on x86_64 so as to a/b some captures.
>> >>> Regrettably I'm short on time through the weekend...
>> >>>
>> >>> On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak <maciej at soltysiak.com> wrote:
>> >>> > I am seeing something strange here, with polipo related to TFO but also
>> >>> > DNS.
>> >>> > When I just took 3.7.1-1 and set my windows 7 laptop to use gw.home.lan:8123
>> >>> > as http proxy it didn't work. What I observed was:
>> >>> > a) after quite a while polipo's response to browser was 504 Host
>> >>> > www.osnews.com lookup failed: Timeout
>> >>> > b) this error in ssh console: Host osnews.com lookup failed: Timeout (131072)
>> >>> > c) Disabling TFO by adding option useTCPFastOpen 'false' to config 'polipo'
>> >>> > 'general' works around the problem
>> >>> > d) Alternatively, you can keep TFO enabled in polipo but change option
>> >>> > 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
>> >>> > This is very weird, because TFO is TCP and the DNS queries fired off by
>> >>> > polipo are UDP:
>> >>> > root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
>> >>> > 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF], proto UDP (17), length 60)
>> >>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!] 55396+ A? www.osnews.com. (32)
>> >>> > 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
>> >>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>> >>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>> >>> > 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
>> >>> > 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF], proto UDP (17), length 60)
>> >>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!] 55396+ AAAA? www.osnews.com. (32)
>> >>> > 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
>> >>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>> >>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>> >>> > 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
>> >>> > 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 123)
>> >>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396 q: A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns: osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS ns1.swelter.net. (95)
>> >>> > 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
>> >>> > 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
>> >>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>> >>> > 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
>> >>> > 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
>> >>> > 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
>> >>> > 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
>> >>> > 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
>> >>> > 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 135)
>> >>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396 q: AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net., osnews.com. [29m3s] NS ns2.swelter.net. (107)
>> >>> > 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
>> >>> > 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
>> >>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>> >>> > 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
>> >>> > 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
>> >>> > 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
>> >>> > 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
>> >>> > 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
>> >>> > 0x0080: 0603 6e73 32c0 4c ..ns2.L
>> >>> > This is the only DNS traffic I saw during the attempts. The tcpdumps show
>> >>> > bad UDP checksums, but when I disabled TFO in polipo, the UDP checksums
>> >>> > were still bad and yet the queries worked.
>> >>> > Really weird.
>> >>> > p.s. UPNP still works for port forwarding negotiation as it did in 3.6.11-4.
>> >>> > I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250) to
>> >>> > be forwarded between se00 and sw00/sw10. Last time it worked was ~3.3.8.
>> >>> > I'm starting not to question why it doesn't work, I'm starting to wonder
>> >>> > why it did work then ;-)
>> >>> > Regards,
>> >>> > Maciej
>> >>> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht at gmail.com> wrote:
>> >>> >>
>> >>> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet at google.com> wrote:
>> >>> >> > Sorry, could you give us a copy of the panic stack trace ?
>> >>> >>
>> >>> >> I will get a serial console up on a wndr3800 by sunday. (sorry, just
>> >>> >> landed in california, am in disarray)
>> >>> >>
>> >>> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
>> >>> >>
>> >>> >> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>> >>> >>
>> >>> >> --
>> >>> >> Dave Täht
>> >>> >>
>> >>> >> Fixing bufferbloat with cerowrt:
>> >>> >> http://www.teklibre.com/cerowrt/subscribe.html
>> >>> >> _______________________________________________
>> >>> >> Cerowrt-devel mailing list
>> >>> >> Cerowrt-devel at lists.bufferbloat.net
>> >>> >> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Dave Täht
>> >>>
>> >>> Fixing bufferbloat with cerowrt:
>> >>> http://www.teklibre.com/cerowrt/subscribe.html
>> >>
>> >>
>> >
>>
>