<p>Without TFO, everything worked fine.<br>
The problem appears only when the TFO server runs on the cero box.<br>
I will try both enabling ECN on the laptop and disabling ECN on cero, with TFO on, and will report the behavior I see.</p>
<p>Thanks,<br>
Ketan.</p>
<div class="gmail_quote">On Jan 5, 2013 7:50 AM, "Yuchung Cheng" <<a href="mailto:ycheng@google.com">ycheng@google.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Fri, Jan 4, 2013 at 5:59 PM, Ketan Kulkarni <<a href="mailto:ketkulka@gmail.com">ketkulka@gmail.com</a>> wrote:<br>
> Well, I was trying polipo server on cero box and httping from laptop. On<br>
> both the boxes I set 3 in tcp_fastopen.<br>
><br>
> The panic is seen only when server is on cero box.<br>
> If I run server on my laptop and httping from cero all TFO connections are<br>
> successful.<br>
> So I doubt the only problem is SYN+DATA.<br>
Just to confirm: you meant the problem is SYN/data processing on the<br>
server side?<br>
<br>
Maybe we hit some ECN / TFO bug. Some crash log would be great. Thanks<br>
for trying TFO!<br>
<br>
><br>
> Unfortunately I don't have the serial cable right now, and logread or dmesg<br>
> didn't print any logs before the cero router restarted.<br>
><br>
> Attached is the tcpdump capture on lo when client and server both run on<br>
> cero box.<br>
> HTH!<br>
><br>
> If you (or anyone) can suggest more diagnostics, I will be glad to provide.<br>
><br>
> On Jan 5, 2013 2:49 AM, "Jerry Chu" <<a href="mailto:hkchu@google.com">hkchu@google.com</a>> wrote:<br>
>><br>
>> +ycheng<br>
>><br>
>><br>
>> On Fri, Jan 4, 2013 at 1:11 PM, Dave Taht <<a href="mailto:dave.taht@gmail.com">dave.taht@gmail.com</a>> wrote:<br>
>>><br>
>>> Hmm. I would lean towards there being an issue with the new (freshly<br>
>>> ported forward to 3.7.1) unaligned checksum code for mips based on<br>
>>> what you say here. Or an offload...<br>
>>><br>
>>> As for the 239.x multicast issue, hmm... separate issue entirely.<br>
>>> Probably...<br>
>>><br>
>>> And then there's TFO. I note that in order to use it properly you need<br>
>>> to turn it on in proc. Last I remember that was<br>
>>><br>
>>> echo 3 > /proc/sys/net/ipv4/tcp_fastopen<br>
>><br>
>><br>
>> Correct - to enable the normal use of TFO for both client and server.<br>
>> There are other flags for advanced usage:<br>
>> /* Bit Flags for sysctl_tcp_fastopen */<br>
>> #define TFO_CLIENT_ENABLE 1<br>
>> #define TFO_SERVER_ENABLE 2<br>
>> #define TFO_CLIENT_NO_COOKIE 4 /* Send data-in-SYN w/o cookie */<br>
>><br>
>> /* Process SYN data but skip cookie validation */<br>
>> #define TFO_SERVER_COOKIE_NOT_CHKED 0x100<br>
>> /* Accept SYN data w/o any cookie option */<br>
>> #define TFO_SERVER_COOKIE_NOT_REQD 0x200<br>
>><br>
>> /* Force enable TFO on all listeners, i.e., not requiring the<br>
>> * TCP_FASTOPEN socket option. SOCKOPT1/2 determine how to set max_qlen.<br>
>> */<br>
>> #define TFO_SERVER_WO_SOCKOPT1 0x400<br>
>> #define TFO_SERVER_WO_SOCKOPT2 0x800<br>
>> /* Always create TFO child sockets on a TFO listener even when<br>
>> * cookie/data not present. (For testing purpose!)<br>
>> */<br>
>> #define TFO_SERVER_ALWAYS 0x1000<br>
>><br>
>>><br>
>>> However that's an old memory and there is this tcp_fastopen_key file I<br>
>>> don't know anything about yet (this is such bleeding edge stuff!)<br>
>>><br>
>>> ... and with tcp_fastopen disabled things should still work right...<br>
>>> so I'm thinking something else is busted in the stack.<br>
>>><br>
>>> I've also observed a dns slowdown in what I've been testing but hadn't<br>
>>> dug into packet dumps. (and was assuming, until now, it was due to me<br>
>>> fiddling with ULAs inside the network) Thanks for digging this deep!<br>
>>><br>
>>> I never said this first attempt at 3.7 for cero was going to be<br>
>>> perfect, but we've entered a new age of subtle problems here.<br>
>>><br>
>>> I strongly suggest nobody else try this dev build as a default gw, and<br>
>>> that the TFO folk ignore the noise for now.<br>
>><br>
>><br>
>> SG.<br>
>><br>
>> Jerry<br>
>><br>
>>><br>
>>><br>
>>> I just got a 3.7.1 box built on x86_64 so as to a/b some captures.<br>
>>> Regrettably I'm short on time through the weekend...<br>
>>><br>
>>> On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak <<a href="mailto:maciej@soltysiak.com">maciej@soltysiak.com</a>><br>
>>> wrote:<br>
>>> > I am seeing something strange here, with polipo related to TFO but also<br>
>>> > DNS.<br>
>>> > When I just took 3.7.1-1 and set my Windows 7 laptop to use<br>
>>> > gw.home.lan:8123<br>
>>> > as HTTP proxy, it didn't work. What I observed was:<br>
>>> > a) after quite a while polipo's response to the browser was 504 Host<br>
>>> > <a href="http://www.osnews.com" target="_blank">www.osnews.com</a> lookup failed: Timeout<br>
>>> > b) this error in ssh console: Host <a href="http://osnews.com" target="_blank">osnews.com</a> lookup failed: Timeout<br>
>>> > (131072)<br>
>>> > c) Disabling TFO by adding option useTCPFastOpen 'false' to config<br>
>>> > 'polipo'<br>
>>> > 'general' works around the problem<br>
>>> > d) Alternatively, you can keep TFO enabled in polipo but change option<br>
>>> > 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)<br>
>>> > This is very weird, because TFO is TCP and the DNS queries fired off by<br>
>>> > polipo are UDP:<br>
>>> > root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo<br>
>>> > 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF],<br>
>>> > proto<br>
>>> > UDP (17), length 60)<br>
>>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!]<br>
>>> > 55396+ A?<br>
>>> > <a href="http://www.osnews.com" target="_blank">www.osnews.com</a>. (32)<br>
>>> > 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....<br>
>>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..<br>
>>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn<br>
>>> > 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....<br>
>>> > 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF],<br>
>>> > proto<br>
>>> > UDP (17), length 60)<br>
>>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!]<br>
>>> > 55396+<br>
>>> > AAAA? <a href="http://www.osnews.com" target="_blank">www.osnews.com</a>. (32)<br>
>>> > 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....<br>
>>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..<br>
>>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn<br>
>>> > 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....<br>
>>> > 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto<br>
>>> > UDP<br>
>>> > (17), length 123)<br>
>>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396<br>
>>> > q:<br>
>>> > A? <a href="http://www.osnews.com" target="_blank">www.osnews.com</a>. 1/2/0 <a href="http://www.osnews.com" target="_blank">www.osnews.com</a>. [29m3s] A 74.86.31.159 ns:<br>
>>> > <a href="http://osnews.com" target="_blank">osnews.com</a>. [29m3s] NS <a href="http://ns2.swelter.net" target="_blank">ns2.swelter.net</a>., <a href="http://osnews.com" target="_blank">osnews.com</a>. [29m3s] NS<br>
>>> > <a href="http://ns1.swelter.net" target="_blank">ns1.swelter.net</a>. (95)<br>
>>> > 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.&lt;p....<br>
>>> > 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..<br>
>>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn<br>
>>> > 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........<br>
>>> > 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......<br>
>>> > 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe<br>
>>> > 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........<br>
>>> > 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@<br>
>>> > 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto<br>
>>> > UDP<br>
>>> > (17), length 135)<br>
>>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396<br>
>>> > q:<br>
>>> > AAAA? <a href="http://www.osnews.com" target="_blank">www.osnews.com</a>. 1/2/0 <a href="http://www.osnews.com" target="_blank">www.osnews.com</a>. [54m44s] AAAA<br>
>>> > 2607:f0d0:1002:62::3 ns: <a href="http://osnews.com" target="_blank">osnews.com</a>. [29m3s] NS <a href="http://ns1.swelter.net" target="_blank">ns1.swelter.net</a>.,<br>
>>> > <a href="http://osnews.com" target="_blank">osnews.com</a>. [29m3s] NS <a href="http://ns2.swelter.net" target="_blank">ns2.swelter.net</a>. (107)<br>
>>> > 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.&lt;d....<br>
>>> > 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..<br>
>>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn<br>
>>> > 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........<br>
>>> > 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b<br>
>>> > 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................<br>
>>> > 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter<br>
>>> > 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............<br>
>>> > 0x0080: 0603 6e73 32c0 4c ..ns2.L<br>
>>> > This is the only DNS traffic I saw during the attempts. The tcpdumps<br>
>>> > show bad UDP checksums, but when I disabled TFO in polipo, the UDP<br>
>>> > packets still had bad checksums, yet they worked.<br>
>>> > Really weird.<br>
>>> > p.s. UPNP still works for port forwarding negotiation as it did in<br>
>>> > 3.6.11-4<br>
>>> > I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250)<br>
>>> > to<br>
>>> > be forwarded between se00 and sw00/sw10. Last time it worked was<br>
>>> > ~3.3.8.<br>
>>> > I'm starting not to question why it doesn't work, I'm starting to<br>
>>> > wonder why<br>
>>> > it did work then ;-)<br>
>>> > Regards,<br>
>>> > Maciej<br>
>>> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <<a href="mailto:dave.taht@gmail.com">dave.taht@gmail.com</a>> wrote:<br>
>>> >><br>
>>> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <<a href="mailto:edumazet@google.com">edumazet@google.com</a>><br>
>>> >> wrote:<br>
>>> >> > Sorry, could you give us a copy of the panic stack trace ?<br>
>>> >><br>
>>> >> I will get a serial console up on a wndr3800 by sunday. (sorry, just<br>
>>> >> landed in california, am in disarray)<br>
>>> >><br>
>>> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:<br>
>>> >><br>
>>> >> <a href="http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/" target="_blank">http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/</a><br>
>>> >><br>
>>> >> --<br>
>>> >> Dave Täht<br>
>>> >><br>
>>> >> Fixing bufferbloat with cerowrt:<br>
>>> >> <a href="http://www.teklibre.com/cerowrt/subscribe.html" target="_blank">http://www.teklibre.com/cerowrt/subscribe.html</a><br>
>>> ><br>
>>> ><br>
>>><br>
>>><br>
>>><br>
>><br>
>><br>
><br>
</blockquote></div>
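<p>A note on the "bad udp cksum" lines in the loopback capture quoted above. The following is a minimal Python sketch, not something posted in the thread, that recomputes the UDP checksum of the first DNS query from its hex dump. The helper names <code>ones_complement_sum</code> and <code>udp_checksum</code> are illustrative, not from any library. The recomputed value is 0xd17f, the value tcpdump reports as expected, while the 0xfe3b actually stored in the packet equals the folded sum of the IPv4 pseudo-header alone. That pattern is what one would typically see when the sender defers checksum completion to a later stage (offload, or never on loopback), which would be consistent with the checksums looking bad both with and without TFO.</p>

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """Sum 16-bit big-endian words and fold the carries back in (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: bytes, dst: bytes, udp_len: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero byte, protocol 17, UDP length."""
    return src + dst + struct.pack("!BBH", 0, 17, udp_len)


def udp_checksum(src: bytes, dst: bytes, segment: bytes) -> int:
    """Checksum over pseudo-header plus UDP segment with its checksum field zeroed."""
    zeroed = segment[:6] + b"\x00\x00" + segment[8:]
    total = ones_complement_sum(pseudo_header(src, dst, len(segment)) + zeroed)
    csum = ~total & 0xFFFF
    return csum or 0xFFFF  # an all-zero result is transmitted as 0xFFFF


LOOPBACK = bytes([127, 0, 0, 1])

# UDP header and DNS payload of the first query (A? www.osnews.com.),
# copied from the hex dump in the quoted capture, IP header removed.
SEGMENT = bytes.fromhex(
    "b8c8 0035 0028 fe3b d864 0100"
    "0001 0000 0000 0000 0377 7777 066f 736e"
    "6577 7303 636f 6d00 0001 0001"
)

stored = struct.unpack("!H", SEGMENT[6:8])[0]
expected = udp_checksum(LOOPBACK, LOOPBACK, SEGMENT)
pseudo_only = ones_complement_sum(pseudo_header(LOOPBACK, LOOPBACK, len(SEGMENT)))

print("stored in packet :", hex(stored))
print("full checksum    :", hex(expected))
print("pseudo-header sum:", hex(pseudo_only))
```

<p>If the stored value matches the pseudo-header sum rather than the full checksum, the mismatch is more likely a capture artifact than on-the-wire corruption, so it probably does not explain the lookup timeouts by itself.</p>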