Development issues regarding the cerowrt test router project
* [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
@ 2013-01-04 17:04 Dave Taht
  2013-01-04 17:27 ` Eric Dumazet
  0 siblings, 1 reply; 37+ messages in thread
From: Dave Taht @ 2013-01-04 17:04 UTC (permalink / raw)
  To: Ketan Kulkarni, Eric Dumazet; +Cc: cerowrt-devel

On Thu, Jan 3, 2013 at 8:54 AM, Ketan Kulkarni <ketkulka@gmail.com> wrote:
> Thanks Dave.
> I upgraded my 3800 to 3.7.1-1. It is working for day to day Internet activity.
>
> However, I am not able to get even a single TCP Fast Open (TFO)
> connection through. The router restarts as soon as it sees the TFO
> connection. It looks like SYN+Data is crashing the box (see the attached
> trace captured on the lo interface of cero). logread and dmesg did not
> show anything, so I don't know whether it's a kernel panic.
>
> Any pointers to debug further?
>
> This is strange, as 3.6 was working for SYN+Data cases.
>
> However, the difference from the previous setup is that the polipo
> server with TFO is now running on the cero box.
> The client may run on the same cero box or on my laptop; either case
> crashes the box.
>
> On 3.7, if I run the TFO client on the cero box and the TFO server on
> the laptop, it still works, but not the reverse.
>
> Any thoughts?

I suspect that the new TFO code hasn't been tested much on mips. Am
glad you are doing so!

I got my replacement wndr3800s a few days ago, and will put the next
build through the TFO wringer (as well as attach a serial console).
That said, perhaps your bug report best belongs on netdev where there
are a couple people working hard on TFO and at least one has a cerowrt
capable box...

Does TFO successfully pass through the router?

>
> Thanks,
> Ketan
>
> On Wed, Jan 2, 2013 at 3:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
>> Two formerly back-ordered 3800s arrived yesterday!
>>
>> They barely had time to power on before I reflashed them with:
>>
>> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>>
>> I still regard this series as heavily development oriented and
>> unsuitable for general use. In particular I'd like to wait for dnsmasq
>> to come out of beta, fix AHCP, add a gui for the ceroshaper, etc, etc.
>>
>> But: This is the first devel release I've been able to test in the
>> real world as a default gw in a while. So far, so good.
>>
>> I have some issues with how the new network6 configuration stuff
>> interacts with ahcp, but aside from that... I saw upnp work for the
>> first time... saw the ula auto code work... analyzed some dropbox and
>> netflix traffic, ran a couple android boxes through it, fiddled with
>> nfq_codel...
>>
>> some notes:
>>
>> + resync with openwrt head
>> + update to Linux 3.7.1 with unaligned patches from robert bradley
>> + A QFQ+ update
>> + mildly improved nfq_codel
>> - Missing cups support (didn't compile)
>> - no ipv6 npt yet
>>
>> Merry New Year!
>>
>> --
>> Dave Täht
>>
>> Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
>> _______________________________________________
>> Cerowrt-devel mailing list
>> Cerowrt-devel@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cerowrt-devel



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 17:04 [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1 Dave Taht
@ 2013-01-04 17:27 ` Eric Dumazet
  2013-01-04 17:33   ` Dave Taht
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2013-01-04 17:27 UTC (permalink / raw)
  To: Dave Taht, Jerry Chu; +Cc: cerowrt-devel

Sorry, could you give us a copy of the panic stack trace?

Thanks



* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 17:27 ` Eric Dumazet
@ 2013-01-04 17:33   ` Dave Taht
  2013-01-04 20:42     ` Maciej Soltysiak
  0 siblings, 1 reply; 37+ messages in thread
From: Dave Taht @ 2013-01-04 17:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jerry Chu, cerowrt-devel

On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com> wrote:
> Sorry, could you give us a copy of the panic stack trace ?

I will get a serial console up on a wndr3800 by Sunday. (Sorry, I just
landed in California and am in disarray.)

The latest dev build of cero for the wndr3800 and wndr3700v2 is at:

http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/

-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 17:33   ` Dave Taht
@ 2013-01-04 20:42     ` Maciej Soltysiak
  2013-01-04 20:43       ` Maciej Soltysiak
                         ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Maciej Soltysiak @ 2013-01-04 20:42 UTC (permalink / raw)
  To: Dave Taht, Ketan Kulkarni; +Cc: Jerry Chu, Eric Dumazet, cerowrt-devel

I am seeing something strange here with polipo, related to TFO but also
to DNS.

When I took 3.7.1-1 and set my Windows 7 laptop to use gw.home.lan:8123
as its HTTP proxy, it didn't work. What I observed was:

a) After quite a while, polipo's response to the browser was
   "504 Host www.osnews.com lookup failed: Timeout".
b) This error appeared in the ssh console:
   "Host osnews.com lookup failed: Timeout (131072)".
c) Disabling TFO by adding option useTCPFastOpen 'false' to config
   'polipo' 'general' works around the problem.
d) Alternatively, you can keep TFO enabled in polipo but change option
   'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)

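For anyone wanting to try workarounds (c) and (d) from a shell, here is a rough sketch using the uci command line tool. It assumes the UCI package is called 'polipo' and the section 'general', as the quoted config line suggests; the helper name polipo_set is made up, and none of this was run in this thread. Off the router it only prints the command it would run.

```sh
# Set one option in the 'general' section of polipo's UCI config.
# Assumption: package 'polipo', section 'general' (taken from the
# "config 'polipo' 'general'" line above). polipo_set is a hypothetical
# helper name, not part of OpenWrt.
polipo_set() {
    opt=$1
    val=$2
    if command -v uci >/dev/null 2>&1; then
        uci set "polipo.general.$opt=$val" && uci commit polipo
    else
        # Not on an OpenWrt/cerowrt box: only show what would be run.
        echo "uci set polipo.general.$opt=$val"
    fi
}

# Workaround (c): turn TFO off in polipo.
polipo_set useTCPFastOpen false
# Workaround (d) would instead be:
#   polipo_set dnsUseGethostbyname true
```

polipo presumably needs a restart afterwards (on OpenWrt that would normally be /etc/init.d/polipo restart) before the change takes effect.
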
This is very weird, because TFO is TCP and the DNS queries fired off by
polipo are UDP:

root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF], proto UDP (17), length 60)
    127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!] 55396+ A? www.osnews.com. (32)
    0x0000:  4500 003c c3d1 4000 4011 78dd 7f00 0001  E..<..@.@.x.....
    0x0010:  7f00 0001 b8c8 0035 0028 fe3b d864 0100  .......5.(.;.d..
    0x0020:  0001 0000 0000 0000 0377 7777 066f 736e  .........www.osn
    0x0030:  6577 7303 636f 6d00 0001 0001            ews.com.....
20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF], proto UDP (17), length 60)
    127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!] 55396+ AAAA? www.osnews.com. (32)
    0x0000:  4500 003c c3d2 4000 4011 78dc 7f00 0001  E..<..@.@.x.....
    0x0010:  7f00 0001 b8c8 0035 0028 fe3b d864 0100  .......5.(.;.d..
    0x0020:  0001 0000 0000 0000 0377 7777 066f 736e  .........www.osn
    0x0030:  6577 7303 636f 6d00 001c 0001            ews.com.....
20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 123)
    127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396 q: A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns: osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS ns1.swelter.net. (95)
    0x0000:  4500 007b 0000 4000 4011 3c70 7f00 0001  E..{..@.@.<p....
    0x0010:  7f00 0001 0035 b8c8 0067 fe7a d864 8180  .....5...g.z.d..
    0x0020:  0001 0001 0002 0000 0377 7777 066f 736e  .........www.osn
    0x0030:  6577 7303 636f 6d00 0001 0001 c00c 0001  ews.com.........
    0x0040:  0001 0000 06cf 0004 4a56 1f9f c010 0002  ........JV......
    0x0050:  0001 0000 06cf 0011 036e 7332 0773 7765  .........ns2.swe
    0x0060:  6c74 6572 036e 6574 00c0 1000 0200 0100  lter.net........
    0x0070:  0006 cf00 0603 6e73 31c0 40              ......ns1.@
20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 135)
    127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396 q: AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net., osnews.com. [29m3s] NS ns2.swelter.net. (107)
    0x0000:  4500 0087 0000 4000 4011 3c64 7f00 0001  E.....@.@.<d....
    0x0010:  7f00 0001 0035 b8c8 0073 fe86 d864 8180  .....5...s...d..
    0x0020:  0001 0001 0002 0000 0377 7777 066f 736e  .........www.osn
    0x0030:  6577 7303 636f 6d00 001c 0001 c00c 001c  ews.com.........
    0x0040:  0001 0000 0cd4 0010 2607 f0d0 1002 0062  ........&......b
    0x0050:  0000 0000 0000 0003 c010 0002 0001 0000  ................
    0x0060:  06cf 0011 036e 7331 0773 7765 6c74 6572  .....ns1.swelter
    0x0070:  036e 6574 00c0 1000 0200 0100 0006 cf00  .net............
    0x0080:  0603 6e73 32c0 4c                        ..ns2.L

This is the only DNS traffic I saw during the attempts. tcpdump reports
bad UDP checksums, but when I disabled TFO in polipo the UDP checksums
were still bad and the lookups worked.

Really weird.

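A side note on the "bad udp cksum" lines, offered as an interpretation rather than anything confirmed in this thread: the value sitting in the checksum field of both queries, 0xfe3b, is exactly the one's-complement sum of the UDP pseudo-header. That is what one would expect if the kernel deferred checksumming on loopback and tcpdump captured the packet before it was finalized, which would make those warnings harmless. A minimal shell sketch that redoes the arithmetic for the first query, with the 16-bit words copied from the dump (the helper name ones_sum is invented here):

```sh
# Fold a list of 16-bit hex words into a one's-complement sum.
# ones_sum is a hypothetical helper name used only for this sketch.
ones_sum() {
    sum=0
    for w in "$@"; do
        sum=$(( $sum + 0x$w ))
    done
    # Fold any carries back into the low 16 bits.
    while [ "$sum" -gt 65535 ]; do
        sum=$(( ($sum & 0xffff) + ($sum >> 16) ))
    done
    printf '0x%04x\n' "$sum"
}

# Pseudo-header of the first query: src 127.0.0.1, dst 127.0.0.1,
# protocol 17 (0x0011), UDP length 40 (0x0028).
PSEUDO="7f00 0001 7f00 0001 0011 0028"
# UDP header with the checksum field zeroed, then the 32-byte DNS query.
UDP="b8c8 0035 0028 0000"
DNS="d864 0100 0001 0000 0000 0000 0377 7777 066f 736e 6577 7303 636f 6d00 0001 0001"

partial=$(ones_sum $PSEUDO)
full=$(ones_sum $PSEUDO $UDP $DNS)
final=$(printf '0x%04x' $(( 0xffff ^ $full )))

echo "pseudo-header sum: $partial"   # value found in the captured packet
echo "final checksum:    $final"     # value tcpdump says it should be
```

The two results line up with tcpdump's "0xfe3b -> 0xd17f" for that packet, so the packet contents themselves look intact.
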
p.s. UPNP still works for port forwarding negotiation, as it did in
3.6.11-4. I still couldn't get the UPNP/SSDP multicasts (udp to
239.255.255.250) to be forwarded between se00 and sw00/sw10. The last
time it worked was ~3.3.8. I'm starting not to question why it doesn't
work; I'm starting to wonder why it did work then ;-)

Regards,
Maciej

* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 20:42     ` Maciej Soltysiak
@ 2013-01-04 20:43       ` Maciej Soltysiak
  2013-01-04 20:57         ` Jerry Chu
  2013-01-04 21:01         ` dpreed
  2013-01-04 21:11       ` Dave Taht
                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 37+ messages in thread
From: Maciej Soltysiak @ 2013-01-04 20:43 UTC (permalink / raw)
  To: Dave Taht, Ketan Kulkarni; +Cc: Jerry Chu, Eric Dumazet, cerowrt-devel

Oops, apologies if email was formatted weirdly...


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 20:43       ` Maciej Soltysiak
@ 2013-01-04 20:57         ` Jerry Chu
  2013-01-04 21:21           ` Dave Taht
  2013-01-04 21:01         ` dpreed
  1 sibling, 1 reply; 37+ messages in thread
From: Jerry Chu @ 2013-01-04 20:57 UTC (permalink / raw)
  To: Maciej Soltysiak; +Cc: Yuchung Cheng, Eric Dumazet, cerowrt-devel

+ycheng

On Fri, Jan 4, 2013 at 12:43 PM, Maciej Soltysiak <maciej@soltysiak.com> wrote:

> Oops, apologies if email was formatted weirdly...


The problem you described in your earlier message (the polipo lookup
timeouts) is separate from the MIPS router crash, right? BTW, we've only
tested on the x86_64 arch.

In addition to tcpdump, "netstat -s | grep -i fastopen" may be useful too.
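
If the router's netstat build lacks -s, the same counters can be read straight from the kernel. A rough sketch, assuming the kernel exposes them as TCPFastOpen* fields in the TcpExt rows of /proc/net/netstat; the helper name tfo_counters is made up for illustration:

```sh
# Print the TCPFastOpen* counters from data in /proc/net/netstat layout,
# read on stdin. tfo_counters is a hypothetical helper name.
tfo_counters() {
    awk '
        $1 != "TcpExt:" { next }            # ignore IpExt and other rows
        !seen {                             # first TcpExt row: counter names
            for (i = 2; i <= NF; i++) name[i] = $i
            seen = 1
            next
        }
        {                                   # second TcpExt row: the values
            for (i = 2; i <= NF; i++)
                if (name[i] ~ /FastOpen/) print name[i], $i
        }
    '
}

# On the router itself (prints nothing if the kernel has no TFO counters):
if [ -r /proc/net/netstat ]; then
    tfo_counters < /proc/net/netstat
fi
```

Comparing the output before and after a failing connection attempt should show whether the server side ever counted the SYN+Data at all.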

Thanks,

Jerry (author of TFO server code)



* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 20:43       ` Maciej Soltysiak
  2013-01-04 20:57         ` Jerry Chu
@ 2013-01-04 21:01         ` dpreed
  2013-01-04 22:49           ` Robert Bradley
  1 sibling, 1 reply; 37+ messages in thread
From: dpreed @ 2013-01-04 21:01 UTC (permalink / raw)
  To: Maciej Soltysiak; +Cc: Jerry Chu, Eric Dumazet, cerowrt-devel


Is this a TFO connection where the endpoint is on cerowrt, or just a
SYN+DATA for a non-cerowrt destination?

I was looking at the firewall rules, and they are pretty complicated.
Perhaps the SYN+DATA triggers some strange firewall behavior (a loop?).
SYNs are special to firewalls, as we know.
 

* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 20:42     ` Maciej Soltysiak
  2013-01-04 20:43       ` Maciej Soltysiak
@ 2013-01-04 21:11       ` Dave Taht
  2013-01-04 21:19         ` Jerry Chu
  2013-01-04 22:25       ` Robert Bradley
  2013-01-14  6:11       ` Dave Taht
  3 siblings, 1 reply; 37+ messages in thread
From: Dave Taht @ 2013-01-04 21:11 UTC (permalink / raw)
  To: Maciej Soltysiak; +Cc: Jerry Chu, Eric Dumazet, cerowrt-devel

Hmm. I would lean towards there being an issue with the new (freshly
ported forward to 3.7.1) unaligned checksum code for mips based on
what you say here. Or an offload...
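
The checksum arithmetic can be checked by hand against the capture quoted further down. Below is a minimal Python sketch (the function names are made up for this example, not taken from any tool in this thread) that recomputes the UDP checksum of the first query packet. It suggests that the 0xfe3b stored in the packet is just the pseudo-header sum, which is what one would expect if the final checksum were left for a later stage to fill in (an offload path, say). That reading is an assumption, not something confirmed in this thread.

```python
import struct


def ones_complement_sum(data):
    """Sum 16-bit big-endian words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum_ipv4(packet):
    """Return (pseudo_header_sum, full_udp_checksum) for an IPv4/UDP packet."""
    ihl = (packet[0] & 0x0F) * 4              # IP header length in bytes
    udp = bytearray(packet[ihl:])
    udp_len = struct.unpack("!H", udp[4:6])[0]
    udp = udp[:udp_len]
    udp[6:8] = b"\x00\x00"                    # checksum field counts as zero
    # Pseudo-header: source address, destination address, zero, protocol, UDP length.
    pseudo = packet[12:20] + struct.pack("!BBH", 0, packet[9], udp_len)
    pseudo_sum = ones_complement_sum(pseudo)
    checksum = ~ones_complement_sum(pseudo + bytes(udp)) & 0xFFFF
    return pseudo_sum, checksum or 0xFFFF     # an all-zero result is sent as 0xffff


# First query packet from the capture in this thread (A? www.osnews.com.).
PACKET = bytes.fromhex(
    "4500003cc3d14000401178dd7f000001"
    "7f000001b8c800350028fe3bd8640100"
    "000100000000000003777777066f736e"
    "65777303636f6d0000010001"
)

if __name__ == "__main__":
    pseudo_sum, checksum = udp_checksum_ipv4(PACKET)
    print("pseudo-header sum: 0x%04x" % pseudo_sum)
    print("full UDP checksum: 0x%04x" % checksum)
```

On those 60 bytes this gives a pseudo-header sum of 0xfe3b and a full checksum of 0xd17f, the same two values tcpdump prints for that packet.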

As for the 239.x multicast issue, hmm... separate issue entirely. Probably...

And then there's TFO. I note that in order to use it properly you need
to turn it on in proc. Last I remember that was

echo 3 > /proc/sys/net/ipv4/tcp_fastopen

However that's an old memory and there is this tcp_fastopen_key file I
don't know anything about yet (this is such bleeding edge stuff!)

... and with tcp_fastopen disabled things should still work right...
so I'm thinking something else is busted in the stack.

I've also observed a dns slowdown in what I've been testing but hadn't
dug into packet dumps. (and was assuming, until now, it was due to me
fiddling with ULAs inside the network) Thanks for digging this deep!

I never said this first attempt at 3.7 for cero was going to be
perfect, but we've entered a new age of subtle problems here.

I strongly suggest nobody else try this dev build as a default gw, and
that the TFO folk ignore the noise for now.

I just got a 3.7.1 box built on x86_64 so as to a/b some captures.
Regrettably I'm short on time through the weekend...

On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak <maciej@soltysiak.com> wrote:
> I am seeing something strange here, with polipo related to TFO but also DNS.
> When I just took 3.7.1-1 and set my windows 7 laptop to use gw.home.lan:8123
> as http proxy it didn't work. What I observed was:
> A) after quite a while polipo's response to browser was 504 Host
> www.osnews.com lookup failed: Timeout
> b) this error in ssh console: Host osnews.com lookup failed: Timeout
> (131072)
> c) Disabling TFO by adding option useTCPFastOpen 'false' to config 'polipo'
> 'general' works around the problem
> d) Alternatively, you can keep TFO enabled in polipo but change option
> 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
> This is very weird, because TFO is TCP and the DNS queries fired off by
> polipo are UDP:
> root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
> 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF], proto
> UDP (17), length 60)
> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!] 55396+ A?
> www.osnews.com. (32)
> 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
> 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF], proto
> UDP (17), length 60)
> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!] 55396+
> AAAA? www.osnews.com. (32)
> 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
> 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP
> (17), length 123)
> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396 q:
> A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns:
> osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
> ns1.swelter.net. (95)
> 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
> 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
> 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
> 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
> 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
> 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
> 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP
> (17), length 135)
> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396 q:
> AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
> 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
> osnews.com. [29m3s] NS ns2.swelter.net. (107)
> 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
> 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
> 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
> 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
> 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
> 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
> 0x0080: 0603 6e73 32c0 4c ..ns2.L
> This is the only DNS traffic I saw during the attempts. The tcpdumps have
> udp bad checksum but when I disabled TFO in polipo, the UDP where still bad
> checksum but they worked.
> Really weird.
> p.s. UPNP still works for port forwarding negotiation as it did in 3.6.11-4
> I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250) to
> being forwarded between se00 and sw00/sw10. Last time it worked was ~3.3.8.
> I'm starting not to question why it doesn't work, I'm starting to wonder why
> it did work then ;-)
> Regards,
> Maciej
> On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
>>
>> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com> wrote:
>> > Sorry, could you give us a copy of the panic stack trace ?
>>
>> I will get a serial console up on a wndr3800 by sunday. (sorry, just
>> landed in california, am in disarray)
>>
>> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
>>
>> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>>
>> --
>> Dave Täht
>>
>> Fixing bufferbloat with cerowrt:
>> http://www.teklibre.com/cerowrt/subscribe.html
>> _______________________________________________
>> Cerowrt-devel mailing list
>> Cerowrt-devel@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
>



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 21:11       ` Dave Taht
@ 2013-01-04 21:19         ` Jerry Chu
  2013-01-05  1:59           ` Ketan Kulkarni
  0 siblings, 1 reply; 37+ messages in thread
From: Jerry Chu @ 2013-01-04 21:19 UTC (permalink / raw)
  To: Dave Taht; +Cc: Eric Dumazet, cerowrt-devel, Yuchung Cheng


+ycheng


On Fri, Jan 4, 2013 at 1:11 PM, Dave Taht <dave.taht@gmail.com> wrote:

> Hmm. I would lean towards there being an issue with the new (freshly
> ported forward to 3.7.1) unaligned checksum code for mips based on
> what you say here. Or an offload...
>
> As for the 239.x multicast issue, hmm... separate issue entirely.
> Probably...
>
> And then there's TFO. I note that in order to use it properly you need
> to turn it on in proc. Last I remember that was
>
> echo 3 > /proc/sys/net/ipv4/tcp_fastopen
>

Correct - to enable the normal use of TFO for both client and server. There
are other flags for advanced usage:
/* Bit Flags for sysctl_tcp_fastopen */
#define TFO_CLIENT_ENABLE       1
#define TFO_SERVER_ENABLE       2
#define TFO_CLIENT_NO_COOKIE    4 /* Send data-in-SYN w/o cookie */

/* Process SYN data but skip cookie validation */
#define TFO_SERVER_COOKIE_NOT_CHKED     0x100
/* Accept SYN data w/o any cookie option */
#define TFO_SERVER_COOKIE_NOT_REQD      0x200

/* Force enable TFO on all listeners, i.e., not requiring the
 * TCP_FASTOPEN socket option. SOCKOPT1/2 determine how to set max_qlen.
 */
#define TFO_SERVER_WO_SOCKOPT1  0x400
#define TFO_SERVER_WO_SOCKOPT2  0x800
/* Always create TFO child sockets on a TFO listener even when
 * cookie/data not present. (For testing purpose!)
 */
#define TFO_SERVER_ALWAYS       0x1000
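
As an illustration of how those bits combine, here is a small Python sketch that decodes a tcp_fastopen value into the flag names above. The table is copied from the defines; the function names and the idea of reporting unknown bits separately are made up for this example.

```python
# Bit values copied from the sysctl_tcp_fastopen flags listed above.
TFO_FLAGS = {
    0x1: "TFO_CLIENT_ENABLE",
    0x2: "TFO_SERVER_ENABLE",
    0x4: "TFO_CLIENT_NO_COOKIE",
    0x100: "TFO_SERVER_COOKIE_NOT_CHKED",
    0x200: "TFO_SERVER_COOKIE_NOT_REQD",
    0x400: "TFO_SERVER_WO_SOCKOPT1",
    0x800: "TFO_SERVER_WO_SOCKOPT2",
    0x1000: "TFO_SERVER_ALWAYS",
}


def decode_tcp_fastopen(value):
    """Return (list of flag names set in value, leftover bits not in the table)."""
    names = [name for bit, name in sorted(TFO_FLAGS.items()) if value & bit]
    unknown = value & ~sum(TFO_FLAGS)  # summing the dict sums its keys (the bits)
    return names, unknown


def read_tcp_fastopen(path="/proc/sys/net/ipv4/tcp_fastopen"):
    """Read the current sysctl value, or return None if the file is not there."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    current = read_tcp_fastopen()
    if current is None:
        print("tcp_fastopen sysctl not available on this system")
    else:
        print(current, decode_tcp_fastopen(current))
```

decode_tcp_fastopen(3) reports client and server enable with no unknown bits, which is the "echo 3" setting discussed above.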


> However that's an old memory and there is this tcp_fastopen_key file I
> don't know anything about yet (this is such bleeding edge stuff!)
>
> ... and with tcp_fastopen disabled things should still work right...
> so I'm thinking something else is busted in the stack.
>
> I've also observed a dns slowdown in what I've been testing but hadn't
> dug into packet dumps. (and was assuming, until now, it was due to me
> fiddling with ULAs inside the network) Thanks for digging this deep!
>
> I never said this first attempt at 3.7 for cero was going to be
> perfect, but we've entered a new age of subtle problems here.
>
> I strongly suggest nobody else try this dev build as a default gw, and
> that the TFO folk ignore the noise for now.
>

SG.

Jerry


>
> I just got a 3.7.1 box built on x86_64 so as to a/b some captures.
> Regrettably I'm short on time through the weekend...
>
> On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak <maciej@soltysiak.com>
> wrote:
> > I am seeing something strange here, with polipo related to TFO but also
> DNS.
> > When I just took 3.7.1-1 and set my windows 7 laptop to use
> gw.home.lan:8123
> > as http proxy it didn't work. What I observed was:
> > A) after quite a while polipo's response to browser was 504 Host
> > www.osnews.com lookup failed: Timeout
> > b) this error in ssh console: Host osnews.com lookup failed: Timeout
> > (131072)
> > c) Disabling TFO by adding option useTCPFastOpen 'false' to config
> 'polipo'
> > 'general' works around the problem
> > d) Alternatively, you can keep TFO enabled in polipo but change option
> > 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
> > This is very weird, because TFO is TCP and the DNS queries fired off by
> > polipo are UDP:
> > root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
> > 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF],
> proto
> > UDP (17), length 60)
> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!] 55396+
> A?
> > www.osnews.com. (32)
> > 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> > 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
> > 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF],
> proto
> > UDP (17), length 60)
> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!] 55396+
> > AAAA? www.osnews.com. (32)
> > 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> > 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
> > 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
> UDP
> > (17), length 123)
> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396
> q:
> > A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns:
> > osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
> > ns1.swelter.net. (95)
> > 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
> > 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> > 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
> > 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
> > 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
> > 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
> > 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
> > 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
> UDP
> > (17), length 135)
> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396
> q:
> > AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
> > 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
> > osnews.com. [29m3s] NS ns2.swelter.net. (107)
> > 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
> > 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> > 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
> > 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
> > 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
> > 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
> > 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
> > 0x0080: 0603 6e73 32c0 4c ..ns2.L
> > This is the only DNS traffic I saw during the attempts. The tcpdumps have
> > udp bad checksum but when I disabled TFO in polipo, the UDP where still
> bad
> > checksum but they worked.
> > Really weird.
> > p.s. UPNP still works for port forwarding negotiation as it did in
> 3.6.11-4
> > I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250) to
> > being forwarded between se00 and sw00/sw10. Last time it worked was
> ~3.3.8.
> > I'm starting not to question why it doesn't work, I'm starting to wonder
> why
> > it did work then ;-)
> > Regards,
> > Maciej
> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
> >>
> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com>
> wrote:
> >> > Sorry, could you give us a copy of the panic stack trace ?
> >>
> >> I will get a serial console up on a wndr3800 by sunday. (sorry, just
> >> landed in california, am in disarray)
> >>
> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
> >>
> >> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
> >>
> >> --
> >> Dave Täht
> >>
> >> Fixing bufferbloat with cerowrt:
> >> http://www.teklibre.com/cerowrt/subscribe.html
> >> _______________________________________________
> >> Cerowrt-devel mailing list
> >> Cerowrt-devel@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> >
> >
>
>
>
> --
> Dave Täht
>
> Fixing bufferbloat with cerowrt:
> http://www.teklibre.com/cerowrt/subscribe.html
>


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 20:57         ` Jerry Chu
@ 2013-01-04 21:21           ` Dave Taht
  2013-01-04 21:36             ` Jerry Chu
  0 siblings, 1 reply; 37+ messages in thread
From: Dave Taht @ 2013-01-04 21:21 UTC (permalink / raw)
  To: Jerry Chu; +Cc: Eric Dumazet, cerowrt-devel, Yuchung Cheng

On Fri, Jan 4, 2013 at 12:57 PM, Jerry Chu <hkchu@google.com> wrote:
> +ycheng
>
> On Fri, Jan 4, 2013 at 12:43 PM, Maciej Soltysiak <maciej@soltysiak.com>
> wrote:
>>
>> Oops, apologies if email was formatted weirdly...
>
>
> The problem you described below is separate from the MIPS router crash one,
> right?

I think - but am of course unsure - that we've actually hit a
different (or additional) problem than TFO - merely exposed by testing
fastopen first...

that said...

>BTW, we've only tested on x86_64 arch.

I have multiple arches accumulated for some BQL work, mostly arm.

They include a raspberry pi, a zedboard, a wndr4300, a nexus 7, and a
couple other boxes in addition to my usual mips based wndr3800s and
nanostation m5s. (which use mildly different mips chips and in
particular the m5 does not share the unaligned rx problem that the
ar71xx has a patch for). The zedboard's been problematic...

So I will fold TFO testing into looking hard at the IGMP and
checksum issues seemingly exposed today. Unless someone beats me to it
(I'm tied up til sunday)

In the interim, at higher layers, the current release of httping
has TFO support, and there are patches to polipo that might benefit
from review and wider testing on x86_64.

https://raw.github.com/dtaht/ceropackages-3.3/master/net/polipo/patches/001-server_tfo.patch

At least half of the needed support landed in netperf svn recently, too.

>
> In addition to tcpdump, "netstat -s | grep -i fastopen" may be useful too.
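
On boxes where netstat is not installed, the same counters can probably be pulled straight from /proc/net/netstat. A minimal Python sketch, assuming the usual layout of that file (pairs of lines, a header line of names followed by a line of values, each starting with the same prefix such as "TcpExt:"). The function name is made up and the sample text in the demo is invented, not output from a cerowrt box.

```python
def fastopen_counters(netstat_text):
    """Pick counters whose name contains 'fastopen' out of /proc/net/netstat text."""
    counters = {}
    lines = netstat_text.splitlines()
    # The file comes as header/value line pairs sharing the same "Prefix:".
    for header, values in zip(lines[0::2], lines[1::2]):
        head_prefix, _, names = header.partition(":")
        value_prefix, _, numbers = values.partition(":")
        if head_prefix != value_prefix:
            continue  # not a matching pair, skip it
        for name, number in zip(names.split(), numbers.split()):
            if "fastopen" in name.lower():
                counters[name] = int(number)
    return counters


if __name__ == "__main__":
    try:
        with open("/proc/net/netstat") as handle:
            text = handle.read()
    except OSError:
        # Invented sample so the sketch still runs where the file is missing.
        text = "TcpExt: SyncookiesSent TCPFastOpenActive\nTcpExt: 0 3\n"
    for name, value in sorted(fastopen_counters(text).items()):
        print(name, value)
```

Comparing the counters before and after a test connection should show whether the SYN+data path was actually taken.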
>
> Thanks,
>
> Jerry (author of TFO server code)
>
>>
>>
>> On Fri, Jan 4, 2013 at 9:42 PM, Maciej Soltysiak <maciej@soltysiak.com>
>> wrote:
>>>
>>> I am seeing something strange here, with polipo related to TFO but also
>>> DNS.
>>> When I just took 3.7.1-1 and set my windows 7 laptop to use
>>> gw.home.lan:8123 as http proxy it didn't work. What I observed was:
>>> A) after quite a while polipo's response to browser was 504 Host
>>> www.osnews.com lookup failed: Timeout
>>> b) this error in ssh console: Host osnews.com lookup failed: Timeout
>>> (131072)
>>> c) Disabling TFO by adding option useTCPFastOpen 'false' to config
>>> 'polipo' 'general' works around the problem
>>> d) Alternatively, you can keep TFO enabled in polipo but change option
>>> 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
>>> This is very weird, because TFO is TCP and the DNS queries fired off by
>>> polipo are UDP:
>>> root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
>>> 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF],
>>> proto UDP (17), length 60)
>>> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!] 55396+
>>> A? www.osnews.com. (32)
>>> 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
>>> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>>> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>>> 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
>>> 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF],
>>> proto UDP (17), length 60)
>>> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!] 55396+
>>> AAAA? www.osnews.com. (32)
>>> 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
>>> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>>> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>>> 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
>>> 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
>>> UDP (17), length 123)
>>> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396
>>> q: A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns:
>>> osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
>>> ns1.swelter.net. (95)
>>> 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
>>> 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
>>> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>>> 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
>>> 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
>>> 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
>>> 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
>>> 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
>>> 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
>>> UDP (17), length 135)
>>> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396
>>> q: AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
>>> 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
>>> osnews.com. [29m3s] NS ns2.swelter.net. (107)
>>> 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
>>> 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
>>> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>>> 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
>>> 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
>>> 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
>>> 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
>>> 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
>>> 0x0080: 0603 6e73 32c0 4c ..ns2.L
>>> This is the only DNS traffic I saw during the attempts. The tcpdumps have
>>> udp bad checksum but when I disabled TFO in polipo, the UDP where still bad
>>> checksum but they worked.
>>> Really weird.
>>> p.s. UPNP still works for port forwarding negotiation as it did in
>>> 3.6.11-4
>>> I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250) to
>>> being forwarded between se00 and sw00/sw10. Last time it worked was ~3.3.8.
>>> I'm starting not to question why it doesn't work, I'm starting to wonder why
>>> it did work then ;-)
>>> Regards,
>>> Maciej
>>> On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
>>>>
>>>> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com>
>>>> wrote:
>>>> > Sorry, could you give us a copy of the panic stack trace ?
>>>>
>>>> I will get a serial console up on a wndr3800 by sunday. (sorry, just
>>>> landed in california, am in disarray)
>>>>
>>>> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
>>>>
>>>> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>>>>
>>>> --
>>>> Dave Täht
>>>>
>>>> Fixing bufferbloat with cerowrt:
>>>> http://www.teklibre.com/cerowrt/subscribe.html
>>>> _______________________________________________
>>>> Cerowrt-devel mailing list
>>>> Cerowrt-devel@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>
>>>
>>
>



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 21:21           ` Dave Taht
@ 2013-01-04 21:36             ` Jerry Chu
  2013-01-04 21:44               ` Dave Taht
  0 siblings, 1 reply; 37+ messages in thread
From: Jerry Chu @ 2013-01-04 21:36 UTC (permalink / raw)
  To: Dave Taht; +Cc: Eric Dumazet, cerowrt-devel, Yuchung Cheng


On Fri, Jan 4, 2013 at 1:21 PM, Dave Taht <dave.taht@gmail.com> wrote:

> On Fri, Jan 4, 2013 at 12:57 PM, Jerry Chu <hkchu@google.com> wrote:
> > +ycheng
> >
> > On Fri, Jan 4, 2013 at 12:43 PM, Maciej Soltysiak <maciej@soltysiak.com>
> > wrote:
> >>
> >> Oops, apologies if email was formatted weirdly...
> >
> >
> > The problem you described below is separate from the MIPS router crash
> one,
> > right?
>
> I think - but am of course unsure - that we've actually hit a
> different (or additional) problem than TFO - merely exposed by testing
> fastopen first...
>
> that said...
>
> >BTW, we've only tested on x86_64 arch.
>
> I have multiple arches accumulated for some BQL work, mostly arm,
>
> They include a raspberry pi, a zedboard, a wndr4300, a nexus 7, and a
> couple other boxes in addition to my usual mips based wndr3800s and
> nanostation m5s. (which use mildly different mips chips and in
> particular the m5 does not share the unaligned rx problem that the
> ar71xx has a patch for). The zedboard's been problematic...
>
> So I will fold in testing tfo to looking hard at the igmp, and
> checksum issues seemingly exposed today. Unless someone beats me to it
> (I'm tied up til sunday)
>
> In the interim, at a higher layers, the current release of httpping
> has tfo support, and there are patches to polipo, that might benefit
>

Awesome!


> from review and wider testing on x86_64.
>
>
> https://raw.github.com/dtaht/ceropackages-3.3/master/net/polipo/patches/001-server_tfo.patch


tfo_qlen is your first line of defense against spoofed TFO attacks, so you
want to pick a value wisely. (50 seems on the safer side.)
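
A rough sketch of where that queue length goes on the server side, in Python rather than the C of the polipo patch. It assumes Linux, where the TCP_FASTOPEN socket option on a listener takes the maximum number of pending TFO requests. The fallback constant 23 is the Linux value, the function name is hypothetical, and this is not the code in the patch above.

```python
import socket

# Linux value of TCP_FASTOPEN, used only if this Python build lacks the constant.
TCP_FASTOPEN = getattr(socket, "TCP_FASTOPEN", 23)


def make_tfo_listener(host="127.0.0.1", port=0, tfo_qlen=50, backlog=128):
    """Create a listening TCP socket and try to enable TFO with the given qlen.

    Returns (socket, tfo_enabled). tfo_enabled is False when the kernel or
    platform refuses the option; the listener still works without TFO then.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    try:
        # tfo_qlen caps how many TFO connections may sit unaccepted at once.
        sock.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN, tfo_qlen)
        tfo_enabled = True
    except OSError:
        tfo_enabled = False
    sock.listen(backlog)
    return sock, tfo_enabled


if __name__ == "__main__":
    listener, enabled = make_tfo_listener()
    print("listening on", listener.getsockname(), "TFO option set:", enabled)
    listener.close()
```

Even when the option is accepted, the server bit in the tcp_fastopen sysctl still has to be on before SYN data is actually processed.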

Jerry


>
> At least half of the needed support landed in netperf svn recently, too.
>
> >
> > In addition to tcpdump, "netstat -s | grep -i fastopen" may be useful
> too.
> >
> > Thanks,
> >
> > Jerry (author of TFO server code)
> >
> >>
> >>
> >> On Fri, Jan 4, 2013 at 9:42 PM, Maciej Soltysiak <maciej@soltysiak.com>
> >> wrote:
> >>>
> >>> I am seeing something strange here, with polipo related to TFO but also
> >>> DNS.
> >>> When I just took 3.7.1-1 and set my windows 7 laptop to use
> >>> gw.home.lan:8123 as http proxy it didn't work. What I observed was:
> >>> A) after quite a while polipo's response to browser was 504 Host
> >>> www.osnews.com lookup failed: Timeout
> >>> b) this error in ssh console: Host osnews.com lookup failed: Timeout
> >>> (131072)
> >>> c) Disabling TFO by adding option useTCPFastOpen 'false' to config
> >>> 'polipo' 'general' works around the problem
> >>> d) Alternatively, you can keep TFO enabled in polipo but change option
> >>> 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
> >>> This is very weird, because TFO is TCP and the DNS queries fired off by
> >>> polipo are UDP:
> >>> root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
> >>> 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF],
> >>> proto UDP (17), length 60)
> >>> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!]
> 55396+
> >>> A? www.osnews.com. (32)
> >>> 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
> >>> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> >>> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> >>> 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
> >>> 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF],
> >>> proto UDP (17), length 60)
> >>> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!]
> 55396+
> >>> AAAA? www.osnews.com. (32)
> >>> 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
> >>> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> >>> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> >>> 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
> >>> 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
> >>> UDP (17), length 123)
> >>> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396
> >>> q: A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns:
> >>> osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
> >>> ns1.swelter.net. (95)
> >>> 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
> >>> 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
> >>> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> >>> 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
> >>> 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
> >>> 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
> >>> 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
> >>> 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
> >>> 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
> >>> UDP (17), length 135)
> >>> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396
> >>> q: AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
> >>> 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
> >>> osnews.com. [29m3s] NS ns2.swelter.net. (107)
> >>> 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
> >>> 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
> >>> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> >>> 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
> >>> 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
> >>> 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
> >>> 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
> >>> 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
> >>> 0x0080: 0603 6e73 32c0 4c ..ns2.L
> >>> This is the only DNS traffic I saw during the attempts. The tcpdumps
> have
> >>> udp bad checksum but when I disabled TFO in polipo, the UDP where
> still bad
> >>> checksum but they worked.
> >>> Really weird.
> >>> p.s. UPNP still works for port forwarding negotiation as it did in
> >>> 3.6.11-4
> >>> I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250)
> to
> >>> being forwarded between se00 and sw00/sw10. Last time it worked was
> ~3.3.8.
> >>> I'm starting not to question why it doesn't work, I'm starting to
> wonder why
> >>> it did work then ;-)
> >>> Regards,
> >>> Maciej
> >>> On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
> >>>>
> >>>> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com>
> >>>> wrote:
> >>>> > Sorry, could you give us a copy of the panic stack trace ?
> >>>>
> >>>> I will get a serial console up on a wndr3800 by sunday. (sorry, just
> >>>> landed in california, am in disarray)
> >>>>
> >>>> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
> >>>>
> >>>> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
> >>>>
> >>>> --
> >>>> Dave Täht
> >>>>
> >>>> Fixing bufferbloat with cerowrt:
> >>>> http://www.teklibre.com/cerowrt/subscribe.html
> >>>> _______________________________________________
> >>>> Cerowrt-devel mailing list
> >>>> Cerowrt-devel@lists.bufferbloat.net
> >>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> >>>
> >>>
> >>
> >
>
>
>
> --
> Dave Täht
>
> Fixing bufferbloat with cerowrt:
> http://www.teklibre.com/cerowrt/subscribe.html
>


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 21:36             ` Jerry Chu
@ 2013-01-04 21:44               ` Dave Taht
  0 siblings, 0 replies; 37+ messages in thread
From: Dave Taht @ 2013-01-04 21:44 UTC (permalink / raw)
  To: Jerry Chu; +Cc: Eric Dumazet, cerowrt-devel, Yuchung Cheng

On Fri, Jan 4, 2013 at 1:36 PM, Jerry Chu <hkchu@google.com> wrote:
> On Fri, Jan 4, 2013 at 1:21 PM, Dave Taht <dave.taht@gmail.com> wrote:
>>
>> On Fri, Jan 4, 2013 at 12:57 PM, Jerry Chu <hkchu@google.com> wrote:
>> > +ycheng
>> >
>> > On Fri, Jan 4, 2013 at 12:43 PM, Maciej Soltysiak <maciej@soltysiak.com>
>> > wrote:
>> >>
>> >> Oops, apologies if email was formatted weirdly...
>> >
>> >
>> > The problem you described below is separate from the MIPS router crash
>> > one,
>> > right?
>>
>> I think - but am of course unsure - that we've actually hit a
>> different (or additional) problem than TFO - merely exposed by testing
>> fastopen first...
>>
>> that said...
>>
>> >BTW, we've only tested on x86_64 arch.
>>
>> I have multiple arches accumulated for some BQL work, mostly arm,
>>
>> They include a raspberry pi, a zedboard, a wndr4300, a nexus 7, and a
>> couple other boxes in addition to my usual mips based wndr3800s and
>> nanostation m5s. (which use mildly different mips chips and in
>> particular the m5 does not share the unaligned rx problem that the
>> ar71xx has a patch for). The zedboard's been problematic...
>>
>> So I will fold in testing tfo to looking hard at the igmp, and
>> checksum issues seemingly exposed today. Unless someone beats me to it
>> (I'm tied up til sunday)
>>
>> In the interim, at a higher layers, the current release of httpping
>> has tfo support, and there are patches to polipo, that might benefit
>
>
> Awesome!
>
>>
>> from review and wider testing on x86_64.
>>
>>
>> https://raw.github.com/dtaht/ceropackages-3.3/master/net/polipo/patches/001-server_tfo.patch
>
>
> tfo_qlen is your first line of defense against spoofed TFO attack so you
> want to pick a value wisely. (50 seems on the safer side.)

I would like it if TFO client support landed in polipo too. It
looked both tricky to implement and useful as a means of getting more
TFO support "out there".

Early testing with httping was quite promising over wifi. Eliminating
an RTT there basically doubled throughput for short local aggregates.

http://www.vanheusden.com/httping/ (it's the -F option)
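
For poking at the client side without httping, a minimal Python sketch follows. It assumes Linux's MSG_FASTOPEN flag to sendto() (the fallback constant is the Linux value), and the helper name is made up. When the TFO call is refused it falls back to a plain connect, so it says nothing about whether data really rode in the SYN; a packet capture is still needed for that.

```python
import socket

# Linux value of MSG_FASTOPEN, used only if this Python build lacks the constant.
MSG_FASTOPEN = getattr(socket, "MSG_FASTOPEN", 0x20000000)


def tfo_send(host, port, payload):
    """Connect and send payload, asking for TFO when the platform allows it.

    Returns (connected socket, True if the MSG_FASTOPEN call was accepted).
    The socket is left in blocking mode so sendto() performs the connect itself.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # With MSG_FASTOPEN, sendto() on an unconnected TCP socket connects and
        # sends in one call; the kernel decides whether data goes in the SYN.
        sent = sock.sendto(payload, MSG_FASTOPEN, (host, port))
        sock.sendall(payload[sent:])
        return sock, True
    except OSError:
        sock.close()
    # Fallback: ordinary three-way handshake, then send.
    sock = socket.create_connection((host, port))
    sock.sendall(payload)
    return sock, False
```

A caller would do something like tfo_send("192.0.2.1", 80, request_bytes) and then read the reply from the returned socket; the address here is a placeholder.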

Are there any other test tools available?

>
> Jerry
>
>>
>>
>> At least half of the needed support landed in netperf svn recently, too.
>>
>> >
>> > In addition to tcpdump, "netstat -s | grep -i fastopen" may be useful
>> > too.
>> >
>> > Thanks,
>> >
>> > Jerry (author of TFO server code)
>> >
>> >>
>> >>
>> >> On Fri, Jan 4, 2013 at 9:42 PM, Maciej Soltysiak <maciej@soltysiak.com>
>> >> wrote:
>> >>>
>> >>> I am seeing something strange here, with polipo related to TFO but
>> >>> also
>> >>> DNS.
>> >>> When I just took 3.7.1-1 and set my windows 7 laptop to use
>> >>> gw.home.lan:8123 as http proxy it didn't work. What I observed was:
>> >>> A) after quite a while polipo's response to browser was 504 Host
>> >>> www.osnews.com lookup failed: Timeout
>> >>> b) this error in ssh console: Host osnews.com lookup failed: Timeout
>> >>> (131072)
>> >>> c) Disabling TFO by adding option useTCPFastOpen 'false' to config
>> >>> 'polipo' 'general' works around the problem
>> >>> d) Alternatively, you can keep TFO enabled in polipo but change option
>> >>> 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
>> >>> This is very weird, because TFO is TCP and the DNS queries fired off
>> >>> by
>> >>> polipo are UDP:
>> >>> root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
>> >>> 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF],
>> >>> proto UDP (17), length 60)
>> >>> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!]
>> >>> 55396+
>> >>> A? www.osnews.com. (32)
>> >>> 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
>> >>> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>> >>> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>> >>> 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
>> >>> 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF],
>> >>> proto UDP (17), length 60)
>> >>> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!]
>> >>> 55396+
>> >>> AAAA? www.osnews.com. (32)
>> >>> 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
>> >>> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>> >>> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>> >>> 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
>> >>> 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
>> >>> UDP (17), length 123)
>> >>> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!]
>> >>> 55396
>> >>> q: A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns:
>> >>> osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
>> >>> ns1.swelter.net. (95)
>> >>> 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
>> >>> 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
>> >>> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>> >>> 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
>> >>> 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
>> >>> 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
>> >>> 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
>> >>> 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
>> >>> 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
>> >>> UDP (17), length 135)
>> >>> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!]
>> >>> 55396
>> >>> q: AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
>> >>> 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
>> >>> osnews.com. [29m3s] NS ns2.swelter.net. (107)
>> >>> 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
>> >>> 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
>> >>> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>> >>> 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
>> >>> 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
>> >>> 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
>> >>> 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
>> >>> 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
>> >>> 0x0080: 0603 6e73 32c0 4c ..ns2.L
>> >>> This is the only DNS traffic I saw during the attempts. The tcpdumps
>> >>> have
>> >>> udp bad checksum but when I disabled TFO in polipo, the UDP where
>> >>> still bad
>> >>> checksum but they worked.
>> >>> Really weird.
>> >>> p.s. UPNP still works for port forwarding negotiation as it did in
>> >>> 3.6.11-4
>> >>> I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250)
>> >>> to
>> >>> being forwarded between se00 and sw00/sw10. Last time it worked was
>> >>> ~3.3.8.
>> >>> I'm starting not to question why it doesn't work, I'm starting to
>> >>> wonder why
>> >>> it did work then ;-)
>> >>> Regards,
>> >>> Maciej
>> >>> On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
>> >>>>
>> >>>> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com>
>> >>>> wrote:
>> >>>> > Sorry, could you give us a copy of the panic stack trace ?
>> >>>>
>> >>>> I will get a serial console up on a wndr3800 by sunday. (sorry, just
>> >>>> landed in california, am in disarray)
>> >>>>
>> >>>> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
>> >>>>
>> >>>> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>> >>>>
>> >>>> --
>> >>>> Dave Täht
>> >>>>
>> >>>> Fixing bufferbloat with cerowrt:
>> >>>> http://www.teklibre.com/cerowrt/subscribe.html
>> >>>> _______________________________________________
>> >>>> Cerowrt-devel mailing list
>> >>>> Cerowrt-devel@lists.bufferbloat.net
>> >>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>> >>>
>> >>>
>> >>
>> >
>>
>>
>>
>> --
>> Dave Täht
>>
>> Fixing bufferbloat with cerowrt:
>> http://www.teklibre.com/cerowrt/subscribe.html
>
>



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 20:42     ` Maciej Soltysiak
  2013-01-04 20:43       ` Maciej Soltysiak
  2013-01-04 21:11       ` Dave Taht
@ 2013-01-04 22:25       ` Robert Bradley
  2013-01-14  6:11       ` Dave Taht
  3 siblings, 0 replies; 37+ messages in thread
From: Robert Bradley @ 2013-01-04 22:25 UTC (permalink / raw)
  To: cerowrt-devel

On 04/01/13 20:42, Maciej Soltysiak wrote:
> This is very weird, because TFO is TCP and the DNS queries fired off by
> polipo are UDP:
> root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
<snip>
> This is the only DNS traffic I saw during the attempts. The tcpdumps have
> udp bad checksum but when I disabled TFO in polipo, the UDP where still bad
> checksum but they worked.
> Really weird.

I think this is a side effect of the loopback interface not processing 
checksums by default.  I will be more worried if the checksum errors 
also occur on ge00, but am not ruling out problems with the unaligned 
access patches yet.
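
The numbers in the capture can be checked by hand. The sketch below
recomputes the UDP checksum for the first query packet in Maciej's dump
(bytes copied from the hex above); the function names are made up for
illustration and it assumes a 20-byte IPv4 header with no options. The
0xfe3b left in the packet comes out equal to the sum over the
pseudo-header alone, which is what I would expect if the checksum was
left half-finished for a later step to complete (my reading, not
something confirmed in this thread).

```python
import struct

# First DNS query from the lo capture (60 bytes, IPv4 + UDP + DNS).
PACKET = bytes.fromhex(
    "4500 003c c3d1 4000 4011 78dd 7f00 0001 "
    "7f00 0001 b8c8 0035 0028 fe3b d864 0100 "
    "0001 0000 0000 0000 0377 7777 066f 736e "
    "6577 7303 636f 6d00 0001 0001".replace(" ", "")
)


def ones_complement_sum(data):
    """Sum 16-bit big-endian words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(ip_packet):
    """IPv4 pseudo-header: src, dst, zero, protocol 17, UDP length."""
    udp_len = len(ip_packet) - 20
    return ip_packet[12:20] + struct.pack("!BBH", 0, 17, udp_len)


def pseudo_header_sum(ip_packet):
    return ones_complement_sum(pseudo_header(ip_packet))


def udp_checksum(ip_packet):
    """Checksum a receiver would expect in the UDP header."""
    udp = bytearray(ip_packet[20:])
    udp[6:8] = b"\x00\x00"  # checksum field counts as zero while summing
    total = ones_complement_sum(pseudo_header(ip_packet) + bytes(udp))
    return (0xFFFF - total) or 0xFFFF  # a computed 0 is sent as 0xffff


print(hex(pseudo_header_sum(PACKET)), hex(udp_checksum(PACKET)))
```

The second value matches the 0xd17f that tcpdump reports as the correct
checksum for that packet.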

As an aside, I've noticed very occasional issues connecting to
(IPv4-only) sites using both Sugarland and the 3.6.10 release, which
get fixed when the connection is retried.  I have been unable to repeat
this on demand, however.

-- 
Robert Bradley


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 21:01         ` dpreed
@ 2013-01-04 22:49           ` Robert Bradley
  0 siblings, 0 replies; 37+ messages in thread
From: Robert Bradley @ 2013-01-04 22:49 UTC (permalink / raw)
  To: cerowrt-devel

On 04/01/13 21:01, dpreed@reed.com wrote:
> Is this a TFO where the endpoint is on cerowrt, or just a SYN+DATA for a non cerowrt destination?
>   
> I was looking at the firewall rules, and they are pretty complicated.  Perhaps the SYN+DATA triggers a strange firewall behavior (a loop?)   SYN's are special to firewalls, as we know.
>

I'm wondering if it could be an issue with the NAT or 
connection-tracking code somehow not coping with the TFO options.

-- 
Robert Bradley


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 21:19         ` Jerry Chu
@ 2013-01-05  1:59           ` Ketan Kulkarni
  2013-01-05  2:20             ` Yuchung Cheng
  0 siblings, 1 reply; 37+ messages in thread
From: Ketan Kulkarni @ 2013-01-05  1:59 UTC (permalink / raw)
  To: Jerry Chu, Dave Taht, Yuchung Cheng; +Cc: Eric Dumazet, cerowrt-devel


[-- Attachment #1.1: Type: text/plain, Size: 8565 bytes --]

Well, I was trying polipo server on cero box and httping from laptop. On
both the boxes I set 3 in tcp_fastopen.
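
For reference, the value 3 is just the client and server enable bits
together. A small sketch for decoding whatever is in tcp_fastopen, using
the flag values from Jerry's list quoted below; the function names are
made up for illustration:

```python
# Bit flags for the tcp_fastopen sysctl, as listed in this thread.
TFO_FLAGS = [
    (0x1, "TFO_CLIENT_ENABLE"),
    (0x2, "TFO_SERVER_ENABLE"),
    (0x4, "TFO_CLIENT_NO_COOKIE"),
    (0x100, "TFO_SERVER_COOKIE_NOT_CHKED"),
    (0x200, "TFO_SERVER_COOKIE_NOT_REQD"),
    (0x400, "TFO_SERVER_WO_SOCKOPT1"),
    (0x800, "TFO_SERVER_WO_SOCKOPT2"),
    (0x1000, "TFO_SERVER_ALWAYS"),
]


def decode_tcp_fastopen(value):
    """Return the names of the flags set in a tcp_fastopen value."""
    names = [name for bit, name in TFO_FLAGS if value & bit]
    known = sum(bit for bit, _ in TFO_FLAGS)
    leftover = value & ~known
    if leftover:
        names.append("unknown:0x%x" % leftover)  # bits not in the list
    return names


def read_tcp_fastopen(path="/proc/sys/net/ipv4/tcp_fastopen"):
    """Read the current setting from proc (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


print(decode_tcp_fastopen(3))
```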

The panic is seen only when server is on cero box.
If I run server on my laptop and httping from cero all TFO connections are
successful.
So I doubt the only problem is SYN+DATA.

Unfortunately I don't have the serial cable right now, and logread or dmesg
didn't print any logs before the cero router restarted.

Attached is the tcpdump capture on lo when client and server both run on
cero box.
HTH!

If you (or anyone) can suggest more diagnostics, I will be glad to provide.
 On Jan 5, 2013 2:49 AM, "Jerry Chu" <hkchu@google.com> wrote:

> +ycheng
>
>
> On Fri, Jan 4, 2013 at 1:11 PM, Dave Taht <dave.taht@gmail.com> wrote:
>
>> Hmm. I would lean towards there being an issue with the new (freshly
>> ported forward to 3.7.1) unaligned checksum code for mips based on
>> what you say here. Or an offload...
>>
>> As for the 239.x multicast issue, hmm... separate issue entirely.
>> Probably...
>>
>> And then there's TFO. I note that in order to use it properly you need
>> to turn it on in proc. Last I remember that was
>>
>> echo 3 > /proc/sys/net/ipv4/tcp_fastopen
>>
>
> Correct - to enable the normal use of TFO for both client and server.
> There are other flags for advanced usage:
>  /* Bit Flags for sysctl_tcp_fastopen */
> #define TFO_CLIENT_ENABLE       1
> #define TFO_SERVER_ENABLE       2
> #define TFO_CLIENT_NO_COOKIE    4 /* Send data-in-SYN w/o cookie */
>
> /* Process SYN data but skip cookie validation */
> #define TFO_SERVER_COOKIE_NOT_CHKED     0x100
> /* Accept SYN data w/o any cookie option */
> #define TFO_SERVER_COOKIE_NOT_REQD      0x200
>
> /* Force enable TFO on all listeners, i.e., not requiring the
>  * TCP_FASTOPEN socket option. SOCKOPT1/2 determine how to set max_qlen.
>  */
> #define TFO_SERVER_WO_SOCKOPT1  0x400
> #define TFO_SERVER_WO_SOCKOPT2  0x800
> /* Always create TFO child sockets on a TFO listener even when
>  * cookie/data not present. (For testing purpose!)
>  */
> #define TFO_SERVER_ALWAYS       0x1000
>
>
>> However that's an old memory and there is this tcp_fastopen_key file I
>> don't know anything about yet (this is such bleeding edge stuff!)
>>
>> ... and with tcp_fastopen disabled things should still work right...
>> so I'm thinking something else is busted in the stack.
>>
>> I've also observed a dns slowdown in what I've been testing but hadn't
>> dug into packet dumps. (and was assuming, until now, it was due to me
>> fiddling with ULAs inside the network) Thanks for digging this deep!
>>
>> I never said this first attempt at 3.7 for cero was going to be
>> perfect, but we've entered a new age of subtle problems here.
>>
>> I strongly suggest nobody else try this dev build as a default gw, and
>> that the TFO folk ignore the noise for now.
>>
>
> SG.
>
> Jerry
>
>
>>
>> I just got a 3.7.1 box built on x86_64 so as to a/b some captures.
>> Regrettably I'm short on time through the weekend...
>>
>> On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak <maciej@soltysiak.com>
>> wrote:
>> > I am seeing something strange here, with polipo related to TFO but also
>> DNS.
>> > When I just took 3.7.1-1 and set my windows 7 laptop to use
>> gw.home.lan:8123
>> > as http proxy it didn't work. What I observed was:
>> > A) after quite a while polipo's response to browser was 504 Host
>> > www.osnews.com lookup failed: Timeout
>> > b) this error in ssh console: Host osnews.com lookup failed: Timeout
>> > (131072)
>> > c) Disabling TFO by adding option useTCPFastOpen 'false' to config
>> 'polipo'
>> > 'general' works around the problem
>> > d) Alternatively, you can keep TFO enabled in polipo but change option
>> > 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
>> > This is very weird, because TFO is TCP and the DNS queries fired off by
>> > polipo are UDP:
>> > root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
>> > 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF],
>> proto
>> > UDP (17), length 60)
>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!]
>> 55396+ A?
>> > www.osnews.com. (32)
>> > 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>> > 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
>> > 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF],
>> proto
>> > UDP (17), length 60)
>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!] 55396+
>> > AAAA? www.osnews.com. (32)
>> > 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>> > 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
>> > 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
>> UDP
>> > (17), length 123)
>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396
>> q:
>> > A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns:
>> > osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
>> > ns1.swelter.net. (95)
>> > 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
>> > 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>> > 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
>> > 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
>> > 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
>> > 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
>> > 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
>> > 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
>> UDP
>> > (17), length 135)
>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396
>> q:
>> > AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
>> > 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
>> > osnews.com. [29m3s] NS ns2.swelter.net. (107)
>> > 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
>> > 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>> > 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
>> > 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
>> > 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
>> > 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
>> > 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
>> > 0x0080: 0603 6e73 32c0 4c ..ns2.L
>> > This is the only DNS traffic I saw during the attempts. The tcpdumps
>> have
>> > udp bad checksum but when I disabled TFO in polipo, the UDP where still
>> bad
>> > checksum but they worked.
>> > Really weird.
>> > p.s. UPNP still works for port forwarding negotiation as it did in
>> 3.6.11-4
>> > I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250)
>> to
>> > being forwarded between se00 and sw00/sw10. Last time it worked was
>> ~3.3.8.
>> > I'm starting not to question why it doesn't work, I'm starting to
>> wonder why
>> > it did work then ;-)
>> > Regards,
>> > Maciej
>> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
>> >>
>> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com>
>> wrote:
>> >> > Sorry, could you give us a copy of the panic stack trace ?
>> >>
>> >> I will get a serial console up on a wndr3800 by sunday. (sorry, just
>> >> landed in california, am in disarray)
>> >>
>> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
>> >>
>> >> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>> >>
>> >> --
>> >> Dave Täht
>> >>
>> >> Fixing bufferbloat with cerowrt:
>> >> http://www.teklibre.com/cerowrt/subscribe.html
>> >> _______________________________________________
>> >> Cerowrt-devel mailing list
>> >> Cerowrt-devel@lists.bufferbloat.net
>> >> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>> >
>> >
>>
>>
>>
>> --
>> Dave Täht
>>
>> Fixing bufferbloat with cerowrt:
>> http://www.teklibre.com/cerowrt/subscribe.html
>>
>
>

[-- Attachment #1.2: Type: text/html, Size: 11792 bytes --]

[-- Attachment #2: lo_capture.txt --]
[-- Type: text/plain, Size: 1944 bytes --]

root@OpenWrt:~# httping -F -g http://127.0.0.1:8123
PING 127.0.0.1:8123 (http://127.0.0.1:8123):
16:36:36.033466 IP localhost.39443 > localhost.8123: Flags [SEW], seq 2946288341, win 43690, options [mss 65495,sackOK,TS val 4294964893 ecr 0,nop,wscale 6,Unknown Option 254f989], length 0 --> SYN + Cookie and no Data.
connected to 127.0.0.1:8123 (183 bytes), seq=0 time=3.15 ms 
16:36:36.033584 IP localhost.8123 > localhost.39443: Flags [S.E], seq 654941876, ack 2946288342, win 43690, options [mss 65495,sackOK,TS val 4294964893 ecr 4294964893,nop,wscale 6,Unknown Option 254f989df087214939732ef], length 0  --> SYN+ACK+Cookie
16:36:36.033638 IP localhost.39443 > localhost.8123: Flags [.], ack 1, win 683, options [nop,nop,TS val 4294964893 ecr 4294964893], length 0
16:36:36.034971 IP localhost.39443 > localhost.8123: Flags [P.], seq 1:65, ack 1, win 683, options [nop,nop,TS val 4294964894 ecr 4294964893], length 64 --> HTTP Request HEAD
16:36:36.035112 IP localhost.8123 > localhost.39443: Flags [.], ack 65, win 683, options [nop,nop,TS val 4294964894 ecr 4294964894], length 0
16:36:36.035808 IP localhost.8123 > localhost.39443: Flags [P.], seq 1:184, ack 65, win 683, options [nop,nop,TS val 4294964894 ecr 4294964894], length 183  --> HTTP Response
16:36:36.035965 IP localhost.8123 > localhost.39443: Flags [F.], seq 184, ack 65, win 683, options [nop,nop,TS val 4294964894 ecr 4294964894], length 0
16:36:36.036082 IP localhost.39443 > localhost.8123: Flags [.], ack 184, win 700, options [nop,nop,TS val 4294964894 ecr 4294964894], length 0
16:36:36.036350 IP localhost.39443 > localhost.8123: Flags [F.], seq 65, ack 185, win 700, options [nop,nop,TS val 4294964894 ecr 4294964894], length 0
16:36:36.036462 IP localhost.8123 > localhost.39443: Flags [.], ack 66, win 683, options [nop,nop,TS val 4294964894 ecr 4294964894], length 0  --> Connection Completes

### No packets seen later - Probably SYN+Data Crashed the box ###
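
The "Unknown Option" strings in the two SYN lines above can be picked
apart mechanically. A sketch, assuming tcpdump prints the option kind in
decimal followed by the option data in hex, and that Fast Open is carried
in experimental kind 254 with the 0xf989 magic; parse_tfo_option is a
made-up name:

```python
TFO_EXP_KIND = "254"  # experimental TCP option kind, printed in decimal
TFO_MAGIC = "f989"    # 16-bit magic marking the option as Fast Open


def parse_tfo_option(text):
    """Return the cookie bytes from an 'Unknown Option' value.

    Returns None when the value is not a Fast Open option, and empty
    bytes when the option carries no cookie (a cookie request).
    """
    prefix = TFO_EXP_KIND + TFO_MAGIC
    if not text.startswith(prefix):
        return None
    return bytes.fromhex(text[len(prefix):])


# Values copied from the SYN and SYN+ACK in the capture.
print(parse_tfo_option("254f989"))
print(parse_tfo_option("254f989df087214939732ef").hex())
```

Read that way, the client's SYN is a cookie request and the SYN+ACK
carries an 8-byte cookie, which fits the annotations in the capture.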

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-05  1:59           ` Ketan Kulkarni
@ 2013-01-05  2:20             ` Yuchung Cheng
  2013-01-05  3:02               ` Ketan Kulkarni
  0 siblings, 1 reply; 37+ messages in thread
From: Yuchung Cheng @ 2013-01-05  2:20 UTC (permalink / raw)
  To: Ketan Kulkarni; +Cc: Jerry Chu, Eric Dumazet, cerowrt-devel

On Fri, Jan 4, 2013 at 5:59 PM, Ketan Kulkarni <ketkulka@gmail.com> wrote:
> Well, I was trying polipo server on cero box and httping from laptop. On
> both the boxes I set 3 in tcp_fastopen.
>
> The panic is seen only when server is on cero box.
> If I run server on my laptop and httping from cero all TFO connections are
> successful.
> So I doubt its the only problem is SYN+DATA.
Just to confirm: you meant the problem is SYN/data processing on the
server side?

Maybe we hit some ECN / TFO bug. Some crash log would be great. Thanks
for trying TFO!

>
> Unfortunately I don't have the serial cable right now, and logread or dmesg
> didn't print any logs before the cero router  restarted.
>
> Attached is the tcpdump capture on lo when client and server both run on
> cero box.
> HTH!
>
> If you (or anyone) can suggest more diagnostics, I will be glad to provide.
>
> On Jan 5, 2013 2:49 AM, "Jerry Chu" <hkchu@google.com> wrote:
>>
>> +ycheng
>>
>>
>> On Fri, Jan 4, 2013 at 1:11 PM, Dave Taht <dave.taht@gmail.com> wrote:
>>>
>>> Hmm. I would lean towards there being an issue with the new (freshly
>>> ported forward to 3.7.1) unaligned checksum code for mips based on
>>> what you say here. Or an offload...
>>>
>>> As for the 239.x multicast issue, hmm... separate issue entirely.
>>> Probably...
>>>
>>> And then there's TFO. I note that in order to use it properly you need
>>> to turn it on in proc. Last I remember that was
>>>
>>> echo 3 > /proc/sys/net/ipv4/tcp_fastopen
>>
>>
>> Correct - to enable the normal use of TFO for both client and server.
>> There are other flags for advanced usage:
>>  /* Bit Flags for sysctl_tcp_fastopen */
>> #define TFO_CLIENT_ENABLE       1
>> #define TFO_SERVER_ENABLE       2
>> #define TFO_CLIENT_NO_COOKIE    4 /* Send data-in-SYN w/o cookie */
>>
>> /* Process SYN data but skip cookie validation */
>> #define TFO_SERVER_COOKIE_NOT_CHKED     0x100
>> /* Accept SYN data w/o any cookie option */
>> #define TFO_SERVER_COOKIE_NOT_REQD      0x200
>>
>> /* Force enable TFO on all listeners, i.e., not requiring the
>>  * TCP_FASTOPEN socket option. SOCKOPT1/2 determine how to set max_qlen.
>>  */
>> #define TFO_SERVER_WO_SOCKOPT1  0x400
>> #define TFO_SERVER_WO_SOCKOPT2  0x800
>> /* Always create TFO child sockets on a TFO listener even when
>>  * cookie/data not present. (For testing purpose!)
>>  */
>> #define TFO_SERVER_ALWAYS       0x1000
>>
>>>
>>> However that's an old memory and there is this tcp_fastopen_key file I
>>> don't know anything about yet (this is such bleeding edge stuff!)
>>>
>>> ... and with tcp_fastopen disabled things should still work right...
>>> so I'm thinking something else is busted in the stack.
>>>
>>> I've also observed a dns slowdown in what I've been testing but hadn't
>>> dug into packet dumps. (and was assuming, until now, it was due to me
>>> fiddling with ULAs inside the network) Thanks for digging this deep!
>>>
>>> I never said this first attempt at 3.7 for cero was going to be
>>> perfect, but we've entered a new age of subtle problems here.
>>>
>>> I strongly suggest nobody else try this dev build as a default gw, and
>>> that the TFO folk ignore the noise for now.
>>
>>
>> SG.
>>
>> Jerry
>>
>>>
>>>
>>> I just got a 3.7.1 box built on x86_64 so as to a/b some captures.
>>> Regrettably I'm short on time through the weekend...
>>>
>>> On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak <maciej@soltysiak.com>
>>> wrote:
>>> > I am seeing something strange here, with polipo related to TFO but also
>>> > DNS.
>>> > When I just took 3.7.1-1 and set my windows 7 laptop to use
>>> > gw.home.lan:8123
>>> > as http proxy it didn't work. What I observed was:
>>> > A) after quite a while polipo's response to browser was 504 Host
>>> > www.osnews.com lookup failed: Timeout
>>> > b) this error in ssh console: Host osnews.com lookup failed: Timeout
>>> > (131072)
>>> > c) Disabling TFO by adding option useTCPFastOpen 'false' to config
>>> > 'polipo'
>>> > 'general' works around the problem
>>> > d) Alternatively, you can keep TFO enabled in polipo but change option
>>> > 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
>>> > This is very weird, because TFO is TCP and the DNS queries fired off by
>>> > polipo are UDP:
>>> > root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
>>> > 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF],
>>> > proto
>>> > UDP (17), length 60)
>>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!]
>>> > 55396+ A?
>>> > www.osnews.com. (32)
>>> > 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
>>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>>> > 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
>>> > 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF],
>>> > proto
>>> > UDP (17), length 60)
>>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!]
>>> > 55396+
>>> > AAAA? www.osnews.com. (32)
>>> > 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
>>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>>> > 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
>>> > 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
>>> > UDP
>>> > (17), length 123)
>>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396
>>> > q:
>>> > A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns:
>>> > osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
>>> > ns1.swelter.net. (95)
>>> > 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
>>> > 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
>>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>>> > 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
>>> > 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
>>> > 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
>>> > 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
>>> > 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
>>> > 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
>>> > UDP
>>> > (17), length 135)
>>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396
>>> > q:
>>> > AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
>>> > 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
>>> > osnews.com. [29m3s] NS ns2.swelter.net. (107)
>>> > 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
>>> > 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
>>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>>> > 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
>>> > 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
>>> > 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
>>> > 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
>>> > 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
>>> > 0x0080: 0603 6e73 32c0 4c ..ns2.L
>>> > This is the only DNS traffic I saw during the attempts. The tcpdumps
>>> > have
>>> > udp bad checksum but when I disabled TFO in polipo, the UDP where still
>>> > bad
>>> > checksum but they worked.
>>> > Really weird.
>>> > p.s. UPNP still works for port forwarding negotiation as it did in
>>> > 3.6.11-4
>>> > I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250)
>>> > to
>>> > being forwarded between se00 and sw00/sw10. Last time it worked was
>>> > ~3.3.8.
>>> > I'm starting not to question why it doesn't work, I'm starting to
>>> > wonder why
>>> > it did work then ;-)
>>> > Regards,
>>> > Maciej
>>> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
>>> >>
>>> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com>
>>> >> wrote:
>>> >> > Sorry, could you give us a copy of the panic stack trace ?
>>> >>
>>> >> I will get a serial console up on a wndr3800 by sunday. (sorry, just
>>> >> landed in california, am in disarray)
>>> >>
>>> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
>>> >>
>>> >> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>>> >>
>>> >> --
>>> >> Dave Täht
>>> >>
>>> >> Fixing bufferbloat with cerowrt:
>>> >> http://www.teklibre.com/cerowrt/subscribe.html
>>> >> _______________________________________________
>>> >> Cerowrt-devel mailing list
>>> >> Cerowrt-devel@lists.bufferbloat.net
>>> >> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Dave Täht
>>>
>>> Fixing bufferbloat with cerowrt:
>>> http://www.teklibre.com/cerowrt/subscribe.html
>>
>>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-05  2:20             ` Yuchung Cheng
@ 2013-01-05  3:02               ` Ketan Kulkarni
  2013-01-05  3:16                 ` Eric Dumazet
                                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Ketan Kulkarni @ 2013-01-05  3:02 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: Jerry Chu, Eric Dumazet, cerowrt-devel

[-- Attachment #1: Type: text/plain, Size: 9833 bytes --]

Without TFO all worked fine.
The problem is when tfo server is on cero box.
I will try both enabling ECN on the laptop and disabling ECN on cero, with TFO on.
Will report the behavior seen.

Thanks,
Ketan.
On Jan 5, 2013 7:50 AM, "Yuchung Cheng" <ycheng@google.com> wrote:

> On Fri, Jan 4, 2013 at 5:59 PM, Ketan Kulkarni <ketkulka@gmail.com> wrote:
> > Well, I was trying polipo server on cero box and httping from laptop. On
> > both the boxes I set 3 in tcp_fastopen.
> >
> > The panic is seen only when server is on cero box.
> > If I run server on my laptop and httping from cero all TFO connections
> are
> > successful.
> > So I doubt its the only problem is SYN+DATA.
> Just to confirm: you meant the problem is SYN/data processing on the
> server side?
>
> Maybe we hit some ECN / TFO bug. Some crash log would be great. Thanks
> for trying TFO!
>
> >
> > Unfortunately I don't have the serial cable right now, and logread or
> dmesg
> > didn't print any logs before the cero router  restarted.
> >
> > Attached is the tcpdump capture on lo when client and server both run on
> > cero box.
> > HTH!
> >
> > If you (or anyone) can suggest more diagnostics, I will be glad to
> provide.
> >
> > On Jan 5, 2013 2:49 AM, "Jerry Chu" <hkchu@google.com> wrote:
> >>
> >> +ycheng
> >>
> >>
> >> On Fri, Jan 4, 2013 at 1:11 PM, Dave Taht <dave.taht@gmail.com> wrote:
> >>>
> >>> Hmm. I would lean towards there being an issue with the new (freshly
> >>> ported forward to 3.7.1) unaligned checksum code for mips based on
> >>> what you say here. Or an offload...
> >>>
> >>> As for the 239.x multicast issue, hmm... separate issue entirely.
> >>> Probably...
> >>>
> >>> And then there's TFO. I note that in order to use it properly you need
> >>> to turn it on in proc. Last I remember that was
> >>>
> >>> echo 3 > /proc/sys/net/ipv4/tcp_fastopen
> >>
> >>
> >> Correct - to enable the normal use of TFO for both client and server.
> >> There are other flags for advanced usage:
> >>  /* Bit Flags for sysctl_tcp_fastopen */
> >> #define TFO_CLIENT_ENABLE       1
> >> #define TFO_SERVER_ENABLE       2
> >> #define TFO_CLIENT_NO_COOKIE    4 /* Send data-in-SYN w/o cookie */
> >>
> >> /* Process SYN data but skip cookie validation */
> >> #define TFO_SERVER_COOKIE_NOT_CHKED     0x100
> >> /* Accept SYN data w/o any cookie option */
> >> #define TFO_SERVER_COOKIE_NOT_REQD      0x200
> >>
> >> /* Force enable TFO on all listeners, i.e., not requiring the
> >>  * TCP_FASTOPEN socket option. SOCKOPT1/2 determine how to set max_qlen.
> >>  */
> >> #define TFO_SERVER_WO_SOCKOPT1  0x400
> >> #define TFO_SERVER_WO_SOCKOPT2  0x800
> >> /* Always create TFO child sockets on a TFO listener even when
> >>  * cookie/data not present. (For testing purpose!)
> >>  */
> >> #define TFO_SERVER_ALWAYS       0x1000
> >>
> >>>
> >>> However that's an old memory and there is this tcp_fastopen_key file I
> >>> don't know anything about yet (this is such bleeding edge stuff!)
> >>>
> >>> ... and with tcp_fastopen disabled things should still work right...
> >>> so I'm thinking something else is busted in the stack.
> >>>
> >>> I've also observed a dns slowdown in what I've been testing but hadn't
> >>> dug into packet dumps. (and was assuming, until now, it was due to me
> >>> fiddling with ULAs inside the network) Thanks for digging this deep!
> >>>
> >>> I never said this first attempt at 3.7 for cero was going to be
> >>> perfect, but we've entered a new age of subtle problems here.
> >>>
> >>> I strongly suggest nobody else try this dev build as a default gw, and
> >>> that the TFO folk ignore the noise for now.
> >>
> >>
> >> SG.
> >>
> >> Jerry
> >>
> >>>
> >>>
> >>> I just got a 3.7.1 box built on x86_64 so as to a/b some captures.
> >>> Regrettably I'm short on time through the weekend...
> >>>
> >>> On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak <maciej@soltysiak.com> wrote:
> >>> > I am seeing something strange here, with polipo related to TFO but also DNS.
> >>> > When I just took 3.7.1-1 and set my windows 7 laptop to use gw.home.lan:8123
> >>> > as http proxy it didn't work. What I observed was:
> >>> > a) after quite a while polipo's response to browser was 504 Host
> >>> > www.osnews.com lookup failed: Timeout
> >>> > b) this error in ssh console: Host osnews.com lookup failed: Timeout (131072)
> >>> > c) Disabling TFO by adding option useTCPFastOpen 'false' to config 'polipo'
> >>> > 'general' works around the problem
> >>> > d) Alternatively, you can keep TFO enabled in polipo but change option
> >>> > 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
> >>> > This is very weird, because TFO is TCP and the DNS queries fired off by
> >>> > polipo are UDP:
> >>> > root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
> >>> > 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF], proto UDP (17), length 60)
> >>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!] 55396+ A? www.osnews.com. (32)
> >>> > 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
> >>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> >>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> >>> > 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
> >>> > 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF], proto UDP (17), length 60)
> >>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!] 55396+ AAAA? www.osnews.com. (32)
> >>> > 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
> >>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> >>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> >>> > 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
> >>> > 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 123)
> >>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396 q: A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns: osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS ns1.swelter.net. (95)
> >>> > 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
> >>> > 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
> >>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> >>> > 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
> >>> > 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
> >>> > 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
> >>> > 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
> >>> > 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
> >>> > 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 135)
> >>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396 q: AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net., osnews.com. [29m3s] NS ns2.swelter.net. (107)
> >>> > 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
> >>> > 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
> >>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> >>> > 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
> >>> > 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
> >>> > 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
> >>> > 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
> >>> > 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
> >>> > 0x0080: 0603 6e73 32c0 4c ..ns2.L
> >>> > This is the only DNS traffic I saw during the attempts. The tcpdumps show
> >>> > bad UDP checksums, but when I disabled TFO in polipo, the UDP checksums were
> >>> > still bad and yet the lookups worked.
> >>> > Really weird.
> >>> > p.s. UPNP still works for port forwarding negotiation as it did in 3.6.11-4.
> >>> > I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250) to
> >>> > be forwarded between se00 and sw00/sw10. Last time it worked was ~3.3.8.
> >>> > I'm starting not to question why it doesn't work, I'm starting to wonder why
> >>> > it did work then ;-)
> >>> > Regards,
> >>> > Maciej
> >>> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
> >>> >>
> >>> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com> wrote:
> >>> >> > Sorry, could you give us a copy of the panic stack trace ?
> >>> >>
> >>> >> I will get a serial console up on a wndr3800 by sunday. (sorry, just
> >>> >> landed in california, am in disarray)
> >>> >>
> >>> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
> >>> >>
> >>> >> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
> >>> >>

* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-05  3:02               ` Ketan Kulkarni
@ 2013-01-05  3:16                 ` Eric Dumazet
  2013-01-05  3:35                 ` Dave Taht
  2013-01-05 19:13                 ` Ketan Kulkarni
  2 siblings, 0 replies; 37+ messages in thread
From: Eric Dumazet @ 2013-01-05  3:16 UTC (permalink / raw)
  To: Ketan Kulkarni; +Cc: Jerry Chu, Yuchung Cheng, cerowrt-devel

/* TCP Fast Open Cookie as stored in memory */
struct tcp_fastopen_cookie {
        s8      len;
        u8      val[TCP_FASTOPEN_COOKIE_MAX];
};

I wonder if 's8' really does what we want on all arches.

We want to be able to store a negative 8-bit value, not an unsigned one...
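
A minimal userspace sketch of the concern above, not kernel code. It assumes that `s8` is a typedef for `signed char` (as it is in the kernel's type headers) and that a negative `len` is used as a "no cookie" sentinel; `COOKIE_MAX`, `cookie_sketch` and the helper names are made up for illustration. With an explicitly `signed char` type the value -1 should survive on any architecture; the classic pitfall is plain `char`, whose signedness is implementation-defined, and the last helper shows what -1 would turn into if the field were effectively unsigned.

```c
#include <assert.h>

/* Userspace stand-ins for the kernel types (assumption: s8 == signed char). */
typedef signed char s8;
typedef unsigned char u8;

#define COOKIE_MAX 16 /* hypothetical stand-in for TCP_FASTOPEN_COOKIE_MAX */

struct cookie_sketch {
    s8 len;             /* assumed: -1 means "no cookie" */
    u8 val[COOKIE_MAX];
};

/* Store a length; -1 is the assumed "no cookie" sentinel. */
void cookie_set_len(struct cookie_sketch *c, int len)
{
    assert(len >= -1 && len <= COOKIE_MAX);
    c->len = (s8)len;
}

/* Nonzero if the stored length still reads back as negative. */
int cookie_len_is_negative(const struct cookie_sketch *c)
{
    return c->len < 0;
}

/* What -1 turns into if the field were an unsigned 8-bit type. */
int minus_one_as_u8(void)
{
    u8 len = (u8)-1;
    return len; /* promoted to int on return */
}
```

If `cookie_len_is_negative()` ever reported 0 after storing -1, any `len < 0` test on the cookie would silently stop matching, which is the kind of per-arch difference worth ruling out here.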



On Fri, Jan 4, 2013 at 7:02 PM, Ketan Kulkarni <ketkulka@gmail.com> wrote:

> Without TFO everything worked fine.
> The problem occurs when the TFO server is on the cero box.
> I will try both enabling ECN on the laptop and disabling ECN on cero, with
> TFO on. I will report the behavior seen.
>
> Thanks,
> Ketan.
> On Jan 5, 2013 7:50 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
>
>> On Fri, Jan 4, 2013 at 5:59 PM, Ketan Kulkarni <ketkulka@gmail.com> wrote:
>> > Well, I was trying the polipo server on the cero box and httping from the
>> > laptop. On both boxes I set tcp_fastopen to 3.
>> >
>> > The panic is seen only when the server is on the cero box.
>> > If I run the server on my laptop and httping from cero, all TFO connections
>> > are successful.
>> > So I doubt the only problem is SYN+DATA.
>> Just to confirm: you meant the problem is SYN/data processing on the
>> server side?
>>
>> Maybe we hit some ECN / TFO bug. Some crash log would be great. Thanks
>> for trying TFO!
>>
>> >
>> > Unfortunately I don't have the serial cable right now, and logread or
>> > dmesg didn't print any logs before the cero router restarted.
>> >
>> > Attached is the tcpdump capture on lo when client and server both run on
>> > the cero box.
>> > HTH!
>> >
>> > If you (or anyone) can suggest more diagnostics, I will be glad to
>> > provide them.
>> >

* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-05  3:02               ` Ketan Kulkarni
  2013-01-05  3:16                 ` Eric Dumazet
@ 2013-01-05  3:35                 ` Dave Taht
  2013-01-05  4:05                   ` Dave Taht
  2013-01-05 19:13                 ` Ketan Kulkarni
  2 siblings, 1 reply; 37+ messages in thread
From: Dave Taht @ 2013-01-05  3:35 UTC (permalink / raw)
  To: Ketan Kulkarni; +Cc: Jerry Chu, Eric Dumazet, Yuchung Cheng, cerowrt-devel

It's rather fun to explore a new protocol on a Friday night, but this
thread is getting out of hand. I created a bug for it here:

https://www.bufferbloat.net/issues/418

I don't mind if we continue to discuss it here, but do put packet
captures on the bug, please...

I scripted up a few tests that hopefully duplicate what Ketan was
trying to do (Ketan?), and put up packet captures of the successful
tests against an x86_64 box I had handy.

I will put a sacrificial cerowrt box back up with a serial port on it
before Sunday.

I noted in the bug above (with more detail):

On an x86_64 path over a 4 hop wifi mesh network, httping using polipo
as a proxy not only "worked", but cut the time taken by a
~40kbyte http GET by roughly 40%.

Typical httping result, polipo, no TFO:

connected to 172.26.3.4:80 (274 bytes), seq=4 time=10.76 ms

Typical result, polipo, useTCPFastopen = true in its config and
/proc/sys/net/ipv4/tcp_fastopen = 3 on both sides:

connected to 172.26.3.4:80 (274 bytes), seq=4 time=6.61 ms

Impressive!

If the security issues with the idea are resolved (I'm aware of the
prior effort to do this but haven't thought deeply about how TFO
attempts to resolve those issues), and TFO can be deployed, it will be
a win for a wide range of latency-sensitive TCP traffic types.
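
For anyone following along, here is a small userspace C sketch of how the tcp_fastopen = 3 setting used in these tests decodes, using the flag values Jerry listed earlier in the thread. The helper names `tfo_client_and_server` and `tfo_describe` are made up for illustration and are not kernel functions; only the three basic bits are handled.

```c
#include <assert.h>
#include <stdio.h>

/* Flag values as quoted earlier in this thread (sysctl_tcp_fastopen bits). */
#define TFO_CLIENT_ENABLE    1
#define TFO_SERVER_ENABLE    2
#define TFO_CLIENT_NO_COOKIE 4

/* Hypothetical helper: nonzero if both the client and server bits are set. */
int tfo_client_and_server(unsigned int sysctl_val)
{
    unsigned int both = TFO_CLIENT_ENABLE | TFO_SERVER_ENABLE;
    return (sysctl_val & both) == both;
}

/* Hypothetical helper: print which of the basic bits are set. */
void tfo_describe(unsigned int v, FILE *out)
{
    assert(out != NULL);
    fprintf(out, "client=%d server=%d client_no_cookie=%d\n",
            !!(v & TFO_CLIENT_ENABLE),
            !!(v & TFO_SERVER_ENABLE),
            !!(v & TFO_CLIENT_NO_COOKIE));
}
```

Calling `tfo_describe(3, stdout)` reports the client and server bits as set and the no-cookie bit as clear, which matches "normal use of TFO for both client and server"; a value of 1 or 2 alone would enable only one side.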


-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-05  3:35                 ` Dave Taht
@ 2013-01-05  4:05                   ` Dave Taht
  0 siblings, 0 replies; 37+ messages in thread
From: Dave Taht @ 2013-01-05  4:05 UTC (permalink / raw)
  To: Ketan Kulkarni; +Cc: Jerry Chu, Eric Dumazet, Yuchung Cheng, cerowrt-devel

On Fri, Jan 4, 2013 at 7:35 PM, Dave Taht <dave.taht@gmail.com> wrote:
> On an x86_64 path over a 4 hop wifi mesh network, httping using polipo
> as a proxy not only "worked", but cut the time taken by a
> ~40kbyte http GET by roughly 40%.
>
> Typical httping result, polipo, no TFO:
>
> connected to 172.26.3.4:80 (274 bytes), seq=4 time=10.76 ms
>
> Typical result, polipo, useTCPFastopen = true in its config and
> /proc/sys/net/ipv4/tcp_fastopen = 3 on both sides:
>
> connected to 172.26.3.4:80 (274 bytes), seq=4 time=6.61 ms
>
> Impressive!

Spoke WAY too soon on the performance front: it looks like we get a RST
from one of the (3.6.11) middleboxes in that testbed before we get the
whole file.

I put up a new capture on:

https://www.bufferbloat.net/issues/418

Calling it a night.

-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-05  3:02               ` Ketan Kulkarni
  2013-01-05  3:16                 ` Eric Dumazet
  2013-01-05  3:35                 ` Dave Taht
@ 2013-01-05 19:13                 ` Ketan Kulkarni
  2013-01-13 17:01                   ` Ketan Kulkarni
  2 siblings, 1 reply; 37+ messages in thread
From: Ketan Kulkarni @ 2013-01-05 19:13 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: Jerry Chu, Eric Dumazet, cerowrt-devel

Disabling ECN on the cero box has no effect: the box still crashed with
ECN disabled.
I also tried enabling ECN on x86, and it didn't crash in either case. The
tcpdump capture from cero's lo interface is posted at:
https://www.bufferbloat.net/issues/418#change-1703
It matches the previously attached "lo_capture.txt", except that ECN is
disabled.

I may be able to get a serial cable on Sunday to capture the crash details.
Until then I probably cannot provide crash logs, as logread and dmesg
do not print anything.

Thanks,
Ketan


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-05 19:13                 ` Ketan Kulkarni
@ 2013-01-13 17:01                   ` Ketan Kulkarni
  2013-01-13 18:03                     ` Eric Dumazet
  0 siblings, 1 reply; 37+ messages in thread
From: Ketan Kulkarni @ 2013-01-13 17:01 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: Jerry Chu, Eric Dumazet, cerowrt-devel

[-- Attachment #1: Type: text/plain, Size: 13466 bytes --]

I got a chance to capture the backtrace from the serial port. I haven't done
the kgdb session yet.
To reiterate, the crash occurs on the TFO server on the mips platform.

The call trace looks like this:

[ 1024.530000] Call Trace:
[ 1024.530000] [<801fc7f4>] reqsk_fastopen_remove+0x30/0x17c
[ 1024.530000] [<8024a36c>] tcp_rcv_state_process+0x7b4/0xc28
[ 1024.530000] [<802516ec>] tcp_v4_do_rcv+0x21c/0x274
[ 1024.530000] [<80253c74>] tcp_v4_rcv+0x5b4/0x974
[ 1024.530000] [<802320f0>] ip_local_deliver_finish+0x168/0x29c
[ 1024.530000] [<80207100>] __netif_receive_skb+0x63c/0x6c0
[ 1024.530000] [<c060b2e8>] ieee80211_deliver_skb+0x1b8/0x220 [mac80211]
[ 1024.530000] [<c060cc70>] ieee80211_rx_handlers.part.12+0x1654/0x23e0 [mac80211]
[ 1024.530000] [<c060e468>] ieee80211_prepare_and_rx_handle+0xa6c/0xaf0 [mac80211]
[ 1024.530000] [<c060ecfc>] ieee80211_rx+0x810/0x8d8 [mac80211]
[ 1024.530000] [<c078651c>] ath_rx_tasklet+0xf4c/0x10a4 [ath9k]
[ 1024.530000] [<c078437c>] ath9k_tasklet+0x104/0x174 [ath9k]
[ 1024.530000] [<800793b8>] tasklet_action+0x78/0xc8
[ 1024.530000] [<80078c08>] __do_softirq+0xb0/0x184
[ 1024.530000] [<80078d8c>] do_softirq+0x48/0x68
[ 1024.530000] [<80078fa8>] irq_exit+0x4c/0x7c
[ 1024.530000] [<8006330c>] ret_from_irq+0x0/0x4
[ 1024.530000]
[ 1024.530000] Code: 8e510208 30d300ff 2c420001 <00028036> 0c01e2a7 ac80048c 8e220008 2442ffff ae220008
[ 1024.940000] ---[ end trace a47ff22dd20a96c1 ]---
[ 1024.950000] Kernel panic - not syncing: Fatal exception in interrupt

I suspect this is the line responsible for this crash (marked with >>>>>):

void reqsk_fastopen_remove(struct sock *sk, struct request_sock *req,
                           bool reset)
{
        struct sock *lsk = tcp_rsk(req)->listener;
        struct fastopen_queue *fastopenq =
                inet_csk(lsk)->icsk_accept_queue.fastopenq;

>>>>>   BUG_ON(!spin_is_locked(&sk->sk_lock.slock) && !sock_owned_by_user(sk));

        tcp_sk(sk)->fastopen_rsk = NULL;
        spin_lock_bh(&fastopenq->lock);
        fastopenq->qlen--;
        tcp_rsk(req)->listener = NULL;

Please see more details here
http://www.bufferbloat.net/issues/418#change-1706

Thanks,
Ketan

On Jan 6, 2013 12:43 AM, "Ketan Kulkarni" <ketkulka@gmail.com> wrote:
>
> Disabling ECN on cero box has no effect.
> The box crashed with ECN disabled.
> Also tried enabling ECN on x86 and it didn't crash in either case. The
> tcpdump on cero lo is updated at -
> https://www.bufferbloat.net/issues/418#change-1703
> It is exactly similar to the previously attached "lo_capture.txt" but
> with ECN disabled.
>
> I might try getting serial cable on Sunday to get the crash details.
> Till then probably I cannot provide the crash logs as logread/dmesg
> does not print anything.
>
> Thanks,
> Ketan
>
> On Sat, Jan 5, 2013 at 8:32 AM, Ketan Kulkarni <ketkulka@gmail.com> wrote:
> > Without TFO all worked fine.
> > The problem is when tfo server is on cero box.
> > I will try both ECN on on laptop and disabling ECN on cero with TFO on.
> > Will report the behavior seen.
> >
> > Thanks,
> > Ketan.
> >
> > On Jan 5, 2013 7:50 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
> >>
> >> On Fri, Jan 4, 2013 at 5:59 PM, Ketan Kulkarni <ketkulka@gmail.com> wrote:
> >> > Well, I was trying polipo server on cero box and httping from laptop.
> >> > On both the boxes I set 3 in tcp_fastopen.
> >> >
> >> > The panic is seen only when server is on cero box.
> >> > If I run server on my laptop and httping from cero all TFO connections
> >> > are successful.
> >> > So I doubt the only problem is SYN+DATA.
> >> Just to confirm: you meant the problem is SYN/data processing on the
> >> server side?
> >>
> >> Maybe we hit some ECN / TFO bug. Some crash log would be great. Thanks
> >> for trying TFO!
> >>
> >> >
> >> > Unfortunately I don't have the serial cable right now, and logread or
> >> > dmesg
> >> > didn't print any logs before the cero router  restarted.
> >> >
> >> > Attached is the tcpdump capture on lo when client and server both
run on
> >> > cero box.
> >> > HTH!
> >> >
> >> > If you (or anyone) can suggest more diagnostics, I will be glad to
> >> > provide.
> >> >
> >> > On Jan 5, 2013 2:49 AM, "Jerry Chu" <hkchu@google.com> wrote:
> >> >>
> >> >> +ycheng
> >> >>
> >> >>
> >> >> On Fri, Jan 4, 2013 at 1:11 PM, Dave Taht <dave.taht@gmail.com>
wrote:
> >> >>>
> >> >>> Hmm. I would lean towards there being an issue with the new
(freshly
> >> >>> ported forward to 3.7.1) unaligned checksum code for mips based on
> >> >>> what you say here. Or an offload...
> >> >>>
> >> >>> As for the 239.x multicast issue, hmm... separate issue entirely.
> >> >>> Probably...
> >> >>>
> >> >>> And then there's TFO. I note that in order to use it properly you
need
> >> >>> to turn it on in proc. Last I remember that was
> >> >>>
> >> >>> echo 3 > /proc/sys/net/ipv4/tcp_fastopen
> >> >>
> >> >>
> >> >> Correct - to enable the normal use of TFO for both client and
server.
> >> >> There are other flags for advanced usage:
> >> >>  /* Bit Flags for sysctl_tcp_fastopen */
> >> >> #define TFO_CLIENT_ENABLE       1
> >> >> #define TFO_SERVER_ENABLE       2
> >> >> #define TFO_CLIENT_NO_COOKIE    4 /* Send data-in-SYN w/o cookie */
> >> >>
> >> >> /* Process SYN data but skip cookie validation */
> >> >> #define TFO_SERVER_COOKIE_NOT_CHKED     0x100
> >> >> /* Accept SYN data w/o any cookie option */
> >> >> #define TFO_SERVER_COOKIE_NOT_REQD      0x200
> >> >>
> >> >> /* Force enable TFO on all listeners, i.e., not requiring the
> >> >>  * TCP_FASTOPEN socket option. SOCKOPT1/2 determine how to set
> >> >> max_qlen.
> >> >>  */
> >> >> #define TFO_SERVER_WO_SOCKOPT1  0x400
> >> >> #define TFO_SERVER_WO_SOCKOPT2  0x800
> >> >> /* Always create TFO child sockets on a TFO listener even when
> >> >>  * cookie/data not present. (For testing purpose!)
> >> >>  */
> >> >> #define TFO_SERVER_ALWAYS       0x1000
> >> >>
> >> >>>
> >> >>> However that's an old memory and there is this tcp_fastopen_key
file I
> >> >>> don't know anything about yet (this is such bleeding edge stuff!)
> >> >>>
> >> >>> ... and with tcp_fastopen disabled things should still work
right...
> >> >>> so I'm thinking something else is busted in the stack.
> >> >>>
> >> >>> I've also observed a dns slowdown in what I've been testing but
hadn't
> >> >>> dug into packet dumps. (and was assuming, until now, it was due to
me
> >> >>> fiddling with ULAs inside the network) Thanks for digging this
deep!
> >> >>>
> >> >>> I never said this first attempt at 3.7 for cero was going to be
> >> >>> perfect, but we've entered a new age of subtle problems here.
> >> >>>
> >> >>> I strongly suggest nobody else try this dev build as a default gw,
and
> >> >>> that the TFO folk ignore the noise for now.
> >> >>
> >> >>
> >> >> SG.
> >> >>
> >> >> Jerry
> >> >>
> >> >>>
> >> >>>
> >> >>> I just got a 3.7.1 box built on x86_64 so as to a/b some captures.
> >> >>> Regrettably I'm short on time through the weekend...
> >> >>>
> >> >>> On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak
> >> >>> <maciej@soltysiak.com>
> >> >>> wrote:
> >> >>> > I am seeing something strange here, with polipo related to TFO
but
> >> >>> > also
> >> >>> > DNS.
> >> >>> > When I just took 3.7.1-1 and set my windows 7 laptop to use
> >> >>> > gw.home.lan:8123
> >> >>> > as http proxy it didn't work. What I observed was:
> >> >>> > A) after quite a while polipo's response to browser was 504 Host
> >> >>> > www.osnews.com lookup failed: Timeout
> >> >>> > b) this error in ssh console: Host osnews.com lookup failed:
Timeout
> >> >>> > (131072)
> >> >>> > c) Disabling TFO by adding option useTCPFastOpen 'false' to
config
> >> >>> > 'polipo'
> >> >>> > 'general' works around the problem
> >> >>> > d) Alternatively, you can keep TFO enabled in polipo but change
> >> >>> > option
> >> >>> > 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
> >> >>> > This is very weird, because TFO is TCP and the DNS queries fired
off
> >> >>> > by
> >> >>> > polipo are UDP:
> >> >>> > root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
> >> >>> > 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags
[DF],
> >> >>> > proto
> >> >>> > UDP (17), length 60)
> >> >>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!]
> >> >>> > 55396+ A?
> >> >>> > www.osnews.com. (32)
> >> >>> > 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
> >> >>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> >> >>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> >> >>> > 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
> >> >>> > 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags
[DF],
> >> >>> > proto
> >> >>> > UDP (17), length 60)
> >> >>> > 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!]
> >> >>> > 55396+
> >> >>> > AAAA? www.osnews.com. (32)
> >> >>> > 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
> >> >>> > 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> >> >>> > 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> >> >>> > 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
> >> >>> > 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
> >> >>> > proto
> >> >>> > UDP
> >> >>> > (17), length 123)
> >> >>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!]
> >> >>> > 55396
> >> >>> > q:
> >> >>> > A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159
ns:
> >> >>> > osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
> >> >>> > ns1.swelter.net. (95)
> >> >>> > 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
> >> >>> > 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
> >> >>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> >> >>> > 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
> >> >>> > 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
> >> >>> > 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
> >> >>> > 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
> >> >>> > 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
> >> >>> > 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
> >> >>> > proto
> >> >>> > UDP
> >> >>> > (17), length 135)
> >> >>> > 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!]
> >> >>> > 55396
> >> >>> > q:
> >> >>> > AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
> >> >>> > 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
> >> >>> > osnews.com. [29m3s] NS ns2.swelter.net. (107)
> >> >>> > 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
> >> >>> > 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
> >> >>> > 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> >> >>> > 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
> >> >>> > 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
> >> >>> > 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
> >> >>> > 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
> >> >>> > 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
> >> >>> > 0x0080: 0603 6e73 32c0 4c ..ns2.L
> >> >>> > This is the only DNS traffic I saw during the attempts. The
tcpdumps
> >> >>> > have
> >> >>> > udp bad checksum but when I disabled TFO in polipo, the UDP where
> >> >>> > still
> >> >>> > bad
> >> >>> > checksum but they worked.
> >> >>> > Really weird.
> >> >>> > p.s. UPNP still works for port forwarding negotiation as it did
in
> >> >>> > 3.6.11-4
> >> >>> > I still couldn't get the UPNP/SSDP broadcasts (udp to
> >> >>> > 239.255.255.250)
> >> >>> > to
> >> >>> > being forwarded between se00 and sw00/sw10. Last time it worked
was
> >> >>> > ~3.3.8.
> >> >>> > I'm starting not to question why it doesn't work, I'm starting to
> >> >>> > wonder why
> >> >>> > it did work then ;-)
> >> >>> > Regards,
> >> >>> > Maciej
> >> >>> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <
edumazet@google.com>
> >> >>> >> wrote:
> >> >>> >> > Sorry, could you give us a copy of the panic stack trace ?
> >> >>> >>
> >> >>> >> I will get a serial console up on a wndr3800 by sunday. (sorry,
> >> >>> >> just
> >> >>> >> landed in california, am in disarray)
> >> >>> >>
> >> >>> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is
at:
> >> >>> >>
> >> >>> >> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
> >> >>> >>
> >> >>> >> --
> >> >>> >> Dave Täht
> >> >>> >>
> >> >>> >> Fixing bufferbloat with cerowrt:
> >> >>> >> http://www.teklibre.com/cerowrt/subscribe.html
> >> >>> >> _______________________________________________
> >> >>> >> Cerowrt-devel mailing list
> >> >>> >> Cerowrt-devel@lists.bufferbloat.net
> >> >>> >> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> >> >>> >
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Dave Täht
> >> >>>
> >> >>> Fixing bufferbloat with cerowrt:
> >> >>> http://www.teklibre.com/cerowrt/subscribe.html
> >> >>
> >> >>
> >> >
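
For reference, the value 3 written to /proc/sys/net/ipv4/tcp_fastopen in the
quoted exchange is simply the client and server enable bits OR'd together.
A minimal sketch of decoding that bitmask follows. The flag values are the
ones from the list quoted above; the helper names are made up for
illustration and are not kernel functions.

```c
#include <assert.h>

/* Values copied from the sysctl flag list quoted in this thread. */
#define TFO_CLIENT_ENABLE       1
#define TFO_SERVER_ENABLE       2
#define TFO_SERVER_COOKIE_NOT_REQD      0x200

/* Hypothetical helpers: report whether a sysctl value enables each side. */
int tfo_client_enabled(unsigned int sysctl_val)
{
        return (sysctl_val & TFO_CLIENT_ENABLE) != 0;
}

int tfo_server_enabled(unsigned int sysctl_val)
{
        return (sysctl_val & TFO_SERVER_ENABLE) != 0;
}
```

With these, a value of 3 reports both sides enabled, while 1 enables only
the client side, which matches the "client works, server crashes" split
being tested here.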


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-13 17:01                   ` Ketan Kulkarni
@ 2013-01-13 18:03                     ` Eric Dumazet
  2013-01-13 21:39                       ` Felix Fietkau
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2013-01-13 18:03 UTC (permalink / raw)
  To: Ketan Kulkarni; +Cc: Jerry Chu, Yuchung Cheng, cerowrt-devel

[-- Attachment #1: Type: text/plain, Size: 14368 bytes --]

I suspect a bug in the spin_is_locked() implementation on your arch, as the
socket lock should be held at this point.





* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-13 18:03                     ` Eric Dumazet
@ 2013-01-13 21:39                       ` Felix Fietkau
  2013-01-14  0:38                         ` Yuchung Cheng
  2013-01-14  3:05                         ` Eric Dumazet
  0 siblings, 2 replies; 37+ messages in thread
From: Felix Fietkau @ 2013-01-13 21:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jerry Chu, cerowrt-devel, Yuchung Cheng

On 2013-01-13 7:03 PM, Eric Dumazet wrote:
> I suspect a bug in the spin_is_locked() implementation on your arch, as
> the socket lock should be held at this point.
I don't think this is an arch implementation bug, this probably happens
on all !SMP systems. See this bit from include/linux/spinlock_up.h:

#define arch_spin_is_locked(lock)   ((void)(lock), 0)

- Felix
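
To make the failure mode concrete, here is a small user-space model (not
kernel code) of why the assertion fires on a uniprocessor build. Only the
((void)(lock), 0) shape is taken from the spinlock_up.h line quoted above;
the model_ names and the struct are assumptions made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock; it is never inspected here,
 * just as the UP macro never inspects the real lock. */
struct model_spinlock { int unused; };

/* Same shape as the UP definition quoted above: always "not locked". */
#define model_spin_is_locked(lock)   ((void)(lock), 0)

/* Mirrors the condition inside the BUG_ON() in reqsk_fastopen_remove().
 * Returns true when the assertion would fire. */
bool model_tfo_bug_fires(struct model_spinlock *slock, bool owned_by_user)
{
        return !model_spin_is_locked(slock) && !owned_by_user;
}
```

In this model the first operand is always true, so the check reduces to
!owned_by_user. The trace shows the call arriving from softirq context,
where the socket would normally be protected by the spinlock rather than
owned by a user context, so the assertion trips even though the locking
is correct.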



* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-13 21:39                       ` Felix Fietkau
@ 2013-01-14  0:38                         ` Yuchung Cheng
  2013-01-14  3:05                         ` Eric Dumazet
  1 sibling, 0 replies; 37+ messages in thread
From: Yuchung Cheng @ 2013-01-14  0:38 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: Jerry Chu, Eric Dumazet, cerowrt-devel

[-- Attachment #1: Type: text/plain, Size: 676 bytes --]

Thanks for making these efforts to debug this. Ketan: can we try replacing
the one BUG_ON with two WARN_ONs to confirm the exact faulty condition? I
wish I could do that myself, but I don't have a box at hand.

Yuchung
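
One way to read the suggestion above is to test each half of the combined
condition separately, so the log shows which one is actually true. The
sketch below is a user-space model only: MODEL_WARN_ON is a made-up
stand-in for the kernel's WARN_ON(), and the two booleans stand in for
spin_is_locked(&sk->sk_lock.slock) and sock_owned_by_user(sk).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for WARN_ON(): report the condition and continue. */
#define MODEL_WARN_ON(cond) \
        ((cond) ? (fprintf(stderr, "WARN: %s\n", #cond), 1) : 0)

/* Returns a bitmask of the warnings that fired:
 * bit 0 = spinlock reported as not locked, bit 1 = not owned by user. */
int model_check_tfo_locking(bool slock_reported_locked, bool owned_by_user)
{
        int fired = 0;

        if (MODEL_WARN_ON(!slock_reported_locked))
                fired |= 1;
        if (MODEL_WARN_ON(!owned_by_user))
                fired |= 2;
        return fired;
}
```

Unlike the single BUG_ON, the second check on its own can fire in a
perfectly valid softirq path, so this split is only useful as a temporary
diagnostic and not as a replacement assertion.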

On Sun, Jan 13, 2013 at 1:39 PM, Felix Fietkau <nbd@openwrt.org> wrote:

> On 2013-01-13 7:03 PM, Eric Dumazet wrote:
> > I suspect a bug in the spin_is_locked() implementation on your arch, as
> > the socket lock should be held at this point.
> I don't think this is an arch implementation bug, this probably happens
> on all !SMP systems. See this bit from include/linux/spinlock_up.h:
>
> #define arch_spin_is_locked(lock)   ((void)(lock), 0)
>
> - Felix
>
>


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-13 21:39                       ` Felix Fietkau
  2013-01-14  0:38                         ` Yuchung Cheng
@ 2013-01-14  3:05                         ` Eric Dumazet
  2013-01-14  4:07                           ` Eric Dumazet
  2013-01-14  8:18                           ` Jerry Chu
  1 sibling, 2 replies; 37+ messages in thread
From: Eric Dumazet @ 2013-01-14  3:05 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: Jerry Chu, cerowrt-devel, Yuchung Cheng

[-- Attachment #1: Type: text/plain, Size: 1816 bytes --]

Oh well, yes, this doesn't quite work on !SMP.

And this kind of bug is frequent...

See the following example:

commit b9980cdcf2524c5fe15d8cbae9c97b3ed6385563
Author: Hugh Dickins <hughd@google.com>
Date:   Wed Feb 8 17:13:40 2012 -0800

    mm: fix UP THP spin_is_locked BUGs

    Fix CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_SMP=n CONFIG_DEBUG_VM=y
    CONFIG_DEBUG_SPINLOCK=n kernel: spin_is_locked() is then always false,
    and so triggers some BUGs in Transparent HugePage codepaths.

    asm-generic/bug.h mentions this problem, and provides a WARN_ON_SMP(x);
    but being too lazy to add VM_BUG_ON_SMP, BUG_ON_SMP, WARN_ON_SMP_ONCE,
    VM_WARN_ON_SMP_ONCE, just test NR_CPUS != 1 in the existing VM_BUG_ONs.

    Signed-off-by: Hugh Dickins <hughd@google.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b3ffc21..91d3efb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2083,7 +2083,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot)
 {
        struct mm_struct *mm = mm_slot->mm;

-       VM_BUG_ON(!spin_is_locked(&khugepaged_mm_lock));
+       VM_BUG_ON(NR_CPUS != 1 && !spin_is_locked(&khugepaged_mm_lock));





* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-14  3:05                         ` Eric Dumazet
@ 2013-01-14  4:07                           ` Eric Dumazet
  2013-01-14  4:43                             ` Ketan Kulkarni
  2013-01-14  8:18                           ` Jerry Chu
  1 sibling, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2013-01-14  4:07 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: Jerry Chu, cerowrt-devel, Yuchung Cheng


Quite frankly I would just remove the BUG_ON()

diff --git a/net/core/request_sock.c b/net/core/request_sock.c
index c31d9e8..4425148 100644
--- a/net/core/request_sock.c
+++ b/net/core/request_sock.c
@@ -186,8 +186,6 @@ void reqsk_fastopen_remove(struct sock *sk, struct request_sock *req,
        struct fastopen_queue *fastopenq =
            inet_csk(lsk)->icsk_accept_queue.fastopenq;

-       BUG_ON(!spin_is_locked(&sk->sk_lock.slock) && !sock_owned_by_user(sk));
-
        tcp_sk(sk)->fastopen_rsk = NULL;
        spin_lock_bh(&fastopenq->lock);
        fastopenq->qlen--;




* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-14  4:07                           ` Eric Dumazet
@ 2013-01-14  4:43                             ` Ketan Kulkarni
  2013-01-14  6:14                               ` Dave Taht
  0 siblings, 1 reply; 37+ messages in thread
From: Ketan Kulkarni @ 2013-01-14  4:43 UTC (permalink / raw)
  To: Eric Dumazet, Yuchung Cheng; +Cc: cerowrt-devel


Thanks Eric and Yuchung for taking care of the patch. I will test a few more
TFO cases as well once this patch is built into cero.

Thanks,
Ketan


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-04 20:42     ` Maciej Soltysiak
                         ` (2 preceding siblings ...)
  2013-01-04 22:25       ` Robert Bradley
@ 2013-01-14  6:11       ` Dave Taht
  2013-01-14 16:37         ` Ketan Kulkarni
  2013-01-16 22:19         ` Maciej Soltysiak
  3 siblings, 2 replies; 37+ messages in thread
From: Dave Taht @ 2013-01-14  6:11 UTC (permalink / raw)
  To: Maciej Soltysiak; +Cc: cerowrt-devel

This is a different issue than TFO, so I'm taking the TFO-ers off the list.

On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak <maciej@soltysiak.com> wrote:
> I am seeing something strange here, with polipo related to TFO but also DNS.

I have had polipo's internal dns resolver mess up on multiple occasions
exactly along the lines you describe. There is a bug for it in the
cerowrt database as best as I recall.

I have never tracked down why it happens.

> When I just took 3.7.1-1 and set my windows 7 laptop to use gw.home.lan:8123
> as http proxy it didn't work. What I observed was:
> A) after quite a while polipo's response to browser was 504 Host
> www.osnews.com lookup failed: Timeout
> b) this error in ssh console: Host osnews.com lookup failed: Timeout
> (131072)
> c) Disabling TFO by adding option useTCPFastOpen 'false' to config 'polipo'
> 'general' works around the problem
> d) Alternatively, you can keep TFO enabled in polipo but change option
> 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
> This is very weird, because TFO is TCP and the DNS queries fired off by
> polipo are UDP:
> root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
> 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF], proto

No, it's not weird; there's something about how uclibc and polipo interact
here that isn't well understood. It has always seemed to me that it may be
a bug in polipo's internal DNS resolver on MIPS...

> UDP (17), length 60)
> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!] 55396+ A?

The bad checksum issue probably doesn't matter.

However, an actual tcpdump capture file would be useful for looking
at the format of the DNS query.

> www.osnews.com. (32)

> 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
> 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF], proto
> UDP (17), length 60)
> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!] 55396+
> AAAA? www.osnews.com. (32)
> 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
> 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
> 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP
> (17), length 123)
> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396 q:
> A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns:
> osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
> ns1.swelter.net. (95)
> 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
> 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
> 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
> 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
> 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
> 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
> 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP
> (17), length 135)
> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396 q:
> AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
> 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
> osnews.com. [29m3s] NS ns2.swelter.net. (107)
> 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
> 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
> 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
> 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
> 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
> 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
> 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
> 0x0080: 0603 6e73 32c0 4c ..ns2.L
> This is the only DNS traffic I saw during the attempts. The tcpdumps have
> udp bad checksum but when I disabled TFO in polipo, the UDP where still bad
> checksum but they worked.

I hesitate to draw a connection between TFO and the DNS failures. What
I saw was that polipo would work for a while, then start failing on
DNS lookups, and as in your case my workaround was to use gethostbyname
(which unfortunately clobbers performance).

As fond as I am of split tcp solutions I never poked into this further
at the time....

It's probably a really simple off-by-one error in the dns code inside
polipo. Perhaps a packet capture will get us closer. Is there an
active mailing list for it?


> Really weird.
> p.s. UPNP still works for port forwarding negotiation as it did in 3.6.11-4
> I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250) to
> being forwarded between se00 and sw00/sw10. Last time it worked was ~3.3.8.

OK, yet another issue.

The routing cache got eliminated between 3.3 and 3.6, and there were
all sorts of changes to it over the last 6 releases that have been
bothersome.

Or perhaps I did something stupid regarding IGMP. (Is it even on?)

> I'm starting not to question why it doesn't work, I'm starting to wonder why
> it did work then ;-)
> Regards,
> Maciej



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-14  4:43                             ` Ketan Kulkarni
@ 2013-01-14  6:14                               ` Dave Taht
  2013-01-14 19:50                                 ` Dave Taht
  0 siblings, 1 reply; 37+ messages in thread
From: Dave Taht @ 2013-01-14  6:14 UTC (permalink / raw)
  To: Ketan Kulkarni; +Cc: Eric Dumazet, Yuchung Cheng, cerowrt-devel

I am so buried that I can only do new builds of cero once a week.

Can the bad behavior be reproduced on some other single-core processor,
like x86? Or by merely booting an x86 box in single-processor mode?

I'll try to get a new release out next Sunday.




-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-14  3:05                         ` Eric Dumazet
  2013-01-14  4:07                           ` Eric Dumazet
@ 2013-01-14  8:18                           ` Jerry Chu
  2013-01-14 16:32                             ` Eric Dumazet
  1 sibling, 1 reply; 37+ messages in thread
From: Jerry Chu @ 2013-01-14  8:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: cerowrt-devel, Yuchung Cheng


On Sun, Jan 13, 2013 at 7:05 PM, Eric Dumazet <edumazet@google.com> wrote:

> Oh well yes, this doesnt quite work on !SMP.
>

Strange - how would one assert that a spin lock is held, and obviously only
for SMP? (I almost think arch_spin_is_locked(lock) should be ((void)(lock), 1)
for UP for the purpose of assertion...)

Also, it looks like there are a bunch of other places in the source tree
where a spin_is_locked() assertion is made. (Perhaps they are only configured
for MP?)

Thanks,

Jerry



* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-14  8:18                           ` Jerry Chu
@ 2013-01-14 16:32                             ` Eric Dumazet
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Dumazet @ 2013-01-14 16:32 UTC (permalink / raw)
  To: Jerry Chu; +Cc: cerowrt-devel, Yuchung Cheng


Some paths want to check that a spinlock is held, others want to check that
it's not held; it depends on the context.

So returning 1 on UP would break a bunch of code as well.



* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-14  6:11       ` Dave Taht
@ 2013-01-14 16:37         ` Ketan Kulkarni
  2013-01-16 22:19         ` Maciej Soltysiak
  1 sibling, 0 replies; 37+ messages in thread
From: Ketan Kulkarni @ 2013-01-14 16:37 UTC (permalink / raw)
  To: Dave Taht, Maciej Soltysiak; +Cc: cerowrt-devel

I have never played around with the polipo proxy much, nor have I wondered
about its DNS behavior.
It would be good to have a bug filed and the discussion tracked there.

Maciej: can you please report a bug and attach the logs (preferably
without TFO ;-) )?
I can probably take a look at those later this week.

Thanks,
Ketan

On Mon, Jan 14, 2013 at 11:41 AM, Dave Taht <dave.taht@gmail.com> wrote:
> This is a different issue that tfo, so taking the tfo-ers off the list
>
> On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak <maciej@soltysiak.com> wrote:
>> I am seeing something strange here, with polipo related to TFO but also DNS.
>
> I have had polipo's internal dns resolver mess up on multiple occasions
> exactly along the lines you describe. There is a bug for it in the
> cerowrt database as best as I recall.
>
> I have never tracked down why it happens.
>
>> When I just took 3.7.1-1 and set my windows 7 laptop to use gw.home.lan:8123
>> as http proxy it didn't work. What I observed was:
>> A) after quite a while polipo's response to browser was 504 Host
>> www.osnews.com lookup failed: Timeout
>> b) this error in ssh console: Host osnews.com lookup failed: Timeout
>> (131072)
>> c) Disabling TFO by adding option useTCPFastOpen 'false' to config 'polipo'
>> 'general' works around the problem
>> d) Alternatively, you can keep TFO enabled in polipo but change option
>> 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
>> This is very weird, because TFO is TCP and the DNS queries fired off by
>> polipo are UDP:
>> root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
>> 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF], proto
>
> No, it's not weird, there's something about uclibc and polipo interacting here
> that is kind of unknown. It has always seemed to me to be maybe a bug
> in polipo's internal dns resolver on mips...
>
>> UDP (17), length 60)
>> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!] 55396+ A?
>
> The bad checksum issue probably doesn't matter.
>
> However an actual tcpdump capture file would be useful to have to look
> at the format of the dns query.
>
>> www.osnews.com. (32)
>
>> 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
>> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>> 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
>> 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF], proto
>> UDP (17), length 60)
>> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!] 55396+
>> AAAA? www.osnews.com. (32)
>> 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
>> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>> 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
>> 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP
>> (17), length 123)
>> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396 q:
>> A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns:
>> osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
>> ns1.swelter.net. (95)
>> 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
>> 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
>> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>> 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
>> 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
>> 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
>> 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
>> 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
>> 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP
>> (17), length 135)
>> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396 q:
>> AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
>> 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
>> osnews.com. [29m3s] NS ns2.swelter.net. (107)
>> 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
>> 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
>> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>> 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
>> 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
>> 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
>> 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
>> 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
>> 0x0080: 0603 6e73 32c0 4c ..ns2.L
>> This is the only DNS traffic I saw during the attempts. The tcpdumps have
>> udp bad checksum but when I disabled TFO in polipo, the UDP where still bad
>> checksum but they worked.
>
> I hesitate to draw a connection between TFO and the DNS failures. What
> I would see was polipo would work for a while, then start failing on
> DNS traffic, and like you my workaround was to use gethostbyname
> (which unfortunately clobbers performance).
>
> As fond as I am of split tcp solutions I never poked into this further
> at the time....
>
> It's probably a really simple off-by-one error in the dns code inside
> polipo. Perhaps a packet capture will get us closer. Is there an
> active mailing list for it?
>
>
>> Really weird.
>> p.s. UPNP still works for port forwarding negotiation as it did in 3.6.11-4
>> I still couldn't get the UPNP/SSDP broadcasts (udp to 239.255.255.250) to
>> being forwarded between se00 and sw00/sw10. Last time it worked was ~3.3.8.
>
> OK, yet another issue.
>
> The routing cache got eliminated between 3.3 and 3.6, and there were
> all sorts of changes to it over the last 6 releases that have been
> bothersome.
>
> or perhaps I did something stupid regarding igmp. (is it even on?)
>
>> I'm starting not to question why it doesn't work, I'm starting to wonder why
>> it did work then ;-)
>> Regards,
>> Maciej
>> On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
>>>
>>> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com> wrote:
>>> > Sorry, could you give us a copy of the panic stack trace ?
>>>
>>> I will get a serial console up on a wndr3800 by sunday. (sorry, just
>>> landed in california, am in disarray)
>>>
>>> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
>>>
>>> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>>>

* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-14  6:14                               ` Dave Taht
@ 2013-01-14 19:50                                 ` Dave Taht
  0 siblings, 0 replies; 37+ messages in thread
From: Dave Taht @ 2013-01-14 19:50 UTC (permalink / raw)
  To: Ketan Kulkarni; +Cc: Eric Dumazet, Yuchung Cheng, cerowrt-devel

On Mon, Jan 14, 2013 at 1:14 AM, Dave Taht <dave.taht@gmail.com> wrote:
> I am so buried as to only be able to do new builds of cero once a week.
>
> Can the bad behavior be duplicated on some other sort of single-core
> processor, like x86? Or by merely booting an x86 box in single-processor
> mode?
>
> I'll try to get a new release out next sunday.

I lied. Crash bugs bother me a lot. A release of cerowrt with the BUG_ON removed
for TFO is now up at:

There are no other changes from Cerowrt-3.7.2-1.

Those playing with it should enable TFO in polipo as per this thread,
and also fiddle with various settings for the gethostbyname option in
polipo.

I did not look into the presumably separate DNS lookup issue, nor the multicast
issue also mentioned on this thread.

A new, cleaned-up version of the ar71xx unaligned access code arrived
in openwrt head (thx nbd!). It addresses some new cases but leaves
out some things covered by the existing cerowrt patch set for unaligned
access, notably a bunch of ipv6 stuff that inspired the patch in the
first place.

I retain concerns re the checksum code on both versions.

There were multiple other (mostly ipv6 related) changes to openwrt over
the weekend, which made pulling that stuff forward into this quick
snapshot release of cero too risky. I would also prefer that the two
differing unaligned patches be merged cleanly and pushed up to
openWrt.

So hopefully the TFO portion of this bug thread is resolved, and there
are 3 other bugs left to look at separately...
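
For anyone reading the archive later, the failure mode discussed in this
thread can be modeled in userspace. The following is a minimal sketch, not
kernel code: `model_spinlock`, `spin_is_locked_smp()`, `spin_is_locked_up()`,
`bug_on_fires()` and `guarded_bug_on_fires()` are hypothetical names made up
for illustration. It assumes only what Felix quoted from
include/linux/spinlock_up.h, namely that arch_spin_is_locked() evaluates to 0
on a uniprocessor build, and it treats "socket owned by user" as a plain
boolean.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a kernel spinlock; not a kernel type. */
struct model_spinlock {
    bool held;
};

/* SMP-like behaviour: the lock word records whether it is held. */
bool spin_is_locked_smp(const struct model_spinlock *lock)
{
    return lock->held;
}

/*
 * UP-like behaviour, modelled on the line quoted in this thread:
 *   #define arch_spin_is_locked(lock)   ((void)(lock), 0)
 * The lock is ignored and the answer is always "not locked".
 */
bool spin_is_locked_up(const struct model_spinlock *lock)
{
    (void)lock;
    return false;
}

/* The condition inside the BUG_ON() that the patch in this thread removes. */
bool bug_on_fires(bool is_locked, bool owned_by_user)
{
    return !is_locked && !owned_by_user;
}

/* Same check guarded by NR_CPUS != 1, as in the mm commit cited below. */
bool guarded_bug_on_fires(int nr_cpus, bool is_locked, bool owned_by_user)
{
    return nr_cpus != 1 && !is_locked && !owned_by_user;
}
```

With the lock held and the socket not owned by user context (held = true,
owned_by_user = false), bug_on_fires() is false when fed
spin_is_locked_smp() and true when fed spin_is_locked_up(), which is
consistent with the crash showing up only on the single-core router. The
guarded form stays quiet when nr_cpus is 1.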

>
> On Sun, Jan 13, 2013 at 8:43 PM, Ketan Kulkarni <ketkulka@gmail.com> wrote:
>> Thanks Eric and Yuchung for taking care of the patch. I will test a few more
>> TFO cases as well once this patch is built in cero.
>>
>> Thanks,
>> Ketan
>>
>> On Jan 14, 2013 9:37 AM, "Eric Dumazet" <edumazet@google.com> wrote:
>>>
>>> Quite frankly I would just remove the BUG_ON()
>>>
>>> diff --git a/net/core/request_sock.c b/net/core/request_sock.c
>>> index c31d9e8..4425148 100644
>>> --- a/net/core/request_sock.c
>>> +++ b/net/core/request_sock.c
>>> @@ -186,8 +186,6 @@ void reqsk_fastopen_remove(struct sock *sk, struct request_sock *req,
>>>         struct fastopen_queue *fastopenq =
>>>             inet_csk(lsk)->icsk_accept_queue.fastopenq;
>>>
>>> -       BUG_ON(!spin_is_locked(&sk->sk_lock.slock) && !sock_owned_by_user(sk));
>>> -
>>>         tcp_sk(sk)->fastopen_rsk = NULL;
>>>         spin_lock_bh(&fastopenq->lock);
>>>         fastopenq->qlen--;
>>>
>>>
>>>
>>> On Sun, Jan 13, 2013 at 7:05 PM, Eric Dumazet <edumazet@google.com> wrote:
>>>>
>>>> Oh well yes, this doesn't quite work on !SMP.
>>>>
>>>> And this kind of bug is frequent....
>>>>
>>>> See following example :
>>>>
>>>> commit b9980cdcf2524c5fe15d8cbae9c97b3ed6385563
>>>> Author: Hugh Dickins <hughd@google.com>
>>>> Date:   Wed Feb 8 17:13:40 2012 -0800
>>>>
>>>>     mm: fix UP THP spin_is_locked BUGs
>>>>
>>>>     Fix CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_SMP=n CONFIG_DEBUG_VM=y
>>>>     CONFIG_DEBUG_SPINLOCK=n kernel: spin_is_locked() is then always false,
>>>>     and so triggers some BUGs in Transparent HugePage codepaths.
>>>>
>>>>     asm-generic/bug.h mentions this problem, and provides a WARN_ON_SMP(x);
>>>>     but being too lazy to add VM_BUG_ON_SMP, BUG_ON_SMP, WARN_ON_SMP_ONCE,
>>>>     VM_WARN_ON_SMP_ONCE, just test NR_CPUS != 1 in the existing VM_BUG_ONs.
>>>>
>>>>     Signed-off-by: Hugh Dickins <hughd@google.com>
>>>>     Cc: Andrea Arcangeli <aarcange@redhat.com>
>>>>     Cc: <stable@vger.kernel.org>
>>>>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>>>>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index b3ffc21..91d3efb 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2083,7 +2083,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot)
>>>>  {
>>>>         struct mm_struct *mm = mm_slot->mm;
>>>>
>>>> -       VM_BUG_ON(!spin_is_locked(&khugepaged_mm_lock));
>>>> +       VM_BUG_ON(NR_CPUS != 1 && !spin_is_locked(&khugepaged_mm_lock));
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Jan 13, 2013 at 1:39 PM, Felix Fietkau <nbd@openwrt.org> wrote:
>>>>>
>>>>> On 2013-01-13 7:03 PM, Eric Dumazet wrote:
>>>>> > I suspect a bug in the spin_is_locked() implementation on your arch,
>>>>> > as the socket lock should be held at this point.
>>>>> I don't think this is an arch implementation bug, this probably happens
>>>>> on all !SMP systems. See this bit from include/linux/spinlock_up.h:
>>>>>
>>>>> #define arch_spin_is_locked(lock)   ((void)(lock), 0)
>>>>>
>>>>> - Felix
>>>>>
>>>>
>>>
>>
>>



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-14  6:11       ` Dave Taht
  2013-01-14 16:37         ` Ketan Kulkarni
@ 2013-01-16 22:19         ` Maciej Soltysiak
  2013-01-17  0:58           ` Dave Taht
  2013-01-17  3:44           ` Dave Taht
  1 sibling, 2 replies; 37+ messages in thread
From: Maciej Soltysiak @ 2013-01-16 22:19 UTC (permalink / raw)
  To: Dave Taht; +Cc: cerowrt-devel

On Mon, Jan 14, 2013 at 7:11 AM, Dave Taht <dave.taht@gmail.com> wrote:

> The routing cache got eliminated between 3.3 and 3.6, and there were
> all sorts of changes to it over the last 6 releases that have been
> bothersome.

Ok, you might be onto something. It got eliminated in 3.6.x, so I checked a
few things with cero-3.7.2-3 here:

# ip maddr show | grep 224
shows nothing (only ipv6 addresses show up for maddr)

Trying to start pimd shows:
pimd: 23:16:52.675 Cannot set PIM flag in kernel:(error 99): Protocol not available

There is no /proc/net/igmp, but only /proc/net/igmp6

Could this be Dave M's patch to remove the routing cache, or do we have a
.config misconfiguration?

Regards,
Maciej


>


> or perhaps I did something stupid regarding igmp. (is it even on?)
>
> > I'm starting not to question why it doesn't work, I'm starting to wonder
> why
> > it did work then ;-)
> > Regards,
> > Maciej
> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
> >>
> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com>
> wrote:
> >> > Sorry, could you give us a copy of the panic stack trace ?
> >>
> >> I will get a serial console up on a wndr3800 by sunday. (sorry, just
> >> landed in california, am in disarray)
> >>
> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
> >>
> >> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
> >>


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-16 22:19         ` Maciej Soltysiak
@ 2013-01-17  0:58           ` Dave Taht
  2013-01-17  3:44           ` Dave Taht
  1 sibling, 0 replies; 37+ messages in thread
From: Dave Taht @ 2013-01-17  0:58 UTC (permalink / raw)
  To: Maciej Soltysiak; +Cc: cerowrt-devel

Sounds like a bingo to me.

On Wed, Jan 16, 2013 at 5:19 PM, Maciej Soltysiak <maciej@soltysiak.com> wrote:

> On Mon, Jan 14, 2013 at 7:11 AM, Dave Taht <dave.taht@gmail.com> wrote:
>
>> The routing cache got eliminated between 3.3 and 3.6, and there were
>> all sorts of changes to it over the last 6 releases that have been
>> bothersome.
>
> Ok, you might be onto something. It got eliminated in 3.6.x; So I checked
> a few things with a cero-3.7.2-3 here:
>
> # ip maddr show | grep 224
> shows nothing (only ipv6 addresses show up for maddr)
>
> Trying to start pimd shows:
> pimd: 23:16:52.675 Cannot set PIM flag in kernel:(error 99): Protocol not available
>
> There is no /proc/net/igmp, but only /proc/net/igmp6
>
> Could this be Dave M's patch to remove routing cache or are we having
> .CONFIG misconfig?
>

You get a gold star!

Somewhere in between 3.3.x and 3.7.x I managed to lose PIMV1 and PIMV2
support in the kernel configuration, and also lost the NO_HZ and faster
clock I'd had there too.

CONFIG_NO_HZ=y
CONFIG_HZ=256
CONFIG_HZ_256=y

I also lost the other TCP variants like VENO and LP (I think; I might
have added them elsewhere).

Checking into that latter bit as well as into the unaligned traps tonight
(unless someone beats me to it).
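
One way to catch this kind of silent config regression is to scan the
build's .config for the options you expect. The sketch below is an
illustration only: `config_enabled()` and `missing_multicast_options()` are
hypothetical helpers, and the symbol list (CONFIG_IP_MULTICAST,
CONFIG_IP_MROUTE, CONFIG_IP_PIMSM_V1, CONFIG_IP_PIMSM_V2) is an assumption
about which upstream Kconfig options correspond to the IGMP and PIM support
discussed here, not something taken from the cerowrt build itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if config_text has a line reading exactly "<symbol>=y". */
bool config_enabled(const char *config_text, const char *symbol)
{
    size_t len = strlen(symbol);
    const char *line = config_text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, symbol, len) == 0 &&
            strncmp(line + len, "=y", 2) == 0) {
            char end = line[len + 2];
            if (end == '\n' || end == '\r' || end == '\0')
                return true;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return false;
}

/* Assumed list of options behind IGMP and PIM support. */
static const char *const multicast_options[] = {
    "CONFIG_IP_MULTICAST",
    "CONFIG_IP_MROUTE",
    "CONFIG_IP_PIMSM_V1",
    "CONFIG_IP_PIMSM_V2",
};

/* Count how many of the assumed options are not set to "y". */
size_t missing_multicast_options(const char *config_text)
{
    size_t missing = 0;
    size_t n = sizeof(multicast_options) / sizeof(multicast_options[0]);

    for (size_t i = 0; i < n; i++) {
        if (!config_enabled(config_text, multicast_options[i]))
            missing++;
    }
    return missing;
}
```

Feeding it the text of a kernel .config and checking that
missing_multicast_options() returns 0 would flag a dropped option at build
time rather than when pimd refuses to start. Lines of the form
"# CONFIG_FOO is not set" and non-boolean values such as CONFIG_HZ=256 are
deliberately not counted as enabled.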


> Regards,
> Maciej
>
>
>>
>
>
>> or perhaps I did something stupid regarding igmp. (is it even on?)
>>
>> > I'm starting not to question why it doesn't work, I'm starting to
>> wonder why
>> > it did work then ;-)
>> > Regards,
>> > Maciej
>> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
>> >>
>> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com>
>> wrote:
>> >> > Sorry, could you give us a copy of the panic stack trace ?
>> >>
>> >> I will get a serial console up on a wndr3800 by sunday. (sorry, just
>> >> landed in california, am in disarray)
>> >>
>> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
>> >>
>> >> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>> >>


-- 
Dave Täht

Fixing bufferbloat with cerowrt:
http://www.teklibre.com/cerowrt/subscribe.html


* Re: [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1
  2013-01-16 22:19         ` Maciej Soltysiak
  2013-01-17  0:58           ` Dave Taht
@ 2013-01-17  3:44           ` Dave Taht
  1 sibling, 0 replies; 37+ messages in thread
From: Dave Taht @ 2013-01-17  3:44 UTC (permalink / raw)
  To: Maciej Soltysiak; +Cc: cerowrt-devel

On Wed, Jan 16, 2013 at 5:19 PM, Maciej Soltysiak <maciej@soltysiak.com> wrote:

> On Mon, Jan 14, 2013 at 7:11 AM, Dave Taht <dave.taht@gmail.com> wrote:
>
>> The routing cache got eliminated between 3.3 and 3.6, and there were
>> all sorts of changes to it over the last 6 releases that have been
>> bothersome.
>
> Ok, you might be onto something. It got eliminated in 3.6.x; So I checked
> a few things with a cero-3.7.2-3 here:
>
> # ip maddr show | grep 224
> shows nothing (only ipv6 addresses show up for maddr)
>
> Trying to start pimd shows:
> pimd: 23:16:52.675 Cannot set PIM flag in kernel:(error 99): Protocol not available
>

Fixed that.


> There is no /proc/net/igmp, but only /proc/net/igmp6
>

I'm not sure what should be shown nowadays for 224.X.

I mean, mdns-scan works. But cerowrt shows no igmp
and an x86 box does.

uftpd doesn't work but hasn't ever worked.

fiddling...


>
> Could this be Dave M's patch to remove routing cache or are we having
> .CONFIG misconfig?
>
> Regards,
> Maciej
>
>
>>
>
>
>> or perhaps I did something stupid regarding igmp. (is it even on?)
>>
>> > I'm starting not to question why it doesn't work, I'm starting to
>> wonder why
>> > it did work then ;-)
>> > Regards,
>> > Maciej
>> > On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
>> >>
>> >> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet@google.com>
>> wrote:
>> >> > Sorry, could you give us a copy of the panic stack trace ?
>> >>
>> >> I will get a serial console up on a wndr3800 by sunday. (sorry, just
>> >> landed in california, am in disarray)
>> >>
>> >> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
>> >>
>> >> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>> >>


-- 
Dave Täht

Fixing bufferbloat with cerowrt:
http://www.teklibre.com/cerowrt/subscribe.html


end of thread, other threads:[~2013-01-17  3:44 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-04 17:04 [Cerowrt-devel] TFO crashes cerowrt 3.7.1-1 Dave Taht
2013-01-04 17:27 ` Eric Dumazet
2013-01-04 17:33   ` Dave Taht
2013-01-04 20:42     ` Maciej Soltysiak
2013-01-04 20:43       ` Maciej Soltysiak
2013-01-04 20:57         ` Jerry Chu
2013-01-04 21:21           ` Dave Taht
2013-01-04 21:36             ` Jerry Chu
2013-01-04 21:44               ` Dave Taht
2013-01-04 21:01         ` dpreed
2013-01-04 22:49           ` Robert Bradley
2013-01-04 21:11       ` Dave Taht
2013-01-04 21:19         ` Jerry Chu
2013-01-05  1:59           ` Ketan Kulkarni
2013-01-05  2:20             ` Yuchung Cheng
2013-01-05  3:02               ` Ketan Kulkarni
2013-01-05  3:16                 ` Eric Dumazet
2013-01-05  3:35                 ` Dave Taht
2013-01-05  4:05                   ` Dave Taht
2013-01-05 19:13                 ` Ketan Kulkarni
2013-01-13 17:01                   ` Ketan Kulkarni
2013-01-13 18:03                     ` Eric Dumazet
2013-01-13 21:39                       ` Felix Fietkau
2013-01-14  0:38                         ` Yuchung Cheng
2013-01-14  3:05                         ` Eric Dumazet
2013-01-14  4:07                           ` Eric Dumazet
2013-01-14  4:43                             ` Ketan Kulkarni
2013-01-14  6:14                               ` Dave Taht
2013-01-14 19:50                                 ` Dave Taht
2013-01-14  8:18                           ` Jerry Chu
2013-01-14 16:32                             ` Eric Dumazet
2013-01-04 22:25       ` Robert Bradley
2013-01-14  6:11       ` Dave Taht
2013-01-14 16:37         ` Ketan Kulkarni
2013-01-16 22:19         ` Maciej Soltysiak
2013-01-17  0:58           ` Dave Taht
2013-01-17  3:44           ` Dave Taht
