[Cerowrt-devel] TFO crashes cerowrt 3.7.1-1

Ketan Kulkarni ketkulka at gmail.com
Mon Jan 14 11:37:44 EST 2013


I have never played around with the polipo proxy much, nor did I ever
wonder about its DNS behavior.
It would be good to have a bug filed and discussion tracked over there.

Maciej: can you please report a bug and put the logs (preferably
without TFO ;-) )?
I can take a look at those later this week probably.

Thanks,
Ketan

On Mon, Jan 14, 2013 at 11:41 AM, Dave Taht <dave.taht at gmail.com> wrote:
> This is a different issue than TFO, so taking the TFO-ers off the list
>
> On Fri, Jan 4, 2013 at 12:42 PM, Maciej Soltysiak <maciej at soltysiak.com> wrote:
>> I am seeing something strange here, with polipo related to TFO but also DNS.
>
> I have had polipo's internal dns resolver mess up on multiple occasions
> exactly along the lines you describe. There is a bug for it in the
> cerowrt database as best as I recall.
>
> I have never tracked down why it happens.
>
>> When I took 3.7.1-1 and set my Windows 7 laptop to use gw.home.lan:8123
>> as its HTTP proxy, it didn't work. What I observed was:
>> a) after quite a while, polipo's response to the browser was "504 Host
>> www.osnews.com lookup failed: Timeout"
>> b) this error appeared in the ssh console: "Host osnews.com lookup failed:
>> Timeout (131072)"
>> c) disabling TFO by adding option useTCPFastOpen 'false' to config 'polipo'
>> 'general' works around the problem
>> d) alternatively, you can keep TFO enabled in polipo but change option
>> 'dnsUseGethostbyname' from 'reluctantly' to 'true' (!)
>> This is very weird, because TFO is TCP and the DNS queries fired off by
>> polipo are UDP:
>> root@OpenWrt:/tmp/log# tcpdump -n -v -vv -vvv -x -X -s 1500 -i lo
>> 20:21:56.160245 IP (tos 0x0, ttl 64, id 50129, offset 0, flags [DF], proto
>
> No, it's not weird, there's something about uclibc and polipo interacting here
> that is kind of unknown. It has always seemed to me to be maybe a bug
> in polipo's internal dns resolver on mips...
>
>> UDP (17), length 60)
>> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd17f!] 55396+ A?
>
> The bad checksum issue probably doesn't matter.
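
One way to sanity-check that is to recompute the checksum from the bytes in the dump. The Python sketch below is only an illustration: the packet bytes are hand-copied from the first query in the capture quoted in this thread, and the function and variable names are made up for the example. It sums the UDP pseudo-header and datagram in the usual Internet-checksum way. If the value stored in the packet equals the pseudo-header sum alone, that would be consistent with the checksum simply not having been filled in on the loopback path (an assumption, not something confirmed here) rather than with corrupted data.

```python
import struct

# First query packet (IPv4 header + UDP header + DNS payload),
# hand-copied from the hex dump in this thread.
PACKET = bytes.fromhex(
    "4500 003c c3d1 4000 4011 78dd 7f00 0001 "
    "7f00 0001 b8c8 0035 0028 fe3b d864 0100 "
    "0001 0000 0000 0000 0377 7777 066f 736e "
    "6577 7303 636f 6d00 0001 0001"
)


def ones_complement_sum(data: bytes) -> int:
    """Add 16-bit big-endian words, folding carries back into the low 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


ip_header_len = (PACKET[0] & 0x0F) * 4      # IHL field, in bytes
src, dst = PACKET[12:16], PACKET[16:20]     # IPv4 source and destination
udp = PACKET[ip_header_len:]                # UDP header + payload

# Pseudo-header: source, destination, zero byte, protocol 17 (UDP), UDP length.
pseudo = src + dst + struct.pack("!BBH", 0, 17, len(udp))

stored = struct.unpack("!H", udp[6:8])[0]   # checksum field as captured
pseudo_sum = ones_complement_sum(pseudo)    # pseudo-header only, not complemented

# Full checksum: zero the checksum field, sum everything, then complement.
udp_zeroed = udp[:6] + b"\x00\x00" + udp[8:]
correct = (~ones_complement_sum(pseudo + udp_zeroed)) & 0xFFFF

print("stored      0x%04x" % stored)
print("pseudo sum  0x%04x" % pseudo_sum)
print("recomputed  0x%04x" % correct)
```

For this packet the stored field and the pseudo-header sum are both 0xfe3b, and the recomputed checksum is 0xd17f, the same value tcpdump prints as the expected one.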
>
> However an actual tcpdump capture file would be useful to have to look
> at the format of the dns query.
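
Until a capture file is available, the query format can at least be checked against the hex dump already posted. The following is a rough Python sketch, assuming the DNS payload bytes were copied correctly from the first query in the dump. `parse_query` is a hypothetical helper written for this example: it handles only a single uncompressed question and is not a general DNS parser.

```python
import struct

# DNS payload of the first query (the bytes after the 8-byte UDP header),
# hand-copied from the hex dump in this thread.
DNS_QUERY = bytes.fromhex(
    "d864 0100 0001 0000 0000 0000 0377 7777 "
    "066f 736e 6577 7303 636f 6d00 0001 0001"
)


def parse_query(msg: bytes) -> dict:
    """Decode the fixed 12-byte DNS header and the first question entry."""
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", msg[:12])

    # The question name is a sequence of length-prefixed labels ending in a zero byte.
    labels = []
    pos = 12
    while msg[pos] != 0:
        length = msg[pos]
        labels.append(msg[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    pos += 1  # skip the terminating zero-length label

    qtype, qclass = struct.unpack("!2H", msg[pos:pos + 4])
    return {
        "id": ident,
        "recursion_desired": bool(flags & 0x0100),
        "is_response": bool(flags & 0x8000),
        "qdcount": qdcount,
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
        "name": ".".join(labels),
        "qtype": qtype,    # 1 = A, 28 = AAAA
        "qclass": qclass,  # 1 = IN
        "bytes_consumed": pos + 4,
    }


query = parse_query(DNS_QUERY)
print(query)
```

Decoded this way, the payload is a 32-byte query with ID 55396, recursion desired, and one question for www.osnews.com type A class IN. That matches what tcpdump printed, so the query in this packet looks well-formed.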
>
>> www.osnews.com. (32)
>
>> 0x0000: 4500 003c c3d1 4000 4011 78dd 7f00 0001 E..<..@.@.x.....
>> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>> 0x0030: 6577 7303 636f 6d00 0001 0001 ews.com.....
>> 20:21:56.160319 IP (tos 0x0, ttl 64, id 50130, offset 0, flags [DF], proto
>> UDP (17), length 60)
>> 127.0.0.1.47304 > 127.0.0.1.53: [bad udp cksum 0xfe3b -> 0xd164!] 55396+
>> AAAA? www.osnews.com. (32)
>> 0x0000: 4500 003c c3d2 4000 4011 78dc 7f00 0001 E..<..@.@.x.....
>> 0x0010: 7f00 0001 b8c8 0035 0028 fe3b d864 0100 .......5.(.;.d..
>> 0x0020: 0001 0000 0000 0000 0377 7777 066f 736e .........www.osn
>> 0x0030: 6577 7303 636f 6d00 001c 0001 ews.com.....
>> 20:21:56.169942 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP
>> (17), length 123)
>> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe7a -> 0x5f73!] 55396 q:
>> A? www.osnews.com. 1/2/0 www.osnews.com. [29m3s] A 74.86.31.159 ns:
>> osnews.com. [29m3s] NS ns2.swelter.net., osnews.com. [29m3s] NS
>> ns1.swelter.net. (95)
>> 0x0000: 4500 007b 0000 4000 4011 3c70 7f00 0001 E..{..@.@.<p....
>> 0x0010: 7f00 0001 0035 b8c8 0067 fe7a d864 8180 .....5...g.z.d..
>> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>> 0x0030: 6577 7303 636f 6d00 0001 0001 c00c 0001 ews.com.........
>> 0x0040: 0001 0000 06cf 0004 4a56 1f9f c010 0002 ........JV......
>> 0x0050: 0001 0000 06cf 0011 036e 7332 0773 7765 .........ns2.swe
>> 0x0060: 6c74 6572 036e 6574 00c0 1000 0200 0100 lter.net........
>> 0x0070: 0006 cf00 0603 6e73 31c0 40 ......ns1.@
>> 20:21:56.173901 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP
>> (17), length 135)
>> 127.0.0.1.53 > 127.0.0.1.47304: [bad udp cksum 0xfe86 -> 0x8ecb!] 55396 q:
>> AAAA? www.osnews.com. 1/2/0 www.osnews.com. [54m44s] AAAA
>> 2607:f0d0:1002:62::3 ns: osnews.com. [29m3s] NS ns1.swelter.net.,
>> osnews.com. [29m3s] NS ns2.swelter.net. (107)
>> 0x0000: 4500 0087 0000 4000 4011 3c64 7f00 0001 E.....@.@.<d....
>> 0x0010: 7f00 0001 0035 b8c8 0073 fe86 d864 8180 .....5...s...d..
>> 0x0020: 0001 0001 0002 0000 0377 7777 066f 736e .........www.osn
>> 0x0030: 6577 7303 636f 6d00 001c 0001 c00c 001c ews.com.........
>> 0x0040: 0001 0000 0cd4 0010 2607 f0d0 1002 0062 ........&......b
>> 0x0050: 0000 0000 0000 0003 c010 0002 0001 0000 ................
>> 0x0060: 06cf 0011 036e 7331 0773 7765 6c74 6572 .....ns1.swelter
>> 0x0070: 036e 6574 00c0 1000 0200 0100 0006 cf00 .net............
>> 0x0080: 0603 6e73 32c0 4c ..ns2.L
>> This is the only DNS traffic I saw during the attempts. The tcpdumps show
>> bad UDP checksums, but when I disabled TFO in polipo, the UDP packets still
>> had bad checksums and the lookups worked.
>
> I hesitate to draw a connection between TFO and the DNS failures. What
> I would see was polipo would work for a while, then start failing on
> DNS traffic, and like you my workaround was to use gethostbyname
> (which unfortunately clobbers performance).
>
> As fond as I am of split tcp solutions I never poked into this further
> at the time....
>
> It's probably a really simple off-by-one error in the dns code inside
> polipo. Perhaps a packet capture will get us closer. Is there an
> active mailing list for it?
>
>
>> Really weird.
>> p.s. UPnP still works for port forwarding negotiation, as it did in 3.6.11-4.
>> I still couldn't get the UPnP/SSDP broadcasts (UDP to 239.255.255.250) to
>> be forwarded between se00 and sw00/sw10. Last time it worked was ~3.3.8.
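
To check whether SSDP discovery crosses from one interface to another, a small probe run from a client on one side can help. This is a minimal Python sketch, not a tested tool. It assumes the standard SSDP port 1900 and a multicast TTL of 2, neither of which is stated in this thread, and `build_msearch` and `probe` are illustrative names. Replies to an M-SEARCH come back as unicast, so an empty result only suggests the multicast query was not forwarded; it does not prove it.

```python
import socket

SSDP_GROUP = "239.255.255.250"  # multicast group mentioned in this thread
SSDP_PORT = 1900                # standard SSDP port (assumption, not from the thread)


def build_msearch(search_target: str = "ssdp:all", mx: int = 2) -> bytes:
    """Build an SSDP M-SEARCH discovery request."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_GROUP}:{SSDP_PORT}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",
        f"ST: {search_target}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(timeout: float = 3.0, ttl: int = 2) -> list:
    """Send one M-SEARCH and return the addresses of hosts that answered."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # TTL must be above 1 for the query to have any chance of crossing a router.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    sock.settimeout(timeout)
    responders = []
    try:
        sock.sendto(build_msearch(), (SSDP_GROUP, SSDP_PORT))
        while True:
            try:
                _data, addr = sock.recvfrom(2048)
            except socket.timeout:
                break
            responders.append(addr[0])
    finally:
        sock.close()
    return responders
```

Calling `probe()` from a host on sw00 and seeing only addresses from that same subnet would point at the forwarding between se00 and sw00/sw10 rather than at the UPnP devices themselves.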
>
> OK, yet another issue.
>
> The routing cache got eliminated between 3.3 and 3.6, and there were
> all sorts of changes to it over the last 6 releases that have been
> bothersome.
>
> Or perhaps I did something stupid regarding IGMP. (Is it even on?)
>
>> I'm starting not to question why it doesn't work, I'm starting to wonder why
>> it did work then ;-)
>> Regards,
>> Maciej
>> On Fri, Jan 4, 2013 at 6:33 PM, Dave Taht <dave.taht at gmail.com> wrote:
>>>
>>> On Fri, Jan 4, 2013 at 9:27 AM, Eric Dumazet <edumazet at google.com> wrote:
>>> > Sorry, could you give us a copy of the panic stack trace ?
>>>
>>> I will get a serial console up on a wndr3800 by sunday. (sorry, just
>>> landed in california, am in disarray)
>>>
>>> The latest dev build of cero for the wndr3800 and wndr3700v2 is at:
>>>
>>> http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.7.1-1/
>>>
>>> --
>>> Dave Täht
>>>
>>> Fixing bufferbloat with cerowrt:
>>> http://www.teklibre.com/cerowrt/subscribe.html
>>> _______________________________________________
>>> Cerowrt-devel mailing list
>>> Cerowrt-devel at lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>
>>
>
>
>
> --
> Dave Täht
>
> Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
