Development issues regarding the cerowrt test router project
* [Cerowrt-devel] deployed some cero this weekend, chasing checksums
From: Dave Taht @ 2013-01-28  7:41 UTC
  To: Robert Bradley, Felix Fietkau, cerowrt-devel

I have been debugging some weirdness for a while. You might want to do some
captures on the latest cero and look at checksums.

An unreasonably high number of checksum errors seems to be occurring, but
there doesn't appear to be much of a pattern to them as yet.

I will simplify. For this capture I pinged locally and 8.8.8.8 and surfed
the web. One symptom is that some of the other routers sometimes can't ping,
or reach much of the internet, beyond the gateway. They can always reach
the gateway.

In the interim, the topology for this capture is:

172.20.102.17 - laptop via ethernet to
172.20.102.1 - cerowrt 3.7.4-4 via ethernet to
172.20.6.1 - ubnt 3.3.8-26 via mesh to
172.20.142.11 - ubnt 3.7.4-4 via ethernet to
* 192.168.100.1 - cerowrt 3.7.2 capture point (yes, updating that)
10.0.10.1 - comcast box (yes, double nat, fixing that)

I took a capture on the se00 interface

tcpdump -i se00 -w/tmp/yurt.cap host 172.20.102.17

and put that capture here:

http://snapon.lab.bufferbloat.net/~d/yurt/yurt.cap

and then looked at it in Wireshark with this display filter:

ip.checksum_bad == 1

and scratched my head at the error rate (about 1%) and the pattern (lack
thereof)
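
For anyone who wants to script the same check rather than click through Wireshark, here is a minimal sketch, not the method used above. It walks a classic pcap file and verifies each IPv4 header checksum with the ones'-complement sum from RFC 1071. The function names are made up for illustration, and it assumes Ethernet II framing without VLAN tags and the classic pcap layout (no pcapng, no nanosecond variant).

```python
import io
import struct


def ipv4_header_checksum_ok(header):
    """True if the ones'-complement sum over the IPv4 header folds to 0xFFFF."""
    if not header:
        return False
    ihl = (header[0] & 0x0F) * 4              # header length in bytes
    if ihl < 20 or len(header) < ihl:
        return False                           # malformed or truncated header
    total = sum(struct.unpack("!%dH" % (ihl // 2), header[:ihl]))
    while total >> 16:                         # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


def iter_pcap_frames(stream):
    """Yield (captured_len, original_len, frame) for each record in a pcap stream."""
    global_header = stream.read(24)
    if len(global_header) < 24:
        raise ValueError("global header too short")
    magic = global_header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"                           # written by a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"                           # written by a big-endian host
    else:
        raise ValueError("not a classic pcap file")
    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            return                             # end of file
        _sec, _usec, incl_len, orig_len = struct.unpack(endian + "IIII", record_header)
        frame = stream.read(incl_len)
        if len(frame) < incl_len:
            return                             # truncated final record
        yield incl_len, orig_len, frame


def count_bad_ipv4_checksums(stream):
    """Return (ipv4_packets, bad_header_checksums) for an Ethernet capture."""
    total = bad = 0
    for _incl, _orig, frame in iter_pcap_frames(stream):
        # Assumption: untagged Ethernet II, so the EtherType sits at bytes 12-13.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        total += 1
        if not ipv4_header_checksum_ok(frame[14:]):
            bad += 1
    return total, bad


# Hypothetical usage against a downloaded copy of the capture:
# with open("yurt.cap", "rb") as f:
#     total, bad = count_bad_ipv4_checksums(f)
#     print(f"{bad} of {total} IPv4 headers fail the checksum")
```

This only covers the IP header checksum, which is what the filter above tests. TCP, UDP and ICMP checksums would need their own checks.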

I will simplify in the morning

-- 
Dave Täht

Fixing bufferbloat with cerowrt:
http://www.teklibre.com/cerowrt/subscribe.html


* Re: [Cerowrt-devel] deployed some cero this weekend, chasing checksums
From: Dave Taht @ 2013-01-28  7:52 UTC
  To: Robert Bradley, Felix Fietkau, cerowrt-devel

I put up a picture at http://snapon.lab.bufferbloat.net/~d/yurt

The checksums aren't bad all the time, but when they go bad, bad things happen.


* Re: [Cerowrt-devel] deployed some cero this weekend, chasing checksums
From: Robert Bradley @ 2013-01-28 13:43 UTC
  To: Dave Taht; +Cc: <cerowrt-devel@lists.bufferbloat.net>, Felix Fietkau

It looks more like data corruption of various forms as opposed to a fault
in checksumming:

- Truncation of some layer-4 data including headers to 75 octets
- Some bad TCP packets have stored header lengths of 0 octets
- I often see lines of incrementing bytes (30 31 32 etc.).  For example,
packet 962 has a train of values from 0x10 to 0x2f, starting at position
0x003a (the TCP timestamps).  I think these are meant to be fragments from
the ping packets (which contain 8 octets then values 0x10 to 0x37), but
these are straying into non-ICMP packets.
- There are pieces of HTTP in non-HTTP protocols.  For example, packet 1394
is supposed to be UDP, but looks like it is really TCP traffic with the
wrong protocol number.  The checksum is still invalid in either case.
- It is possible to corrupt layer-4 checksums only, leaving the IP layer
untouched.
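
Two of the symptoms in the list above are easy to test for mechanically. The following is a rough sketch under stated assumptions, not a tool anyone on this thread used. The helper names and the `min_run` threshold of 16 are invented for illustration. It looks for long runs of incrementing bytes (the ping payload pattern) and reads the TCP header length from the data-offset field.

```python
def longest_incrementing_run(data):
    """Return (start_offset, length) of the longest run where each byte is previous + 1."""
    best_start, best_len = 0, 0
    start = 0
    for i in range(1, len(data) + 1):
        # A run ends at the end of the data or when the next byte is not previous + 1.
        if i == len(data) or data[i] != (data[i - 1] + 1) & 0xFF:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = i
    return best_start, best_len


def looks_like_stray_ping_payload(payload, min_run=16):
    """Heuristic: a long incrementing run in a non-ICMP payload is suspicious."""
    _start, length = longest_incrementing_run(payload)
    return length >= min_run


def tcp_header_length(segment):
    """Header length in bytes from the TCP data-offset field (upper 4 bits of byte 12)."""
    return (segment[12] >> 4) * 4


def tcp_header_length_is_sane(segment):
    """A valid TCP header is at least 20 bytes and must fit inside the segment."""
    if len(segment) < 20:
        return False
    return 20 <= tcp_header_length(segment) <= len(segment)
```

Incrementing runs do occur in legitimate traffic, so a hit is only a reason to look at a packet more closely, not proof of corruption.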



-- 
Robert Bradley


* Re: [Cerowrt-devel] deployed some cero this weekend, chasing checksums
From: Dave Taht @ 2013-01-28 14:14 UTC
  To: Robert Bradley; +Cc: <cerowrt-devel@lists.bufferbloat.net>, Felix Fietkau

Well, it could just be tcpdump_mini blowing up (though that doesn't explain
the problems on the network).

Running tcpdump locally on the test laptop, I get no bad checksums anywhere
on the path, forward or reverse.


* Re: [Cerowrt-devel] deployed some cero this weekend, chasing checksums
From: David Lang @ 2013-01-28 23:29 UTC
  To: Robert Bradley; +Cc: <cerowrt-devel@lists.bufferbloat.net>, Felix Fietkau

Depending on your hardware, you may be seeing bogus checksums on outbound
packets in the capture while the packets on the wire have valid checksums,
because the hardware computes the checksums and fills them in on transmit
(checksum offload).

I know I've run into this one before when doing load testing.
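
One rough way to judge whether this explanation fits a given capture is to split the bad checksums by where the packet originated. Offload on transmit can only account for bad checksums on packets generated by the capturing host itself. The sketch below is a heuristic under that assumption; the function names and the (source address, checksum ok) input format are hypothetical.

```python
from collections import Counter


def summarize_checksums(packets, local_addresses):
    """Count packets by origin and checksum state.

    packets: iterable of (source_ip, checksum_ok) pairs, for example produced
    by a separate capture-parsing step.
    local_addresses: set of addresses owned by the capturing host.
    """
    counts = Counter()
    for source_ip, checksum_ok in packets:
        origin = "local" if source_ip in local_addresses else "remote"
        counts[(origin, "ok" if checksum_ok else "bad")] += 1
    return counts


def offload_is_plausible(counts):
    """True only if every bad checksum is on a locally generated packet."""
    return counts[("local", "bad")] > 0 and counts[("remote", "bad")] == 0
```

If bad checksums also show up on packets that merely pass through the capturing box, transmit offload alone does not explain them.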

David Lang

