[Cerowrt-devel] deployed some cero this weekend, chasing checksums

Mon Jan 28 09:14:37 EST 2013

On Mon, Jan 28, 2013 at 5:43 AM, Robert Bradley
<robert.bradley1 at gmail.com> wrote:

> It looks more like data corruption of various forms as opposed to a fault
> in checksumming:
>
> - Truncation of some layer-4 data including headers to 75 octets
> - Some bad TCP packets have stored header lengths of 0 octets
> - I often see lines of incrementing bytes (30 31 32 etc.).  For example,
> packet 962 has a train of values from 0x10 to 0x2f, starting at position
> 0x003a (the TCP timestamps).  I think these are meant to be fragments from
> the ping packets (which contain 8 octets then values 0x10 to 0x37), but
> these are straying into non-ICMP packets.
> - There are pieces of HTTP in non-HTTP protocols.  For example, packet
> 1394 is supposed to be UDP, but looks like it is really TCP traffic with
> the wrong protocol number.  The checksum is still invalid in either case.
> - It is possible to corrupt layer-4 checksums only, leaving the IP layer
> untouched.
>
>
> On 28 January 2013 07:52, Dave Taht <dave.taht at gmail.com> wrote:
>
>> Put up a pic http://snapon.lab.bufferbloat.net/~d/yurt
>>
>> they aren't bad all the time, but when they go bad, bad things happen.
>>
>>
>> On Sun, Jan 27, 2013 at 11:41 PM, Dave Taht <dave.taht at gmail.com> wrote:
>>
>>>
>>> I have been debugging some weirdness for a while. You might want to do
>>> some captures on the latest cero and look at checksums.
>>>
>>> An unreasonably high number of checksum issues seem to be happening, but
>>> there doesn't appear to be a whole lot of pattern to it, as yet.
>>>
>>> I will simplify. I pinged locally and 8.8.8.8 and surfed the web, and a
>>> symptom is that some other routers can't ping sometimes nor access much of
>>> the internet beyond the gateway. They can always reach the gateway.
>>>
>>> In the interim, the topology for this capture is:
>>>
>>> 172.20.102.17 - laptop via ethernet to
>>> 172.20.102.1 - cerowrt 3.7.4-4 via ethernet to
>>> 172.20.6.1 - ubnt 3.3.8-26 via mesh to
>>> 172.20.142.11 - ubnt 3.7.4-4 via ethernet to
>>> * 192.168.100.1 - cerowrt 3.7.2 capture point (yes, updating that)
>>> 10.0.10.1 - comcast box (yes, double nat, fixing that)
>>>
>>> I took a capture on the se00 interface
>>>
>>> tcpdump -i se00 -w/tmp/yurt.cap host 172.20.102.17
>>>
>>> and stuck that capture there:
>>>
>>> http://snapon.lab.bufferbloat.net/~d/yurt/yurt.cap
>>>
>>> and then looked at it with wireshark with this filter
>>>
>>> ip.checksum_bad == 1
>>>
>>> and scratched my head at the error rate (about 1%) and the pattern (lack
>>> thereof)
>>>
>>> I will simplify in the morning
>>>
>>> --
>>> Dave Täht
>>>
>>> Fixing bufferbloat with cerowrt:
>>> http://www.teklibre.com/cerowrt/subscribe.html
>>
>>
>>
>>
>>
>
>
>
> --
> Robert Bradley
>

Well, it could just be tcpdump_mini blowing up. (That doesn't explain the
problems on the network, though.)

Running tcpdump locally on the testing laptop, I see no bad checksums anywhere
on the path, forward or reverse.
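
One way to cross-check a capture without trusting tcpdump_mini's own
verdict is to recompute the IPv4 header checksum by hand. What follows is
only a minimal sketch in Python of the standard Internet checksum (RFC 1071),
not anything taken from cerowrt or tcpdump. The function names and the sample
header bytes are illustrative assumptions, and it covers the IP header only,
not the layer-4 checksums.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, complemented (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:           # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def ipv4_header_ok(packet: bytes) -> bool:
    """True if the IPv4 header at the start of `packet` has a valid checksum."""
    if len(packet) < 20:
        return False
    ihl = (packet[0] & 0x0F) * 4  # header length in octets
    if ihl < 20 or len(packet) < ihl:
        return False
    # Summing a header that includes a correct checksum field yields zero.
    return internet_checksum(packet[:ihl]) == 0

# Hypothetical 20-octet header used only for illustration (not from yurt.cap).
SAMPLE_HEADER = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")

if __name__ == "__main__":
    print(ipv4_header_ok(SAMPLE_HEADER))
    damaged = bytearray(SAMPLE_HEADER)
    damaged[8] ^= 0x01            # flip one bit in the TTL field
    print(ipv4_header_ok(bytes(damaged)))
```

Feeding it the raw IP header bytes of a packet that the capture flags as bad
would show whether the header is really damaged or whether the capture tool
is misreporting.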

-- 
Dave Täht

Fixing bufferbloat with cerowrt:
http://www.teklibre.com/cerowrt/subscribe.html