From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ia0-x22a.google.com (ia-in-x022a.1e100.net [IPv6:2607:f8b0:4001:c02::22a])
    (using TLSv1 with cipher RC4-SHA (128/128 bits))
    (Client CN "smtp.gmail.com", Issuer "Google Internet Authority" (verified OK))
    by huchra.bufferbloat.net (Postfix) with ESMTPS id 3A8DB21F144
    for ; Mon, 28 Jan 2013 06:14:39 -0800 (PST)
Received: by mail-ia0-f170.google.com with SMTP id k20so4260503iak.1
    for ; Mon, 28 Jan 2013 06:14:38 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
    h=mime-version:x-received:in-reply-to:references:date:message-id
    :subject:from:to:cc:content-type;
    bh=Gvu6NbZxMqfQug7CPHPzdsSIQGeM23eBgh0vaxj3wLg=;
    b=tu9ox0yMs3Oxdy+XaMIzDeR95Rf29T0Nj0twNHVkaCV1tNbhp+A4FyvnPzBP6O41vO
    XOhqitBx0Ij4Dac7I45oQdxjW3QyhYSfnotPEIfwa/CVxD2JpbbzEgmENXG5vOCwG/1X
    5RBCDkMu5ZczHESyEalEwh9wu57n8mKtprE+nrdj6tp8Q7ZhWBvRCbIguf2Xmu6HYNBU
    Ht3mY5PYDSlY66RukJMJUPPYPSsoH8ffPGj7IjlbiTMjoy2VeayYussmmYMIwmzPhHNd
    ahsOF6ulqRdrLivGlwGO94d6mdl+6jVtkBPaaOHlrZ1WIvhA7YC/R8ey8vhx94cpe5E4
    +CwQ==
MIME-Version: 1.0
X-Received: by 10.50.196.164 with SMTP id in4mr4874676igc.86.1359382478078;
    Mon, 28 Jan 2013 06:14:38 -0800 (PST)
Received: by 10.64.135.39 with HTTP; Mon, 28 Jan 2013 06:14:37 -0800 (PST)
In-Reply-To:
References:
Date: Mon, 28 Jan 2013 06:14:37 -0800
Message-ID:
From: Dave Taht
To: Robert Bradley
Content-Type: multipart/alternative; boundary=14dae93405c9b8446904d459e5c6
Cc: "" , Felix Fietkau
Subject: Re: [Cerowrt-devel] deployed some cero this weekend, chasing checksums
X-BeenThere: cerowrt-devel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Development issues regarding the cerowrt test router project
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 28 Jan 2013 14:14:39 -0000

--14dae93405c9b8446904d459e5c6
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On Mon, Jan 28, 2013 at 5:43 AM, Robert Bradley wrote:

> It looks more like data corruption of various forms as opposed to a fault
> in checksumming:
>
>    - Truncation of some layer-4 data including headers to 75 octets
>    - Some bad TCP packets have stored header lengths of 0 octets
>    - I often see lines of incrementing bytes (30 31 32 etc.). For example,
>    packet 962 has a train of values from 0x10 to 0x2f, starting at position
>    0x003a (the TCP timestamps). I think these are meant to be fragments from
>    the ping packets (which contain 8 octets then values 0x10 to 0x37), but
>    these are straying into non-ICMP packets.
>    - There are pieces of HTTP in non-HTTP protocols. For example, packet
>    1394 is supposed to be UDP, but looks like it is really TCP traffic with
>    the wrong protocol number. The checksum is still invalid in either case.
>    - It is possible to corrupt layer-4 checksums only, leaving the IP layer
>    untouched.
>
>
> On 28 January 2013 07:52, Dave Taht wrote:
>
>> Put up a pic http://snapon.lab.bufferbloat.net/~d/yurt
>>
>> they aren't bad all the time, but when they go bad, bad things happen.
>>
>>
>> On Sun, Jan 27, 2013 at 11:41 PM, Dave Taht wrote:
>>
>>>
>>> I have been debugging some weirdness for a while. You might want to do
>>> some captures on the latest cero and look at checksums.
>>>
>>> An unreasonably high number of checksum issues seem to be happening, but
>>> there doesn't appear to be a whole lot of pattern to it, as yet.
>>>
>>> I will simplify. I pinged locally and 8.8.8.8 and surfed the web, and a
>>> symptom is that some other routers can't ping sometimes nor access much of
>>> the internet beyond the gateway. They can always reach the gateway.
>>>
>>> in the interim, the topology on this capture are
>>>
>>> 172.30.102.17 - laptop via ethernet to
>>> 172.20.102.1 - cerowrt 3.7.4-4 via ethernet to
>>> 172.20.6.1 - ubnt 3.3.8-26 via mesh to
>>> 172.20.142.11 - ubnt 3.7.4-4 via ethernet to
>>> * 192.168.100.1 - cerowrt 3.7.2 capture point (yes, updating that)
>>> 10.0.10.1 - comcast box (yes, double nat, fixing that)
>>>
>>> I took a capture on the se00 interface
>>>
>>> tcpdump -i se00 -w/tmp/yurt.cap host 172.20.102.17
>>>
>>> and stuck that capture there:
>>>
>>> http://snapon.lab.bufferbloat.net/~d/yurt/yurt.cap
>>>
>>> and then looked at it with wireshark with this filter
>>>
>>> ip.checksum_bad == 1
>>>
>>> and scratched my head at the error rate (about 1%) and the pattern (lack
>>> thereof)
>>>
>>> I will simplify in the morning
>>>
>>> --
>>> Dave Täht
>>>
>>> Fixing bufferbloat with cerowrt:
>>> http://www.teklibre.com/cerowrt/subscribe.html
>>
>>
>>
>>
>> --
>> Dave Täht
>>
>> Fixing bufferbloat with cerowrt:
>> http://www.teklibre.com/cerowrt/subscribe.html
>>
>
>
>
> --
> Robert Bradley
>

Well, it could just be tcpdump_mini blowing up. (doesn't explain the
problems on the network tho)

running tcpdump locally from the testing laptop I get no bad crcs anywhere
on the path, forward or reverse....

--
Dave Täht

Fixing bufferbloat with cerowrt:
http://www.teklibre.com/cerowrt/subscribe.html

--14dae93405c9b8446904d459e5c6
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On Mon, Jan 28, 2013 at 5:43 AM, Robert Bradley <robert.bradley1@gmail.com> wrote:
It looks more like data corruption of various forms as opposed to a fault in checksumming:

- Truncation of some layer-4 data including headers to 75 octets
- Some bad TCP packets have stored header lengths of 0 octets
- I often see lines of incrementing bytes (30 31 32 etc.). For example, packet 962 has a train of values from 0x10 to 0x2f, starting at position 0x003a (the TCP timestamps). I think these are meant to be fragments from the ping packets (which contain 8 octets then values 0x10 to 0x37), but these are straying into non-ICMP packets.
- There are pieces of HTTP in non-HTTP protocols. For example, packet 1394 is supposed to be UDP, but looks like it is really TCP traffic with the wrong protocol number. The checksum is still invalid in either case.
- It is possible to corrupt layer-4 checksums only, leaving the IP layer untouched.
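
One way to hunt for the stray ping-payload fragments described in the list above is to scan each packet's bytes for runs in which every byte is the previous byte plus one. The following is only a rough sketch: the function name `find_incrementing_runs` and the `min_len` threshold of 8 are assumptions made for illustration, not something used in this thread.

```python
def find_incrementing_runs(payload: bytes, min_len: int = 8):
    """Return (offset, length, first_byte) for each run of bytes in which
    every byte equals the previous byte plus one.

    min_len is an arbitrary illustrative threshold, not a value from the thread.
    """
    runs = []
    start = 0
    for i in range(1, len(payload) + 1):
        # A run ends at the end of the data or where the +1 pattern breaks.
        if i == len(payload) or payload[i] != payload[i - 1] + 1:
            length = i - start
            if length >= min_len:
                runs.append((start, length, payload[start]))
            start = i
    return runs


# Example: two unrelated bytes, then the 0x10..0x2f train, then two zero bytes.
sample = b"\xaa\xbb" + bytes(range(0x10, 0x30)) + b"\x00\x00"
for offset, length, first in find_incrementing_runs(sample):
    print(f"run at offset {offset:#06x}: {length} bytes starting at {first:#04x}")
```

A run that shows up in a TCP or UDP packet, rather than in an ICMP echo payload, would be a candidate for the kind of cross-packet contamination listed above.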


On 28 January 2013 07:52, Dave Taht <dave.taht@gmail.com> wrote:
Put up a pic http://snapon.lab.bufferbloat.net/~d/yurt

they aren't bad all the time, but when they go bad, bad things happen.


On Sun, Jan 27, 2013 at 11:41 PM, Dave Taht <dave.taht@gmail.com> wrote:

I have been debugging some weirdness for a while. You might want to do some captures on the latest cero and look at checksums.

An unreasonably high number of checksum issues seem to be happening, but there doesn't appear to be a whole lot of pattern to it, as yet.

I will simplify. I pinged locally and 8.8.8.8 and surfed the web, and a symptom is that some other routers can't ping sometimes nor access much of the internet beyond the gateway. They can always reach the gateway.

in the interim, the topology on this capture are

172.30.102.17 - laptop via ethernet to
172.20.102.1 - cerowrt 3.7.4-4 via ethernet to
172.20.6.1 - ubnt 3.3.8-26 via mesh to
172.20.142.11 - ubnt 3.7.4-4 via ethernet to
* 192.168.100.1 - cerowrt 3.7.2 capture point (yes, updating that)
10.0.10.1 - comcast box (yes, double nat, fixing that)

I took a capture on the se00 interface

tcpdump -i se00 -w/tmp/yurt.cap host 172.20.102.17

and stuck that capture there:

http://snapon.lab.bufferbloat.net/~d/yurt/yurt.cap

and then looked at it with wireshark with this filter

ip.checksum_bad == 1

and scratched my head at the error rate (about 1%) and the pattern (lack thereof)
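
For anyone who wants to cross-check what that filter flags without relying on a single tool, the IPv4 header checksum can be verified by hand. The sketch below follows the standard ones' complement method (RFC 791 and RFC 1071); the function names are hypothetical, and it assumes the input bytes start at the IPv4 header, with any link-layer header already stripped.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """Sum 16-bit big-endian words, folding carries back in (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_header_checksum_ok(packet: bytes) -> bool:
    """Return True if the IPv4 header at the start of `packet` verifies."""
    if len(packet) < 20:
        return False
    ihl = (packet[0] & 0x0F) * 4  # header length in bytes, from the IHL field
    if ihl < 20 or len(packet) < ihl:
        return False
    # Summing a valid header with its checksum field included yields 0xFFFF.
    return ones_complement_sum16(packet[:ihl]) == 0xFFFF
```

This covers only the IP header checksum. TCP and UDP checksums also include a pseudo-header and the payload, so a packet can pass this check and still carry a bad layer-4 checksum.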

I will simplify in the morning

--
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html



--
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html



--
Robert Bradley

Well, it could just be tcpdump_mini blowing up. (doesn't explain the problems on the network tho)

running tcpdump locally from the testing laptop I get no bad crcs anywhere on the path, forward or reverse....

--
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
--14dae93405c9b8446904d459e5c6--