From: Dave Taht
To: Rick Jones
Cc: bloat-devel
Date: Tue, 30 Aug 2011 18:58:29 -0700
Subject: Re: oprofiling is much saner looking now with rc6-smoketest
References: <4E5D87DD.7040705@hp.com>
List-Id: "Developers working on AQM, device drivers, and networking stacks"

I have put the current rc6 smoketest up at:

http://huchra.bufferbloat.net/~cero1/rc6-smoketest/

So far it's proving very stable. Wireless performance is excellent and wired performance is dramatically improved. No crash bugs thus far, though I had a scare...

For the final rc6, which I hope to have done by Friday, I'm in the process of cleanly re-assembling the patch set (sorry, the sources are a bit of a mess at present). For this rc, I'm hoping that a new iptables lands, in particular, and I have numerous other little things in the queue to sort out.

All that said, getting oprofile running is not hard, and I do appreciate smoke testers helping out, as I don't think I'll be able to get another release candidate done before Linux Plumbers.

install the correct image on your router from the above via web interface or sysupgrade -n
reboot
edit /etc/opkg.conf to have that url in it
opkg update
opkg install oprofile
cd /tmp
mkdir /tmp/oprofile
wget http://huchra.bufferbloat.net/~d/rc6-smoke-captures/vmlinux
opcontrol --vmlinux=/tmp/vmlinux --session-dir=/tmp/oprofile
(saving profile data to flash is a bad idea)
opcontrol --start
# do your testing
opcontrol --dump
opreport -c # or whatever options you like.

On Tue, Aug 30, 2011 at 6:45 PM, Dave Taht wrote:
> On Tue, Aug 30, 2011 at 6:01 PM, Rick Jones wrote:
>> On 08/30/2011 05:32 PM, Dave Taht wrote:
>
>>> It bugs me that iptables and conntrack eat so much cpu for what
>>> is an internal-only connection, e.g. one that
>>> doesn't need conntracking.
>>
>> The csum_partial is a bit surprising - I thought every NIC and its dog
>> offered CKO these days - or is that something happening with
>> ip_tables/conntrack?
>
> If this chipset supports it, so far as I know, it isn't documented or
> implemented.
>
>> I also thought that Linux used an integrated
>> copy/checksum in at least one direction, or did that go away when CKO became
>> prevalent?
>
> Don't know.
>
>>
>> If this is inbound, and there is just plain checksumming and not anything
>> funny from conntrack, I would have expected checksum to be much larger than
>> copy.  Checksum (in the inbound direction) will take the cache misses and
>> the copy would not.  Unless...
>> the data cache of the processor is getting
>> completely trashed - say from the netserver running on the router not
>> keeping up with the inbound data fully and so the copy gets "far away" from
>> the checksum verification.
>
> 220Mbit isn't good enough for ya? Previous tests ran at about 140Mbit; the
> gain is due to some major optimizations by felix to fix a bunch of
> mis-alignment issues. Through the router, I've seen 260Mbit - which is
> perilously close to the speed that I can drive it at from the test boxes.
>
>>
>> Does perf/perf_events (whatever the followon to perfmon2 is called) have
>> support for the CPU used in the device?  (Assuming it even has a PMU to be
>> queried in the first place)
>
> Yes. Don't think it's enabled. It is running flat out, according to top.
>
>>
>>> That said, I understand that people like their statistics, and me,
>>> I'm trying to make split-tcp work better, ultimately, one day....
>>>
>>> I'm going to rerun this without the fw rules next.
>>
>> It would be interesting to see if the csum time goes away.  Long ago and far
>> away when I was beating on a 32-core system with aggregate netperf TCP_RR
>> and enabling or not FW rules, conntrack had a non-trivial effect indeed on
>> performance.
>
> Stays about the same. iptables time drops. How to disable conntrack?
> Don't you only really
> need it for nat?
>
>>
>> http://markmail.org/message/exjtzel7vq2ugt66#query:netdev%20conntrack%20rick%20jones%2032%20netperf+page:1+mid:s5v5kylvmlfrpb7a+state:results
>>
>> I think will get to the start of that thread.  The subject is '32 core
>> net-next stack/netfilter "scaling"'
>>
>> rick jones
>>

-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com
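
P.S. For anyone who would rather not retype the oprofile steps from the top of this mail, here is a rough sketch of them as two shell functions. The function names (profile_setup, profile_report), the run helper, and the DRY_RUN, SESSION_DIR and VMLINUX variables are made up for illustration; "mkdir -p" and "wget -O" are small conveniences not in the steps as written. It assumes the image is already flashed and "opkg install oprofile" has been done. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: wraps the opcontrol/opreport steps from the mail above.
# All names below are hypothetical; defaults print commands instead of running them.

SESSION_DIR=${SESSION_DIR:-/tmp/oprofile}   # keep profile data in RAM, not flash
VMLINUX=${VMLINUX:-/tmp/vmlinux}
VMLINUX_URL=http://huchra.bufferbloat.net/~d/rc6-smoke-captures/vmlinux
DRY_RUN=${DRY_RUN:-1}                       # set DRY_RUN=0 to actually execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Fetch the matching vmlinux and start the profiler.
profile_setup() {
    run mkdir -p "$SESSION_DIR"
    run wget -O "$VMLINUX" "$VMLINUX_URL"
    run opcontrol --vmlinux="$VMLINUX" --session-dir="$SESSION_DIR"
    run opcontrol --start
}

# Flush the collected samples and print a call-graph report.
profile_report() {
    run opcontrol --dump
    run opreport -c
}
```

Usage would be something like: source the file, call profile_setup, run your netperf or other test, then call profile_report. Check the printed commands first, and only then set DRY_RUN=0 on the router.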