From: Simon Barber
Date: Tue, 30 Aug 2011 18:20:21 -0700
To: bloat-devel@lists.bufferbloat.net
Subject: Re: oprofiling is much saner looking now with rc6-smoketest
Message-ID: <4E5D8C55.3090204@superduper.net>
In-Reply-To: <4E5D8A02.7040500@superduper.net>
References: <4E5D87DD.7040705@hp.com> <4E5D8A02.7040500@superduper.net>

Apologies - I should have read the scenario better - the connection is
terminated on the router.

Simon

On 08/30/2011 06:10 PM, Simon Barber wrote:
> Why is conntrack even getting involved?
>
> Simon
>
> On 08/30/2011 06:01 PM, Rick Jones wrote:
>> On 08/30/2011 05:32 PM, Dave Taht wrote:
>>> I get about 190Mbit/sec from netperf now, on GigE, with oprofiling
>>> enabled, driver buffers of 4, txqueue of 8, cerowrt default iptables
>>> rules, AND web10g patched into kernel 3.0.3.
>>>
>>> This is much saner than rc3, and judging from the csum_partial and
>>> copy_user being roughly equal, there isn't much left to be gained...
>>>
>>> Nice work.
>>>
>>> (Without oprofiling, without web10g, and with tcp cubic I can get
>>> past 250Mbit.)
>>>
>>> CPU: MIPS 24K, speed 0 MHz (estimated)
>>> Counted INSTRUCTIONS events (Instructions completed) with a unit
>>> mask of 0x00 (No unit mask) count 100000
>>> samples  %        app name      symbol name
>>> ---------------------------------------------------------------
>>> 17277    13.8798  vmlinux       csum_partial
>>>   17277  100.000  vmlinux       csum_partial [self]
>>> ---------------------------------------------------------------
>>> 16607    13.3415  vmlinux       __copy_user
>>>   16607  100.000  vmlinux       __copy_user [self]
>>> ---------------------------------------------------------------
>>> 11913     9.5705  ip_tables     /ip_tables
>>>   11913  100.000  ip_tables     /ip_tables [self]
>>> ---------------------------------------------------------------
>>>  8949     7.1893  nf_conntrack  /nf_conntrack
>>>   8949   100.000  nf_conntrack  /nf_conntrack [self]
>>>
>>> In this case I was going from laptop - gige - through another
>>> rc6-smoketest router - to_this_box's internal lan port.
>>>
>>> It bugs me that iptables and conntrack eat so much cpu for what is
>>> an internal-only connection, i.e. one that doesn't need
>>> conntracking.
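For context: csum_partial computes the 16-bit ones'-complement Internet
checksum of RFC 1071, which is why it tops the profile above. A minimal,
unoptimized C sketch of the same sum (an illustration only, not the
kernel's hand-tuned MIPS assembly) makes it clear why the checksum has
to load every byte of the packet:

    /* 16-bit ones'-complement Internet checksum (RFC 1071) - the same
       sum csum_partial computes. Every byte must be loaded, so on
       receive this loop is where the cache misses land. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static uint16_t inet_checksum(const uint8_t *p, size_t len)
    {
        uint32_t sum = 0;

        while (len > 1) {              /* add up 16-bit words */
            sum += (uint32_t)p[0] << 8 | p[1];
            p += 2;
            len -= 2;
        }
        if (len)                       /* odd trailing byte, zero-padded */
            sum += (uint32_t)p[0] << 8;
        while (sum >> 16)              /* fold the carries back in */
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)~sum;
    }

    int main(void)
    {
        const char pkt[] = "a stand-in for packet payload";
        printf("checksum: 0x%04x\n",
               inet_checksum((const uint8_t *)pkt, strlen(pkt)));
        return 0;
    }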
>>
>> The csum_partial is a bit surprising - I thought every NIC and its
>> dog offered CKO these days - or is that something happening with
>> ip_tables/conntrack? I also thought that Linux used an integrated
>> copy/checksum in at least one direction, or did that go away when
>> CKO became prevalent?
>>
>> If this is inbound, and there is just plain checksumming and not
>> anything funny from conntrack, I would have expected checksum to be
>> much larger than copy. Checksum (in the inbound direction) will take
>> the cache misses and the copy would not. Unless... the data cache of
>> the processor is getting completely trashed - say from the netserver
>> running on the router not keeping up with the inbound data fully, so
>> the copy gets "far away" from the checksum verification.
>>
>> Does perf/perf_events (whatever the follow-on to perfmon2 is called)
>> have support for the CPU used in the device? (Assuming it even has a
>> PMU to be queried in the first place.)
>>
>>> That said, I understand that people like their statistics, and me,
>>> I'm trying to make split-tcp work better, ultimately, one day....
>>>
>>> I'm going to rerun this without the fw rules next.
>>
>> It would be interesting to see if the csum time goes away. Long ago
>> and far away, when I was beating on a 32-core system with aggregate
>> netperf TCP_RR and enabling FW rules or not, conntrack had a
>> non-trivial effect indeed on performance:
>>
>> http://markmail.org/message/exjtzel7vq2ugt66#query:netdev%20conntrack%20rick%20jones%2032%20netperf+page:1+mid:s5v5kylvmlfrpb7a+state:results
>>
>> I think that will get to the start of the thread. The subject is
>> '32 core net-next stack/netfilter "scaling"'.
>>
>> rick jones
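To make the cache-miss reasoning above concrete, here is a hypothetical
userspace microbenchmark - it is not from this thread, and the region
size, chunk size, and plain byte-sum stand-in for csum_partial are all
illustrative assumptions. Each chunk is summed while cold in cache and
then copied while still warm, mimicking inbound receive processing
where the verify pass, not the copy, pays the misses:

    /* bench_csum_copy.c - hypothetical sketch, not from the thread.
       Compile: cc -O2 bench_csum_copy.c (add -lrt on older glibc). */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define REGION (256u * 1024 * 1024)  /* much larger than any cache */
    #define CHUNK  (16 * 1024)           /* a burst of packet data */

    /* Plain byte sum standing in for csum_partial: like the real
       checksum, it must load every byte it covers. */
    static uint32_t sum_bytes(const uint8_t *p, size_t n)
    {
        uint32_t s = 0;
        for (size_t i = 0; i < n; i++)
            s += p[i];
        return s;
    }

    static double now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        uint8_t *src = malloc(REGION);
        uint8_t *dst = malloc(CHUNK);
        if (!src || !dst)
            return 1;
        memset(src, 0xa5, REGION);       /* fault all pages in */

        double t_sum = 0, t_copy = 0;
        uint32_t check = 0;
        for (size_t off = 0; off < REGION; off += CHUNK) {
            double a = now();
            check += sum_bytes(src + off, CHUNK); /* cold: misses here */
            double b = now();
            memcpy(dst, src + off, CHUNK);        /* same data, warm */
            double c = now();
            t_sum  += b - a;
            t_copy += c - b;
        }
        check += dst[0];                 /* keep the copies observable */
        printf("check=%u  checksum: %.3fs  copy: %.3fs\n",
               check, t_sum, t_copy);
        free(src);
        free(dst);
        return 0;
    }

On most hardware the summing pass should dominate. If the copy instead
comes out comparable, the data is being pushed back out of cache
between the two passes - for example by a netserver that cannot keep up
with the inbound stream - which is exactly the scenario Rick describes.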