From: Jonathan Morton
Date: Tue, 2 Sep 2014 00:43:28 +0300
To: Aaron Wood
Cc: "cerowrt-devel@lists.bufferbloat.net", bloat
Subject: Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...
List-Id: Development issues regarding the cerowrt test router project

On 1 Sep, 2014, at 11:25 pm, Aaron Wood wrote:

>>> But this doesn't really answer the question of why the WNDR has so much lower a ceiling with shaping than without. The G4 is powerful enough that the overhead of shaping simply disappears next to the overhead of shoving data around. Even when I turn up the shaping knob to a value quite close to the hardware's unshaped capabilities (eg. 400Mbps one-way), most of the shapers stick to the requested limit like glue, and even the worst offender is within 10%. I estimate that it's using only about 500 clocks per packet *unless* it saturates the PCI bus.
>>>
>>> It's possible, however, that we're not really looking at a CPU limitation, but a timer problem. The PowerBook is a "proper" desktop computer with hardware to match (modulo its age). If all the shapers now depend on the high-resolution timer, how high-resolution is the WNDR's timer?
>>>
>> Both good questions worth further exploration.
>
> Doing some napkin math and some spec reading, I think that the memory bus is a likely factor. The G4 had a fairly impressive memory bus for the day (64-bit?). The WNDR3800 appears to be used in an x16 configuration (based on the numbers on the memory parts). It may have *just* enough bw to push concurrent 3x3 802.11n through the software bridge interface, which short-circuits a lot of processing (IIRC).
>
> The typical way I've seen a home router being benchmarked for the "marketing numbers" is to flow tcp data to/from a wifi client to a wired client. Single socket is used, for a uni-directional stream of data. So long as they can hit peak rates (peak MCS), it will get marked as good for "up to 900Mbps!!" or whatever they want to say.
>
> The small cache of the AR7161 vs. the G4 is another issue (32KB vs. 2MB): the various buffers for fq_codel and htb may stay in L2 on the G4, but there simply isn't room in the AR7161 for that, which puts further pressure on the bus.

I don't think that's it.

First a nitpick: the PowerBook version of the late-model G4 (7447A) doesn't have the external L3 cache interface, so it only has the 256KB or 512KB internal L2 cache (I forget which). The desktop version (7457A) used external cache. The G4 was considered to be *crippled* by its FSB by the end of its run, since it never adopted high-performance signalling techniques, nor moved the memory controller on-die; it was quoted that the G5 (970) could move data using *single-byte* operations faster than the *peak* throughput of the G4's FSB. The only reason the G5 never made it into a PowerBook was that it wasn't battery-friendly in the slightest.

But that makes little difference to your argument - compared to a cheap CPE-class embedded SoC, the PowerBook is eminently desktop-class hardware, even if it is already a decade old.

More compelling is that even at 16-bit width, the WNDR's RAM should have more bandwidth than my PowerBook's PCI bus. Standard PCI is 33MHz x 32-bit, and I can push a steady 30MB/sec in both directions simultaneously, which corresponds in total to about half the PCI bus's theoretical capacity. (The GEM reports 66MHz capability, but it shares the bus with an IDE controller which doesn't, so I assume it is stuck at 33MHz.) A 16-bit RAM should be able to match PCI if it runs at 66MHz, which is the lower limit of JEDEC standards for SDRAM.

The AR7161 datasheet says it has a DDR-capable SDRAM interface, which implies at least 200MHz unless the integrator was colossally stingy. Further, a little digging suggests that the memory bus should be 32-bit wide (hence two 16-bit RAM chips), and that the WNDR runs it at 340MHz, half the CPU core speed. For an embedded SoC, that's really not too bad - it should be able to sustain 1GB/sec, in one direction at a time.

So that takes care of the argument for simply moving the payload around. In any case, the WNDR demonstrably *can* cope with the available bandwidth if the shaping is turned off.

For the purposes of shaping, the CPU shouldn't need to touch the majority of the payload - only the headers, which are relatively small. The bulk of the payload should DMA from one NIC to RAM, then DMA back out of RAM to the other NIC. It has to do that anyway to route them, and without shaping there'd be more of them to handle. The difference might be in the data structures used by the shaper itself, but I think those are also reasonably compact. It doesn't even have to touch userspace, since it's not acting as the endpoint as my PowerBook was during my tests.

And while the MIPS 24K core is old, it's also been die-shrunk over the intervening years, so it runs a lot faster than it originally did.
I very much doubt that it's as refined as my G4, but it could probably hold its own relative to a comparable ARM SoC such as the Raspberry Pi. (Unfortunately, the latter doesn't have the I/O capacity to do high-speed networking - USB only.) Atheros publicity materials indicate that they increased the I-cache to 64KB for performance reasons, but saw no need to increase the D-cache at the same time.

Which brings me back to the timers, and other items of black magic.

Incidentally, transfer speed benchmarks involving wireless will certainly be limited by the wireless link. I assume that's not a factor here.

 - Jonathan Morton
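
For anyone who wants to check the bus arithmetic in the message above, here is a rough sketch in Python. It computes theoretical peaks only, from the clock and width figures quoted in the text. The helper name `peak_bytes_per_sec` is made up for illustration, and the WNDR line assumes that 340MHz is the bus clock with two transfers per clock (DDR); the message does not spell that out, so treat that figure as an upper bound rather than a measurement.

```python
# Back-of-the-envelope peak bandwidths for the buses discussed above.
# These are theoretical peaks; sustained throughput is always lower.

MB = 1_000_000  # decimal megabytes, matching the "30MB/sec" figure in the text


def peak_bytes_per_sec(clock_hz, width_bits, transfers_per_clock=1):
    """Peak of a parallel bus: clock x width in bytes x transfers per clock.

    Hypothetical helper, named here for illustration only.
    """
    return clock_hz * (width_bits // 8) * transfers_per_clock


# Standard PCI: 33MHz x 32-bit, one bus shared by both directions.
pci_peak = peak_bytes_per_sec(33_000_000, 32)

# Measured on the PowerBook: a steady 30MB/sec each way, simultaneously.
pci_measured = 2 * 30 * MB

# A 16-bit single-data-rate SDRAM at 66MHz.
sdram16_peak = peak_bytes_per_sec(66_000_000, 16)

# WNDR memory bus as described: 32-bit wide at 340MHz.
# ASSUMPTION: 340MHz is the bus clock and DDR gives two transfers per clock.
wndr_peak = peak_bytes_per_sec(340_000_000, 32, transfers_per_clock=2)

if __name__ == "__main__":
    print(f"PCI peak:            {pci_peak / MB:.0f} MB/s")
    print(f"PCI measured total:  {pci_measured / MB:.0f} MB/s "
          f"({pci_measured / pci_peak:.0%} of peak)")
    print(f"16-bit SDRAM @66MHz: {sdram16_peak / MB:.0f} MB/s")
    print(f"WNDR DDR peak:       {wndr_peak / MB:.0f} MB/s")
```

Under these assumptions the 16-bit, 66MHz SDRAM case comes out equal to the PCI peak, and the WNDR's peak sits well above the 1GB/sec sustained estimate, which is consistent with the argument that raw payload movement is not the bottleneck.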