From: Sebastian Moeller
To: Jonathan Morton
Cc: cake@lists.bufferbloat.net
Date: Thu, 14 May 2015 12:58:29 +0200
Subject: Re: [Cake] openwrt build with latest cake and other qdiscs

Hi Jonathan,

On May 14, 2015, at 12:24, Jonathan Morton wrote:

>>> I've just pushed support for an overhead parameter; both cake itself and the iproute2 module. I took the opportunity to put in a minor optimisation for the cell-framing compensation as well.
>>
>> Great, thanks a lot.
>> I have a question though: http://lxr.free-electrons.com/ident?i=psched_l2t_ns basically does the same operation, but slightly differently:
>> DIV_ROUND_UP instead of do_div((n+d-1), d)
>> What is the kernel policy here, reuse specialized macros or rather code more readably (with slight redundancy)?
>
> It looks as though the DIV_ROUND_UP() expands to exactly the same code, except that a plain division is used instead of do_div(). The latter includes a conversion to multiplying by the inverse on ARM, when the divisor is a constant (which it is), since ARM doesn't have a hardware integer divide. (AArch64 does.)
>
> With that said, I haven't closely examined the resulting assembler.

I just noticed the difference and thought I'd bring it up, so I can understand the code better, that's all.

>
> I'm also not going to use psched_l2t_ns(), because I use the corrected packet length for other purposes than just time.

Sure, HTB does its accounting in a weird way, and the different rate tables plainly confuse me. I was just referencing this code for the do_div vs DIV_ROUND_UP question.

> It also fails to support negative overheads, which can occasionally occur when using IPoA.

I know, that is why we default to "stab" in sqm scripts... and as far as I can see Alan tested whether stab works with cake and it seems it does. Still it is much better if cake controls both overhead and encapsulation, since stab's encapsulation handling is not optimal.

>
>> It seems clear that cake does fully rely on the supplied overhead, unlike htb which will automatically add ethernet overhead and an estimate? of the additional header GRO packets drag in, see:
>> http://lxr.free-electrons.com/source/net/core/dev.c#L2744
>
> I can't figure out the connection between HTB and that code.

Well, this function is called by __dev_xmit_skb (see http://lxr.free-electrons.com/source/net/core/dev.c#L2774 ), so it is not HTB specific; that is, everyone looking in qdisc_skb_cb(skb)->pkt_len for the size seems to get that adjustment. Only the following call to qdisc_calculate_pkt_len(skb, q); unfortunately overrides skb->pkt_len with skb->len+overhead, but everybody else using pkt_len should get this size correction, I believe.

> Also, that appears to be GSO, not GRO.

My bad, I was using GRO just as a moniker for packet aggregate processing in the network stack, without even thinking through the details.

> I'm not precisely sure what the difference is, but I'd hazard a guess that GSO is outbound, GRO is inbound.

No idea.

>
> Frankly, I hate having to deal with packet aggregates in the core network stack.

But that ship has sailed, I fear. At high speeds the network stack profits noticeably by not going through the motions for each packet sequentially, but basically treating a batch of packets as one that the NIC will then segment out, so I have my doubts whether this is going away any time soon.

> Device drivers can aggregate if that makes sense for the hardware, but I'd much rather that was kept out of my qdisc. Peeling is on the agenda; that'll make sure we are dealing with actual, individual packets when we need to.

I agree, that sounds conceptually much cleaner, but peeling is going to be costlier than pushing the segmentation to the NIC, so bandwidth aficionados will not appreciate unconditional peeling, I would guess.

> Certainly when dealing with cell-framing overhead, we *always* need to know individual packet sizes.

Well, that or the sum for an aggregate, as long as the sum takes all fancy "celling" into account; all we really need to know is how many bits the data expands to on the wire.
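
To make the cell-framing point concrete, here is a minimal arithmetic sketch (not cake's actual code). It assumes plain ATM/AAL5 framing with 48 payload bytes per 53-byte cell; the function names and the example packet sizes and overheads are hypothetical and only chosen for illustration. It rounds up the same way as the do_div((n+d-1), d) / DIV_ROUND_UP idiom discussed above.

```python
ATM_CELL_PAYLOAD = 48  # payload bytes carried per ATM cell
ATM_CELL_SIZE = 53     # bytes each cell occupies on the wire


def div_round_up(n, d):
    """Integer ceiling division, the (n + d - 1) / d idiom."""
    return (n + d - 1) // d


def atm_wire_bytes(packet_len, overhead):
    """Wire size of one packet after per-packet overhead and cell padding.

    overhead may be negative (hypothetical example: an interface that
    already counts headers which never reach the ATM link).
    """
    adjusted = max(packet_len + overhead, 0)
    cells = div_round_up(adjusted, ATM_CELL_PAYLOAD)
    return cells * ATM_CELL_SIZE


def aggregate_wire_bytes(segment_lens, overhead):
    """Wire size of an aggregate, summed over its individual segments."""
    return sum(atm_wire_bytes(seg, overhead) for seg in segment_lens)


if __name__ == "__main__":
    # Two 1500-byte segments with 40 bytes of overhead each.
    per_segment = aggregate_wire_bytes([1500, 1500], 40)
    # The same data naively treated as a single 3000-byte packet.
    naive = atm_wire_bytes(3000, 40)
    print(per_segment, naive)
```

With these example numbers the per-segment sum comes out larger than the naive single-packet figure, because the overhead and the padding of the last cell apply to every segment. That is why the sum for an aggregate is only usable if it is computed segment by segment.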
>
>> I actually like that cake does not try to auto-adjust the overhead by itself, since the kernel does this automatically for an ethernet link, but not say for a PPPoE interface, making it a bit tricky to recommend the proper encapsulation to ATM users: "use 40 if you shape on the pppoe-wan interface but 26 if you shape on the wan interface directly" is a sure way to confuse people.
>
> I consider that a user-interface problem, as well as reflecting a general problem with PPPoE. Actually, PPPoE has *never* been user-friendly; it outright sucks in all respects. I can't think of a single reason to use PPPoE instead of PPPoA. AFAIK, all Finnish and most British DSL ISPs use either PPPoA or bridging; I've only personally encountered PPPoE in the US.

Again, I agree, but say in Germany all big ISPs use PPPoE, even over fiber, so this is going to stay with us a bit longer. Since ATM is going to go the way of the dodo fast, PPPoA will not be an option for much longer, so dhcp would be nice to have (it is not like the ISP does not know which line it services anyway, so the billing and identification issue that is often brought up is a bit of a straw man; I believe they just stick to it because their billing back-end already knows how to handle this).

>
> To help reduce confusion, it would probably be best to offer consistent advice on which interface to shape and how much overhead to account for there. I think shaping the traffic that actually goes over the link is more correct than shaping the traffic that goes to the modem (which might include some management traffic that doesn't go on the wire). So you should shape on the PPPoE interface and add the full 40 bytes there.

Well, almost; this depends on whether there is a BRAS throttle or not. The pppoe interface does not see or account for the PPPoE management packets, which without BRAS throttling will also eat up bits on the DSL link.
I admit those packets are rare, but still... There should be no other important traffic to the modem heavy enough to be noticeable to the user. That said, I currently shape on pppoe-ge00, and it works well enough; I guess the PPPoE traffic simply squeezes into the small %age the shaper is reduced from line/throttled rate.

> Happily, this advice is also safe if the user accidentally selects the wrong interface, since 40 bytes is conservative for the Ethernet interface.

As seen from our latency focussed vantage point ;)

>
> Anyway, user-interface problems are best solved in userspace. Cake's internal implementation is thus kept simple and numerical. The tc module now supports that directly, and more user-friendly support can be added either there or in external scripts, or some combination of the two.

Okay, sounds like a good division of labor between the kernel and tc ;)

Best Regards
	Sebastian

> - Jonathan Morton
>