[Cake] openwrt build with latest cake and other qdiscs

Thu May 14 06:58:29 EDT 2015

Hi Jonathan,

On May 14, 2015, at 12:24 , Jonathan Morton <chromatix99 at gmail.com> wrote:

>>> I’ve just pushed support for an overhead parameter; both cake itself and the iproute2 module.  I took the opportunity to put in a minor optimisation for the cell-framing compensation as well.
>> 
>> 	Great, thanks a lot. I have a question though: http://lxr.free-electrons.com/ident?i=psched_l2t_ns basically does the same operation, but slightly different:
>>              DIV_ROUND_UP instead of do_div((n+d-1), d)
>> What is the kernel policy here, reuse specialized macros or rather code more readable (with slight redundancy)?
> 
> It looks as though the DIV_ROUND_UP() expands to exactly the same code, except that a plain division is used instead of do_div().  The latter includes a conversion to multiplying by the inverse on ARM, when the divisor is a constant (which it is), since ARM doesn’t have a hardware integer divide.  (AArch64 does.)
> 
> With that said, I haven’t closely examined the resulting assembler.

	I just noticed the difference and thought I’d bring it up, so I can understand the code better, that’s all.
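
	For reference, a minimal userspace sketch of the two spellings of the round-up division. The macro body mirrors the kernel's DIV_ROUND_UP() as I understand it; ceil_div_u64() is a hypothetical helper name, and a plain 64-bit divide stands in for do_div(), which is not available outside the kernel.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's DIV_ROUND_UP(n, d). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * The same rounding written out by hand: add (d - 1) before dividing so
 * that any remainder pushes the result up to the next integer.
 * Hypothetical helper; in the kernel the divide would go through do_div().
 */
static inline uint64_t ceil_div_u64(uint64_t n, uint32_t d)
{
	return (n + d - 1) / d;
}
```

	Both forms give the same quotient; the open question above is only how the division itself gets compiled on a given architecture.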

> 
> I’m also not going to use psched_l2t_ns(), because I use the corrected packet length for other purposes than just time.

	Sure, HTB does its accounting in a weird way, and the different rate tables plainly confuse me. I was just referencing this code for the do_div vs DIV_ROUND_UP question.

>  It also fails to support negative overheads, which can occasionally occur when using IPoA.

	I know, that is why we default to “stab” in sqm scripts… and as far as I can see Alan tested whether stab works with cake and it seems it does. Still it is much better if cake controls both overhead and encapsulation, since stab’s encapsulation handling is not optimal.

> 
>> It seems clear that cake does fully rely on the supplied overhead, unlike htb which will automatically add ethernet overhead and an estimate? of the additional header GRO packets drag in, see:
>> http://lxr.free-electrons.com/source/net/core/dev.c#L2744
> 
> I can’t figure out the connection between HTB and that code.  

	Well, this function is called by __dev_xmit_skb (see http://lxr.free-electrons.com/source/net/core/dev.c#L2774 ), so it is not HTB-specific: everyone looking in qdisc_skb_cb(skb)->pkt_len for the size seems to get that adjustment. Only the following call to qdisc_calculate_pkt_len(skb, q) unfortunately overrides that pkt_len with skb->len+overhead, but everybody else using pkt_len should get this size correction, I believe.
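
	As far as I read it, the adjustment amounts to counting one extra set of headers for every additional segment in the aggregate. A simplified sketch of that idea follows; struct fake_skb and estimated_pkt_len() are hypothetical stand-ins for illustration, not the kernel's actual structures or function.

```c
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for the fields involved. */
struct fake_skb {
	uint32_t len;      /* bytes in the aggregate, including one set of headers */
	uint32_t hdr_len;  /* link + network + transport header bytes per segment */
	uint16_t gso_segs; /* segments to be emitted; 0 or 1 for a plain packet */
};

/*
 * Estimate the bytes that will leave the device: the aggregate carries
 * the headers once, but every segment gets its own copy when it is
 * split, so add hdr_len for each segment beyond the first.
 */
static inline uint32_t estimated_pkt_len(const struct fake_skb *skb)
{
	uint32_t len = skb->len;

	if (skb->gso_segs > 1)
		len += (uint32_t)(skb->gso_segs - 1) * skb->hdr_len;
	return len;
}
```

	With two 1448-byte payloads and 66 bytes of headers, the aggregate is 2962 bytes but the estimate comes out as 3028, i.e. two full 1514-byte frames.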

> Also, that appears to be GSO, not GRO.

	My bad, I was using GRO just as a moniker for packet aggregate processing in the network stack, without even thinking through the details.

>  I’m not precisely sure what the difference is, but I’d hazard a guess that GSO is outbound, GRO is inbound.

	No idea.

> 
> Frankly, I hate having to deal with packet aggregates in the core network stack.  

	But that ship has sailed, I fear. At high speeds the network stack profits noticeably from not going through the motions for each packet sequentially, but basically treating a batch of packets as one that the NIC will then segment out, so I doubt this is going away any time soon.

> Device drivers can aggregate if that makes sense for the hardware, but I’d much rather that was kept out of my qdisc.  Peeling is on the agenda; that’ll make sure we are dealing with actual, individual packets when we need to.  

	I agree, that sounds conceptually much cleaner, but peeling is going to be costlier than pushing the segmentation to the NIC, so bandwidth aficionados will not appreciate unconditional peeling, I would guess.

> Certainly when dealing with cell-framing overhead, we *always* need to know individual packet sizes.

	Well, that or the sum for an aggregate, as long as the sum takes all the fancy “celling” into account; all we really need to know is how many bits the data expands to on the wire.
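
	To make the “celling” concrete, here is a rough sketch of the ATM expansion, assuming the usual 48 payload bytes per 53-byte cell. atm_wire_len() is a hypothetical helper and not cake's actual code; the overhead is signed so that the negative values mentioned above for IPoA fit in, and the clamp to one byte is my own assumption.

```c
#include <stdint.h>

#define ATM_CELL_PAYLOAD 48 /* payload bytes carried per ATM cell */
#define ATM_CELL_SIZE    53 /* payload plus the 5-byte cell header */

/*
 * Bytes on the wire for one packet: add the (possibly negative)
 * per-packet overhead, round up to whole cells, and charge 53 bytes
 * for every cell.
 */
static inline uint32_t atm_wire_len(uint32_t pkt_len, int32_t overhead)
{
	int64_t len = (int64_t)pkt_len + overhead;
	uint32_t cells;

	if (len < 1)
		len = 1; /* assumption: never account less than one byte */
	cells = (uint32_t)((len + ATM_CELL_PAYLOAD - 1) / ATM_CELL_PAYLOAD);
	return cells * ATM_CELL_SIZE;
}
```

	Two 49-byte packets cost two cells each (212 bytes in total), while a naive 98-byte sum would be charged only three cells (159 bytes), which is why an aggregate's sum has to be built from the per-packet cell counts.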

> 
>> I actually like that cake does not try to auto-adjust the overhead by itself, since the kernel does this automatically for an ethernet link, but not, say, for a PPPoE interface, making it a bit tricky to recommend the proper encapsulation to ATM users; “use 40 if you shape on the pppoe-wan interface but 26 if you shape on the wan interface directly” is a sure way to confuse people.
> 
> I consider that a user-interface problem, as well as reflecting a general problem with PPPoE.  Actually, PPPoE has *never* been user-friendly; it outright sucks in all respects.  I can’t think of a single reason to use PPPoE instead of PPPoA.  AFAIK, all Finnish and most British DSL ISPs use either PPPoA or bridging; I’ve only personally encountered PPPoE in the US.

	Again, I agree, but in Germany, say, all big ISPs use PPPoE, even over fiber, so this is going to stay with us a bit longer. Since ATM is going to go the way of the dodo fast, PPPoA will not be an option for much longer, so DHCP would be nice to have (it is not like the ISP does not know which line it services anyway, so the billing and identification issue that is often brought up is a bit of a straw man; I believe they just stick to it because their billing back-end already knows how to handle this).

> 
> To help reduce confusion, it would probably be best to offer consistent advice on which interface to shape and how much overhead to account for there.  I think shaping the traffic that actually goes over the link is more correct than shaping the traffic that goes to the modem (which might include some management traffic that doesn’t go on the wire).  So you should shape on the PPPoE interface and add the full 40 bytes there.

	Well, almost; this depends on whether there is a BRAS throttle or not. The pppoe interface does not see or account for the PPPoE management packets, which without BRAS throttling will also eat up bits on the DSL link. I admit those packets are rare, but still… There should be no other important traffic to the modem heavy enough to be noticeable to the user. That said, I currently shape on pppoe-ge00, and it works well enough; I guess the PPPoE traffic simply squeezes into the small percentage by which the shaper is reduced from the line/throttled rate.

>  Happily, this advice is also safe if the user accidentally selects the wrong interface, since 40 bytes is conservative for the Ethernet interface.

	As seen from our latency-focussed vantage point ;)
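
	The arithmetic behind the 40 versus 26 advice, as a small sketch. The 40-byte figure is the one from this thread; the 14 bytes are the Ethernet header that the kernel already counts on the Ethernet interface but not on the PPPoE one. Both function names are hypothetical.

```c
#include <stdint.h>

#define ETH_HDR_LEN        14 /* Ethernet header, already counted on the Ethernet interface */
#define PPPOE_ATM_OVERHEAD 40 /* per-packet overhead when shaping on the PPPoE interface */

/* Overhead that matches the wire for the interface being shaped. */
static inline int32_t overhead_to_configure(int shaping_on_ethernet_if)
{
	return shaping_on_ethernet_if ? PPPOE_ATM_OVERHEAD - ETH_HDR_LEN
				      : PPPOE_ATM_OVERHEAD;
}

/*
 * Per-packet accounting error for a configured overhead: positive means
 * the shaper over-estimates the wire size (safe for latency), negative
 * means it under-estimates (the queue can move back into the modem).
 */
static inline int32_t accounting_error(int shaping_on_ethernet_if, int32_t configured)
{
	return configured - overhead_to_configure(shaping_on_ethernet_if);
}
```

	Configuring 40 on the Ethernet interface over-estimates by 14 bytes per packet, which only costs a little bandwidth; configuring 26 on the PPPoE interface under-estimates by 14, which is the unsafe direction.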

> 
> Anyway, user-interface problems are best solved in userspace.  Cake’s internal implementation is thus kept simple and numerical.  The tc module now supports that directly, and more user-friendly support can be added either there or in external scripts, or some combination of the two.

	Okay, sounds like a good division of labor between the kernel and tc ;)

Best Regards
	Sebastian

> - Jonathan Morton
>