[Cake] act_connmark + dscp

Kevin Darbyshire-Bryant kevin at darbyshire-bryant.me.uk
Fri Mar 8 06:13:33 EST 2019



> On 7 Mar 2019, at 17:40, Toke Høiland-Jørgensen <toke at redhat.com> wrote:
> 
> Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> writes:
> 
>>> On 7 Mar 2019, at 10:10, Toke Høiland-Jørgensen <toke at redhat.com> wrote:
>> 
>> <snipping some context>
>>>>>> 
>>>>>> The valid bit is set when the ‘getdscp’ function has written a DSCP
>>>>>> value into the conntrack (& hence skb) mark. This allows us & other
>>>>>> skb->mark/ct->mark applications (eg iptables, cake qdisc) to know that
>>>>>> a DSCP value has been placed in the field. We cannot simply use a
>>>>>> non-zero DSCP because zero is a valid DSCP.
>>>>> 
>>>>> If someone installs the action, the field is supposedly always copied;
>>>>> so why do we need another flag?
>>>> 
>>>> I’m trying to limit the number of times expensive iptables mangle
>>>> rules have to run.
>>> 
>>> Right, I see your point, but I'm worried that this can risk becoming a
>>> source of hard-to-debug bugs if this bit happens to get set by accident
>>> in other places. So, I would suggest to at least make it optional (and
>>> configurable). So how about the following configuration options:
>> 
>> Phew, I explained myself clearly enough (just) that time :-). There’s
>> a compromise here to be had between setting/using a DSCP for
>> connection(fwmark) vs setting/using a DSCP per packet. Mostly it’s a
>> (improved) performance vs DSCP accuracy compromise and assumes DSCP
>> isn’t going to change after the connection has been established - some
>> people have rules that mangle DSCP based on amount of data
>> transferred. Personal view: anything that permits some sort of
>> classification restoration on ingress has to be an improvement on what
>> we have now.
> 
> So I don't think we in general can assume that all packets of the same
> flow has the same DSCP mark. If we want to enforce this, that is another
> matter entirely.
> 
>>> - fwmark shift (valid values 0-32)
>>> specifies how many bits to left-shift the DSCP values before putting
>>> them into the fwmark (and how many bits to right-shift the value read
>>> from fwmark before writing it to the DSCP field); this could also be
>>> inferred from the mask rather than be a separate option (shift =
>>> lowest set bit of mask)
>> 
>> I think the inferred choice is best. I can see all manner of confusing
>> behaviour with mismatches between mask & shift. Also it’s one less
>> parameter to pass … and in my dream world have to add some of these
>> parameters into cake as well so it can interpret the DSCP containing
>> fwmark as well… the fewer the better :-)
> 
> Yeah, I tend to lean towards that as well. Which could also work for
> cake: simply interpret the FWMARK parameter as a mask, and shift by the
> lowest set bit.
> 
>>> - get_dscp (boolean; cannot be set along with set_dscp)
>>> if set, the DSCP field will be copied to the fwmark field, subject to
>>> mask and shift
>> 
>> Yes
>> 
>>> 
>>> - set_dscp (boolean; cannot be set along with get_dscp)
>>> if set, the value in fwmark will be copied to the DSCP field, subject
>>> to mask and shift
>> 
>> Yes
>> 
>>> 
>>> - state mask (32-bit value; probably needs better name)
>>> if set: the get_dscp action will OR the resulting fwmark before
>>> storing it. the set_dscp value will AND the fwmark with this value
>>> before doing anything, and abort if the result is false.
>> 
>> Nearly: If set; the get_dscp action will AND the fwmark with this
>> value and abort if true. If false it will OR the resulting fwmark
>> before storing it. the set_dscp action will AND the fwmark with this
>> value before doing anything and will abort if result is false.
>> 
>> The ‘get_dscp action AND with fwmark and abort if true’ is the
>> function that allows DSCP values to ‘stick' into a connection, thus
>> obviating the need for iptables rules to mangle the DSCP on every
>> egress packet.
>> 
>> I can see a conflict between people who want the DSCP copied into
>> fwmark no matter what and people such as myself who wish it to only
>> happen if fwmark doesn’t have it set already.
>> 
>> Is the solution to have a ‘get_check_state’ option that enables the
>> ‘get_dscp action and with fwmark ’state mask’ abort if true check?
>> Maybe even simpler to have a ‘get_state mask’ and a ’set_state mask’ -
>> AND the fwmark with the get_state mask, if false copy the dscp, else
>> abort.
> 
> Right, if you want to be able to do all those combinations of things we
> quickly run into combinatorial explosion, or we risk hard-coding a
> policy that some users won't like. However, if we do want to express
> these kinds of things, maybe this is better implemented as additional
> options to the existing iptables matches?
> 
> I.e., we could add
> 
> --set-mark-from-dscp MASK

Yes, that (I think) could work.  I could wrap my own iptables based ‘have I copied the required DSCP to the fwmark’ rules around that by setting my own flag bit in the fwmark.

but..
> 
> to the MARK target, and
> 
> --set-dscp-from-fwmark MASK
> 
> to the DSCP target. Then you can implement whatever policy you want
> using iptables rules... Any reason that wouldn't work?

On its own I don’t think that would work for ingress traffic - iptables happens too late.  So on planet Kevin I still need some sort of flag held in the fwmark that says ‘I hold a DSCP value’ so cake can use it and act_connmarkdscp can (optionally) restore it to the diffserv field.


I suspect we’re going around in circles around what I would like which is “a bit DSCP fuzzy but lighter on CPU ‘cos I don’t have to hit iptables mangle rules as much” v what I think you would like is ’update the fwmark DSCP every time but that also requires iptables to mangle the DSCP for every packet’

I think the options of get_mask & set_mask can accommodate both behaviour choices.


Phew this is hard - come on all you out there… i know you exist!  What am I missing/misunderstanding?  Ryan on this list previously said "Perfect! And to me, this functionality truly is the icing on (the) cake that makes it the complete bufferbloat/QoS system I've been
dreaming of for ingress.”  Thoughts, comments or do Toke & myself make an amusing enough side show? :-)

Kevin


More information about the Cake mailing list