[Cake] act_connmark + dscp

Kevin Darbyshire-Bryant kevin at darbyshire-bryant.me.uk
Fri Mar 8 09:03:52 EST 2019



> On 8 Mar 2019, at 11:28, Toke Høiland-Jørgensen <toke at redhat.com> wrote:
> 
> Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> writes:
> 
>> On its own I don’t think that would work for ingress traffic -
>> iptables happens too late. So on planet Kevin I still need some sort
>> of flag held in the fwmark that says ‘I hold a DSCP value’ so cake can
>> use it and act_connmarkdscp can (optionally) restore it to the
>> diffserv field.
>> 
>> I suspect we’re going around in circles around what I would like which
>> is “a bit DSCP fuzzy but lighter on CPU ‘cos I don’t have to hit
>> iptables mangle rules as much” v what I think you would like is
>> ’update the fwmark DSCP every time but that also requires iptables to
>> mangle the DSCP for every packet’
> 
> Well I think my problem is that I don't really have a use case for this
> myself. So I need to understand your use case better in order to have an
> opinion on how best to implement it so that:
> 
> 1. We can accommodate what you are trying to do
> 
> and
> 
> 2. We can also accommodate other related use cases, and we don't set
>   policy in the kernel.


OK, what I am trying to do is classify incoming connections into relevant cake tins to impose some bandwidth fairness, e.g. classify bittorrent & things that are downloads into the Bulk tin, and prioritise stuff that is streaming video into the Video tin. Incoming DSCP a) has been washed and b) is unreliable anyway, so it is unhelpful in this case.  On ingress, iptables runs too late (cake has already seen the packet), so having iptables rules to classify incoming stuff is pointless.

tc filters run early enough to use the tc skbedit major/minor number to influence cake’s tin decisions.  But tc filters a) don’t get to see de-NATted IPv4 addresses and b) daisy chain, so all filters must be traversed.  I can’t find my original tc filter ‘de-prio bittorrent’, but it was a very simple ‘does this destination port match? If yes, skbedit to select the Bulk tin’. I wanted to do more, but the daisy chaining & lack of de-NATting made this technique useless.

Then I recently discovered act_connmark (http://linux-ip.net/gl/tc-filters/tc-filters-node2.html) - the thinking being I could use iptables on egress to set fwmarks to classify a connection and have the ingress packets magically follow.  This worked but still required 3 tc filter actions to cope with 4 tins:

$TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x01 fw action skbedit priority ${MAJOR}1
$TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x03 fw action skbedit priority ${MAJOR}3
$TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x04 fw action skbedit priority ${MAJOR}4

It also requires similar tc filters on the egress path in addition to the iptables rules.
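
The three filters above differ only in the fwmark handle and the tin's minor number, so they could be generated from a list. A minimal dry-run sketch, assuming MAJOR holds the cake handle with its trailing colon (e.g. 1:) and that the fwmark value equals the tin's minor number as in the rules above. The function name gen_tin_filters and the example values for TC, IFACE and MAJOR are illustrative only, and the commands are printed rather than executed:

```shell
#!/bin/sh
# Dry run: print one 'fw' filter per tin instead of running tc.
# TC, IFACE and MAJOR are placeholder values (assumptions), not a real setup.
TC="tc"
IFACE="eth0"
MAJOR="1:"

gen_tin_filters() {
    # $@ = fwmark values that double as cake tin minor numbers
    for tin in "$@"; do
        printf '%s filter add dev %s parent %s protocol ip handle 0x%02x fw action skbedit priority %s%s\n' \
            "$TC" "$IFACE" "$MAJOR" "$tin" "$MAJOR" "$tin"
    done
}

gen_tin_filters 1 3 4
```

Pointing IFACE at the other interface and re-running the function would produce the matching set of filters for the opposite direction.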

Could that be improved?  Yes, sort of. eBPF to the rescue-ish.  I could write an eBPF classifier action program to directly copy the fwmark to the priority field, which cake would pick up.  I would have stopped there but, as I’ve said in a previous email, the eBPF program needed to know (hard code) the cake instance major numbers, and there was the whole mystery tour of writing/building it.

The other problem with the above ‘encode the tin into the fwmark’ routine is that it ignored any good citizens that were already using the correct DSCP (e.g. dropbear). I would need to write iptables rules to classify existing DSCP codepoints into the matching tin for fwmark.  So ideally I needed the DSCP to drive things and still act as a key into the fwmark mechanism.

The overriding (if required) of DSCP could be done in iptables, and to avoid going through the iptables DSCP decision/mangling for every packet I could use a flag within the fwmark to indicate that the decision had previously been made and stored for this connection.


The current rules are:

    # Configure iptables chain to mark packets
    ipt -t mangle -N QOS_MARK_${IFACE}

    # Change DSCP of initial relevant hosts/packets - this will be picked up by cake+ and placed in the firewall connmark
    # also the DSCP is used as the tin selector.

    iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.5 -m comment --comment "Skybox DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
    iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.5 -m comment --comment "Skybox DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
    iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.10 -m comment --comment "Bluray DSCP CS2 Video" -j DSCP --set-dscp-class CS2
    iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.10 -m comment --comment "Bluray DSCP CS2 Video" -j DSCP --set-dscp-class CS2
    iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12 -m tcp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
    iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.12 -m udp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
    iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12 -m tcp --dport 4443 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
    iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12 -m tcp --dport 443 -m comment --comment "HTTPS uploads DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1

    iptables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Bulk4  dst -j DSCP --set-dscp-class CS1 -m comment --comment "Bulk CS1 ipset"
    iptables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Vid4   dst -j DSCP --set-dscp-class CS2 -m comment --comment "Vid CS2 ipset"
    iptables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Voice4 dst -j DSCP --set-dscp-class CS6 -m comment --comment "Voice CS6 ipset"

    ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
    ip6tables -t mangle -A QOS_MARK_${IFACE} -p udp -s ::c/::ffff:ffff:ffff:ffff -m udp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
    ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --dport 4443 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
    ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --dport 443 -m comment --comment "HTTPS uploads DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1

    ip6tables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Bulk6  dst -j DSCP --set-dscp-class CS1 -m comment --comment "Bulk CS1 ipset"
    ip6tables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Vid6   dst -j DSCP --set-dscp-class CS2 -m comment --comment "Vid CS2 ipset"
    ip6tables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Voice6 dst -j DSCP --set-dscp-class CS6 -m comment --comment "Voice CS6 ipset"

    # Send cake+ unmarked connections to the marking chain. Cake+ uses the top byte
    # of the fwmark as the "I've been marked & here's the DSCP" placeholder:
    # the top 6 bits are the DSCP, the LSB of that byte is the "DSCP is valid" flag.
    ipt -t mangle -A PREROUTING  -i $IFACE -m mark --mark 0x00/0x01000000 -g QOS_MARK_${IFACE}
    ipt -t mangle -A POSTROUTING -o $IFACE -m mark --mark 0x00/0x01000000 -g QOS_MARK_${IFACE}


The initial egress packet for a connection will go through the above chain (--mark 0x00/0x01000000 -g QOS_MARK_${IFACE}), where the DSCP value is changed if required.
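
The `--mark 0x00/0x01000000` test used above can be read as: AND the packet's fwmark with the mask, then compare the result to the value. A small shell sketch of that comparison, with mark_matches as a hypothetical helper name (it is not part of iptables):

```shell
#!/bin/sh
# Mimic the iptables 'mark' match: (fwmark & mask) == value.
# mark_matches is a hypothetical helper for illustration only.
mark_matches() {
    # $1 = packet fwmark, $2 = value, $3 = mask
    [ $(( $1 & $3 )) -eq $(( $2 )) ]
}

# Unmarked connection: flag bit clear, so the packet is sent to the marking chain.
mark_matches 0x00000000 0x00 0x01000000 && echo "needs marking"

# Flag bit already set: the rule does not match and the chain is skipped.
mark_matches 0x21000000 0x00 0x01000000 || echo "already marked"
```

Because only the flag bit is masked, other bits of the fwmark are free for unrelated uses without affecting this test.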

Cake will see this initial packet, inspect the fwmark and, because it hasn’t been set, will both copy the DSCP into the mark and set the ‘fwdscp marked’ bit.

Subsequent egress packets will go through neither the iptables DSCP mangle nor the cake ‘update the fwmark’ routine.  Instead, cake will use the fwmark as the tin selector.


The ingress path is different.  First off, act_connmark restores any connection mark to the packet.  Cake will inspect the fwmark for the ‘fwdscp marked’ bit.  If it is set, then the DSCP encoded in the firewall mark is used for tin selection.  Optionally the encoded DSCP is restored to the packet’s diffserv field, but I personally don’t use that functionality as I’m only interested in ‘tin fair’ use of the link.  And that’s it.
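
To make the bit layout concrete, here is a minimal sketch of the store and fetch steps described above, assuming the layout from the rule comments (DSCP in the top 6 bits of the fwmark, 0x01000000 as the ‘DSCP is valid’ flag). The function names are hypothetical, and this is shell arithmetic for illustration rather than the cake patch itself:

```shell
#!/bin/sh
# Assumed layout: bits 31..26 = DSCP, bit 24 = 'DSCP is valid' flag.
DSCP_SHIFT=26
DSCP_VALID=0x01000000

# Store a DSCP (0..63) in an existing mark and set the valid flag.
fwmark_set_dscp() {
    # $1 = current fwmark, $2 = DSCP value
    printf '0x%08x\n' $(( ($1 & 0x03ffffff) | (($2 & 0x3f) << $DSCP_SHIFT) | $DSCP_VALID ))
}

# Print the stored DSCP, or fail if the valid flag is not set.
fwmark_get_dscp() {
    # $1 = fwmark
    [ $(( $1 & $DSCP_VALID )) -ne 0 ] || return 1
    echo $(( ($1 >> $DSCP_SHIFT) & 0x3f ))
}

mark=$(fwmark_set_dscp 0x0 8)   # CS1 is DSCP 8
echo "$mark"                    # prints 0x21000000
fwmark_get_dscp "$mark"         # prints 8
```

The lower bits of the mark are passed through untouched, so this encoding can coexist with other users of the fwmark.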

I’m doing 2 things.

1) Classifying traffic into tins on ingress based on the egress DSCP values contained in fwmarks.
2) Basing the fwmark contained DSCP on the initial packet of the connection, possibly after being modified once by iptables rules.


> 
> In particular, requirement 2 is why I'm pushing back against hard-coding
> a mask anywhere…

I think with ‘fwmark mask’, ‘get_dscp’, ‘set_dscp’, ‘get_state mask’, ‘set_state mask’ nothing *is* hard coded.

> 
> So could you maybe post your current ruleset and explain what it is you
> are trying to achieve at a high level, and why? :)

I hope I’ve done that.

> 
> Also, you keep mentioning "must be lighter on CPU". Do you have any
> performance numbers to show the impact of your current ruleset? Would be
> easier to assess any performance impact if we have some baseline numbers
> to compare against…

Let me see if I can quantify that in some way.

> 
> -Toke


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


