[Cake] act_connmark + dscp
John Sager
john at sager.me.uk
Sat Mar 9 15:21:04 EST 2019
I wonder if you've dismissed eBPF too quickly. Reading around the subject,
that seems to be the way the kernel is going for both network actions and
various other purposes. I wonder if passing the info about cake could be
done via eBPF maps. I can't see your original eBPF example as it's
disappeared off GitHub.
John
On 08/03/2019 14:03, Kevin Darbyshire-Bryant wrote:
>
>
> OK, what I am trying to do is classify incoming connections into relevant cake tins to impose some bandwidth fairness, e.g. classify bittorrent & things that are downloads into the Bulk tin, and prioritise stuff that is streaming video into the Video tin. Incoming DSCP a) has been washed and b) is unreliable anyway, so it is unhelpful in this case. iptables runs too late, so having rules to classify incoming stuff is pointless.
>
> tc filters run early enough to use the tc skbedit major/minor number to influence cake’s tin decisions. But tc filters a) don’t get to see de-NATted IPv4 addresses and b) daisy chain, so all filters must be traversed. I can’t find my original tc filter ‘de-prio bittorrent’ but it was a very simple ‘does this destination port match? If yes, skbedit to select the bulk tin’ - I wanted to do more but the daisy chaining & lack of de-NATting made this technique useless.
>
> Then I recently discovered act_connmark (http://linux-ip.net/gl/tc-filters/tc-filters-node2.html) - the thinking being I could use iptables on egress to set fwmarks to classify a connection and have the ingress packets magically follow. This worked but still required 3 tc filter actions to cope with 4 tins:
>
> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x01 fw action skbedit priority ${MAJOR}1
> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x03 fw action skbedit priority ${MAJOR}3
> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x04 fw action skbedit priority ${MAJOR}4
>
> It also requires similar tc filters on the egress path in addition to the iptables rules.
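
The three near-identical filters above lend themselves to a loop. What follows is only a sketch: the helper name emit_tin_filters is hypothetical, the default values for TC, IFACE and MAJOR are placeholders, and MAJOR is assumed to carry its trailing colon (e.g. "1:") so that ${MAJOR}1 forms a valid major:minor pair. It prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) one fw-match filter per fwmark value.
# TC, IFACE and MAJOR are the variables used in the quoted commands; the
# defaults below are placeholders for illustration only.
TC=${TC:-tc}
IFACE=${IFACE:-eth0}
MAJOR=${MAJOR:-1:}

emit_tin_filters() {
    # fwmark values 1, 3 and 4 map to the cake tin of the same number
    for tin in 1 3 4; do
        printf '%s filter add dev %s parent %s protocol ip handle 0x%02x fw action skbedit priority %s%s\n' \
            "$TC" "$IFACE" "$MAJOR" "$tin" "$MAJOR" "$tin"
    done
}

emit_tin_filters
```

Printing first makes it easy to inspect the result before applying anything; the output would still need to be run as root against a real cake instance.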
>
> Could that be improved? Yes, sort of. eBPF to the rescue-ish. I could write an eBPF classifier action program to directly copy the fwmark to the priority field which cake would pick up. I would have stopped there but as I’ve said in a previous email, the eBPF needed to know (hard code) the cake instance major numbers and there was the whole mystery tour of writing/building it.
>
> The other problem with the above magic tin encode into the fwmark routine is that it ignored any good citizens that were using the correct DSCP (e.g. dropbear). I would need to write iptables rules to classify existing DSCP codepoints into the matching tin for fwmark. So ideally I needed the DSCP to drive things and still act as a key into the fwmark mechanism.
>
> The overriding (if required) of DSCP could be done in iptables, and to avoid going through the iptables DSCP decision/mangling for every packet, I could use a flag within the fwmark to indicate that the decision had previously been made and stored for this connection.
>
>
> The current rules are:
>
> # Configure iptables chain to mark packets
> ipt -t mangle -N QOS_MARK_${IFACE}
>
> # Change DSCP of initial relevant hosts/packets - this will be picked up by cake+ and placed in the firewall connmark
> # also the DSCP is used as the tin selector.
>
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.5 -m comment --comment "Skybox DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.5 -m comment --comment "Skybox DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.10 -m comment --comment "Bluray DSCP CS2 Video" -j DSCP --set-dscp-class CS2
> iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.10 -m comment --comment "Bluray DSCP CS2 Video" -j DSCP --set-dscp-class CS2
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12 -m tcp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.12 -m udp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12 -m tcp --dport 4443 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12 -m tcp --dport 443 -m comment --comment "HTTPS uploads DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
>
> iptables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Bulk4 dst -j DSCP --set-dscp-class CS1 -m comment --comment "Bulk CS1 ipset"
> iptables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Vid4 dst -j DSCP --set-dscp-class CS2 -m comment --comment "Vid CS2 ipset"
> iptables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Voice4 dst -j DSCP --set-dscp-class CS6 -m comment --comment "Voice CS6 ipset"
>
> ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> ip6tables -t mangle -A QOS_MARK_${IFACE} -p udp -s ::c/::ffff:ffff:ffff:ffff -m udp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --dport 4443 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --dport 443 -m comment --comment "HTTPS uploads DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
>
> ip6tables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Bulk6 dst -j DSCP --set-dscp-class CS1 -m comment --comment "Bulk CS1 ipset"
> ip6tables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Vid6 dst -j DSCP --set-dscp-class CS2 -m comment --comment "Vid CS2 ipset"
> ip6tables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Voice6 dst -j DSCP --set-dscp-class CS6 -m comment --comment "Voice CS6 ipset"
>
> # Send cake+ unmarked connections to the marking chain. Cake+ uses the top byte of the
> # fwmark as the 'I've been marked' flag plus DSCP placeholder:
> # the top 6 bits are the DSCP, the LSB is the 'DSCP is valid' flag.
> ipt -t mangle -A PREROUTING -i $IFACE -m mark --mark 0x00/0x01000000 -g QOS_MARK_${IFACE}
> ipt -t mangle -A POSTROUTING -o $IFACE -m mark --mark 0x00/0x01000000 -g QOS_MARK_${IFACE}
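
The fwmark layout described in the rule comments can be checked with a little shell arithmetic. This is a sketch of the layout as read from those comments only (DSCP in bits 31..26, 'valid' flag at 0x01000000, matching the --mark mask used above); it is not taken from the cake+ patch itself, and the function names are hypothetical.

```shell
#!/bin/sh
# Layout assumed from the quoted rule comments:
#   bits 31..26 = DSCP, bit 24 (0x01000000) = "DSCP is valid" flag.
FLAG=$((0x01000000))

# $1 = DSCP (0..63); prints the resulting fwmark in hex
mark_encode() {
    printf '0x%08x\n' $(( (($1 & 0x3f) << 26) | $FLAG ))
}

# $1 = fwmark; succeeds (exit status 0) if the valid flag is set
mark_is_set() {
    [ $(( $1 & $FLAG )) -ne 0 ]
}

# $1 = fwmark; prints the stored DSCP in decimal
mark_dscp() {
    echo $(( ($1 >> 26) & 0x3f ))
}

mark_encode 8    # CS1 is DSCP 8
```

With this layout a connection classified as CS1 would carry the mark 0x21000000, and an unmarked connection (flag clear) is exactly what the two -m mark --mark 0x00/0x01000000 rules select.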
>
>
> The initial egress packet for a connection will go through the above chain (--mark 0x00/0x01000000 -g QOS_MARK_${IFACE}) where the DSCP value is changed if required.
>
> Cake will see this initial packet, inspect the fwmark, and because it hasn’t been set will both copy the dscp into the mark and set the ‘fwdscp marked’ bit.
>
> Subsequent egress packets will go through neither the iptables DSCP mangle nor the cake ‘update the fwmark’ routine. Instead, cake will use the fwmark as the tin selector.
>
>
> The ingress path is different. First off, act_connmark restores any connection mark to the packet. Cake will inspect the fwmark for the ‘fwdscp marked’ bit. If it is set, then the DSCP coded in the firewall mark is used for tin selection. Optionally the encoded DSCP is restored to the packet’s diffserv, but I personally don’t use that functionality as I’m only interested in ‘tin fair’ use of the link. And that’s it.
>
> I’m doing 2 things.
>
> 1) Classifying traffic into tins on ingress based on the egress DSCP values contained in fwmarks.
> 2) Basing the fwmark contained DSCP on the initial packet of the connection, possibly after being modified once by iptables rules.
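
The per-packet decision described above can be modelled in a few lines. This is a hypothetical model, not cake's code: the function name classify is made up, the bit layout is the one read from the rule comments, and the thread does not say what happens to an ingress packet whose mark is not yet set, so treating it like a first egress packet is an assumption.

```shell
#!/bin/sh
# Hypothetical model of the decision flow, not cake's implementation.
# $1 = fwmark on the packet (on ingress, after act_connmark has restored it)
# $2 = DSCP currently in the packet header
# Prints "<DSCP used for tin selection> <fwmark to store>".
classify() {
    mark=$(( $1 ))
    if [ $(( mark & 0x01000000 )) -ne 0 ]; then
        # Decision already stored for this connection: the mark's DSCP wins.
        echo "$(( (mark >> 26) & 0x3f )) $(printf '0x%08x' "$mark")"
    else
        # First packet seen: use the header DSCP, copy it into the mark
        # and set the 'fwdscp marked' flag.
        new=$(( mark | (($2 & 0x3f) << 26) | 0x01000000 ))
        echo "$2 $(printf '0x%08x' "$new")"
    fi
}

classify 0 8            # first egress packet, DSCP rewritten to CS1 by iptables
classify 0x21000000 0   # later ingress packet arriving with a washed DSCP
```

The second call shows the point of the exercise: the washed ingress packet is still classified with the DSCP chosen on egress.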
>
>
>>
>> In particular, requirement 2 is why I'm pushing back against hard-coding
>> a mask anywhere…
>
> I think with ‘fwmark mask’, ‘get_dscp’, ‘set_dscp’, ‘get_state mask’, ‘set_state mask’ nothing *is* hard coded.
>
>>
>> So could you maybe post your current ruleset and explain what it is you
>> are trying to achieve at a high level, and why? :)
>
> I hope I’ve done that.
>
>>
>> Also, you keep mentioning "must be lighter on CPU". Do you have any
>> performance numbers to show the impact of your current ruleset? Would be
>> easier to assess any performance impact if we have some baseline numbers
>> to compare against…
>
> Let me see if I can quantify that in some way.
>
>>
>> -Toke
>
>
> Cheers,
>
> Kevin D-B
>
> 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A