[Cake] act_connmark + dscp

Toke Høiland-Jørgensen toke at redhat.com
Sun Mar 10 19:56:51 EDT 2019


Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> writes:

>> On 9 Mar 2019, at 14:08, Toke Høiland-Jørgensen <toke at redhat.com> wrote:
>> 
>> Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> writes:
>> 
>>> OK, what I am trying to do is classify incoming connections into
>>> relevant cake tins to impose some bandwidth fairness.  e.g. classify
>>> bittorrent & things that are downloads into the Bulk tin, and
>>> prioritise stuff that is streaming video into the Video tin. Incoming
>>> DSCP has a) been washed and b) is unreliable anyway so is unhelpful in
>>> this case.  iptables runs too late, so having rules to classify
>>> incoming stuff is pointless.
>> 
>> Right, I see.
>> 
>> [... snip .. ]
>> 
>>> Then I recently discovered act_connmark
>>> (http://linux-ip.net/gl/tc-filters/tc-filters-node2.html) - the
>>> thinking being I could use iptables on egress to set fwmarks to
>>> classify a connection and have the ingress packets magically follow.
>>> This worked but still required 3 tc filter actions to cope with 4
>>> tins:
>>> 
>>> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x01 fw action skbedit priority ${MAJOR}1
>>> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x03 fw action skbedit priority ${MAJOR}3
>>> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x04 fw action skbedit priority ${MAJOR}4
>> 
>> Right, so this can be replaced with the fwmark action we already added
>> (which I just pushed an update to, so that it supports masking the value
>> before selecting a tin).
>
> Yes.  I’d point out the (hopefully obvious) fact that the flag mask needs
> to be one bit bigger than you might immediately think, e.g. diffserv4
> needs to store 5 values (0-4), i.e. 3 bits, because 0 is being used as an
> implied ‘tin is not set, fall back to DSCP’.  One could store DSCP+1 of
> course and use the same logic.

Yeah, or one can just squat on a whole byte like I do in the example ;)

>> 
>>> The overriding (if required) of DSCP could be done in iptables and to
>>> avoid going through the iptables DSCP decision/mangling for every
>>> packet I could use a flag within the fwmark to indicate the decision
>>> had previously been made and stored for this connection.
>> 
>> [ ... ]
>> 
>>> I’m doing 2 things.
>>> 
>>> 1) Classifying traffic into tins on ingress based on the egress DSCP
>>> values contained in fwmarks.
>>> 
>>> 2) Basing the fwmark-contained DSCP on the initial packet of the
>>> connection, possibly after being modified once by iptables rules.
>> 
>> So I tried prototyping what it would actually look like to do all this
>> in iptables. The result is below (in iptables-restore format). I haven't
>> tested it, but I believe something along the lines of this will work,
>> when used along with the CAKE fwmark support (setting a mask of 0xFF
>> when configuring CAKE).
>> 
>> Now, the obvious eyesore on this is the need to replicate CAKE's diffserv
>> mappings in iptables rules (21 rules in this case, for the diffserv4
>> mapping). As long as this only runs once per connection I don't actually
>> think it's much of a performance issue for normal use, but obviously
>> there could be pathological cases, and it's also annoying to have to do
>> that.
>> 
>> So, first question becomes: Do you agree that the firewall rules below
>> would solve your use case (ignoring the ugliness of having to replicate
>> the diffserv parsing in iptables)? Or am I missing something?
>
> I’ve had a quick look over it and think it would work.
>
> The ugliness of doing the diffserv parsing is what prompted the idea
> of storing the DSCP directly, and I felt the stored tin selection was
> effectively abstracting the diffserv field anyway.

Right, but that means that the CAKE interpretation of the fwmark would
have to change from something that selects the tin, to something that is
treated as a DSCP mark. I think this was the part that I was missing
before. I don't think this is a good idea, as that means we tie the
marks to one particular use case.

> Storing the DSCP is more compatible with differing egress vs. ingress
> mappings (e.g. egress diffserv4, ingress diffserv3, though I can’t
> really think of a use case for that)

I think that if someone wants to do something like that, we are way out
of "simple use case that we want to actively support" territory, and can
legitimately ask people to go write a BPF filter or something :)

> Of course using fwmark as tin number selector in cake doesn’t preclude
> some other mechanism of storing & recovering DSCP to/from firewall
> mark e.g. the previously discussed act-connmark+dscp which would help
> anyone who wanted to do such ‘link traversing’ DSCP shenanigans.  That
> of course makes you happier since cake doesn’t embed itself further
> into conntrack.

Yeah, I definitely don't think CAKE has any business writing DSCP values
into the mark. However, as I said before, there may be a case for adding
an option to write the tin selection back to conntrack. Something like
the patch below would do it (with an option to control it, of course),
but it does incur a dependency on another conntrack header, so I'm not
sure if it will be acceptable to upstream. Also, we would need to figure
out how the option should work.

The alternative would be to use another mechanism; the iptables rules
replication is one example. Another could be adding a conntrack helper
to eBPF...

-Toke


diff --git a/sch_cake.c b/sch_cake.c
index a8fa224..c6b7dd9 100644
--- a/sch_cake.c
+++ b/sch_cake.c
@@ -78,6 +78,7 @@
 
 #if IS_REACHABLE(CONFIG_NF_CONNTRACK)
 #include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_ecache.h>
 #include <net/netfilter/nf_conntrack_zones.h>
 #include <net/netfilter/nf_conntrack.h>
 #endif
@@ -1646,6 +1647,27 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
 	}
 }
 
+static void cake_set_tin_connmark(struct cake_sched_data *q,
+				  struct sk_buff *skb, u32 tin)
+{
+#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+	u32 newmark;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct) {
+		newmark = (ct->mark & ~q->fwmark_mask);
+		newmark ^= (tin << q->fwmark_shft) & q->fwmark_mask;
+
+		if (ct->mark != newmark) {
+			ct->mark = newmark;
+			nf_conntrack_event_cache(IPCT_MARK, ct);
+		}
+	}
+#endif
+}
+
 static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
 					     struct sk_buff *skb)
 {
@@ -1678,6 +1700,8 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
 			tin = 0;
 	}
 
+	cake_set_tin_connmark(q, skb, tin);
+
 	return &q->tins[tin];
 }
 

