[Cake] Using firewall connmarks as tin selectors

Kevin Darbyshire-Bryant kevin at darbyshire-bryant.me.uk
Mon Mar 4 10:50:42 EST 2019



> On 4 Mar 2019, at 12:44, Toke Høiland-Jørgensen <toke at redhat.com> wrote:
> 
> Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> writes:
> 
>>> On 4 Mar 2019, at 11:17, Toke Høiland-Jørgensen <toke at redhat.com> wrote:
>>> 
>>> Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> writes:
>>> 
>>>>> On 4 Mar 2019, at 08:39, Pete Heist <pete at heistp.net> wrote:
>>>>> 
>>>>> 
>>>>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> wrote:
>>>>>> 
>>>>>> The very bad idea:
>>>>>> 
>>>>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>>>>> implementation as described above. So an awful lot of our
>>>>>> shenanigans above is due to DSCP not traversing the internet
>>>>>> particularly well. The solution above abstracts DSCP into ’tins’
>>>>>> which we put into fwmarks. Another approach would be to put the DSCP
>>>>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>>>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>>>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>>>>> local domain preserved.
>>>>> 
>>>>> If I understand it right, another use case for this “very bad idea”
>>>>> is preserving DSCP locally while traversing upstream WiFi links as
>>>>> besteffort, which avoids airtime efficiency problems that can occur
>>>>> with 802.11e (WMM). In cases where the router config can’t be changed
>>>>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>>>>> it hides DSCP from the WiFi stack while preserving the values through
>>>>> the tunnel, but this would be easier. Neat… :)
>>>> 
>>>> Everyone has understood the intent & maybe the implementation
>>>> correctly. 2 patches attached, one for cake, one for tc.
>>>> 
>>>> They are naively coded and some of it undoes Toke’s recent tidying up
>>>> (sorry!)
>>> 
>>> Heh. First comment: Don't do that ;)
>> 
>> I did say naively coded.
>> 
>>> 
>>> A few more below.
>>> 
>>>> 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
>>>> diff --git a/pkt_sched.h b/pkt_sched.h
>>>> index a2f570c..d1f288d 100644
>>>> --- a/pkt_sched.h
>>>> +++ b/pkt_sched.h
>>>> @@ -879,6 +879,7 @@ enum {
>>>> 	TCA_CAKE_ACK_FILTER,
>>>> 	TCA_CAKE_SPLIT_GSO,
>>>> 	TCA_CAKE_FWMARK,
>>>> +	TCA_CAKE_ICING,
>>>> 	__TCA_CAKE_MAX
>>>> };
>>>> #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
>>>> diff --git a/sch_cake.c b/sch_cake.c
>>>> index 733b897..5aca0f3 100644
>>>> --- a/sch_cake.c
>>>> +++ b/sch_cake.c
>>>> @@ -270,7 +270,8 @@ enum {
>>>> 	CAKE_FLAG_INGRESS	   = BIT(2),
>>>> 	CAKE_FLAG_WASH		   = BIT(3),
>>>> 	CAKE_FLAG_SPLIT_GSO	   = BIT(4),
>>>> -	CAKE_FLAG_FWMARK	   = BIT(5)
>>>> +	CAKE_FLAG_FWMARK	   = BIT(5),
>>>> +	CAKE_FLAG_ICING		   = BIT(6)
>>> 
>>> This implies that icing and fwmark can be enabled completely
>>> independently of each other. Are you sure about the semantics for that?
>> 
>> No, I’m not.  I sent the patches so others could see my lunacy in action and hopefully improve it.
>> 
>> The actual operation links FWMARK, INGRESS & ICING in a variety of
>> combinations.
> 
> Right, so obviously this needs to be thought through…

Yes.  Volunteers with thoughts & ideally code sought.

> 
>>>> };
>>>> 
>>>> /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
>>>> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
>>>> };
>>>> 
>>>> static const u8 diffserv4[] = {
>>>> -	0, 2, 0, 0, 2, 0, 0, 0,
>>>> +	0, 1, 0, 0, 2, 0, 0, 0,
>>>> 	1, 0, 0, 0, 0, 0, 0, 0,
>>>> 	2, 0, 2, 0, 2, 0, 2, 0,
>>>> 	2, 0, 2, 0, 2, 0, 2, 0,
>>>> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
>>>> };
>>>> 
>>>> static const u8 diffserv3[] = {
>>>> -	0, 0, 0, 0, 2, 0, 0, 0,
>>>> +	0, 1, 0, 0, 2, 0, 0, 0,
>>> 
>>> Why are you messing with the diffserv mappings in this patch?
>> 
>> This is a combination patch of Dave’s new LE coding and the
>> fwmark/dscp mangling.
> 
> Ah. Well let's keep that separate from this patch/discussion…

Done
> 
>>> 
>>>> 	1, 0, 0, 0, 0, 0, 0, 0,
>>>> 	0, 0, 0, 0, 0, 0, 0, 0,
>>>> 	0, 0, 0, 0, 0, 0, 0, 0,
>>>> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
>>>> 	return idx + (tin << 16);
>>>> }
>>>> 
>>>> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>>>> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
>>>> +{
>>>> +	switch (skb->protocol) {
>>>> +	case htons(ETH_P_IP):
>>>> +		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>>>> +			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
>>>> +		break;
>>>> +	case htons(ETH_P_IPV6):
>>>> +		if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>>>> +			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
>>>> +		break;
>>>> +	default:
>>>> +		break;
>>>> +	}
>>>> +
>>>> +}
>>> 
>>> So washing is just a special case of this (wash is
>>> cake_update_diffserv(skb,0)). So you shouldn't need to add another
>>> function, just augment the existing handling code.
>> 
>> Erm, that’s exactly what I’ve done.
> 
> Ah, right; I guess it's just the reverting of the cleanup patch that is
> the issue, then :)

With the reverting of the cleaning reverted - if that makes any sense.
And I have major twitchiness at the result.

> 
>>>> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
>>>> {
>>>> 	u8 dscp;
>>>> 
>>>> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>>>> 	}
>>>> }
>>>> 
>>>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
>>> 
>>> Save an ifdef below by moving the ifdef inside the function definition.
>>> 
>>>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
>>>> +{
>>>> +	enum ip_conntrack_info ctinfo;
>>>> +	struct nf_conn *ct;
>>>> +
>>>> +	ct = nf_ct_get(skb, &ctinfo);
>>>> +	if (!ct)
>>>> +		return;
>>>> +
>>>> +	ct->mark &= 0x80ffffff;
>>>> +	ct->mark |= (0x40 | dscp) << 24;
>>> 
>>> Right, so we *might* have an argument that putting the *tin* into the
>>> fwmark is CAKE's business, but copying over the dscp mark is not
>>> something a qdisc should be doing…
>> 
>> Why ever not?  It’s not the DSCP, it’s a lookup value into the cake
>> priority table, it just happens to look like the DSCP ;-)
> 
> If it quacks like a duck…

I honestly don’t know where to go from here.  I’m clearly trying to do something that the kernel doesn’t want to do.

> 
>> 
>>>> +	nf_conntrack_event_cache(IPCT_MARK, ct);
>>>> +}
>>>> +#endif
>>> 
>>> Also, are you sure this will work in all permutations of conntrack being
>>> a module vs not etc? (we had to jump through quite some hoops to get the
>>> conntrack hooks to work last time; this is probably my biggest worry here).
>> 
>> No, I have absolutely no clue here at all.
> 
> Well, another issue that needs fixing, then…

Again bright-eyed & bushy tailed volunteers sought!

> 
> -Toke

v2 addressing some of the comments attached.  Is it best to keep the in progress patches here or should they be github PRs ?


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v2-0001-Copy-dscp-to-fwmark.patch
Type: application/octet-stream
Size: 4877 bytes
Desc: v2-0001-Copy-dscp-to-fwmark.patch
URL: <https://lists.bufferbloat.net/pipermail/cake/attachments/20190304/40e6b8dc/attachment-0001.obj>


More information about the Cake mailing list