From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-f66.google.com (mail-ed1-f66.google.com [209.85.208.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.bufferbloat.net (Postfix) with ESMTPS id E30123B29E for ; Sun, 10 Mar 2019 19:56:55 -0400 (EDT) Received: by mail-ed1-f66.google.com with SMTP id a16so2549198edn.1 for ; Sun, 10 Mar 2019 16:56:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references:date :message-id:mime-version:content-transfer-encoding; bh=UlQGu4wONbd2NbY1pPE2xFNguuJG+mYKMtiP3Epu0dI=; b=emAFG4ac+fN1q2OtOqirgftcxD1ozkSiOFLuygeuYNB/IsEi3ONikInBogFDwIXMKa qI/J1foOpLSfJv8nlwcmdqNM7/ZhXyL5VH2z56gi/M4pbzuWP1PQpsMoQD0Wpgn1/4J6 GIE25x1iUJkwtljhw25qYaCN0j7YSdednYpj/qHQau41jS5IiMwWZO999H4EDMRLZZMK JfkuYLFftwHI1udp9Hcs7qV/nAWRdmTWRCl3Nn43S6/OlhyEDpDqa6aAgWjy2ZXjZ6b1 /yYj79eMUYqFqh6CFxuNHclRbCV/BugOklhfPlCPEi1AswGm6Jf8QPp4GxbHQhnYkNck TRrg== X-Gm-Message-State: APjAAAUDfUT0AgZMvhAdKxgBy+XyfxcwHcGLhnNC/aE654gYNeLza9Bc KaPf+3//iDX4f7VFTAkJTk12hnlrkE4= X-Google-Smtp-Source: APXvYqzdB/XMGf1lWPceG6cZD0MulB0w15MOVIje+UaoUZe25osOHLnYpVsrSJLEINNXKs5U9Isw9A== X-Received: by 2002:a17:906:bce4:: with SMTP id op4mr2843380ejb.173.1552262214945; Sun, 10 Mar 2019 16:56:54 -0700 (PDT) Received: from alrua-x1.borgediget.toke.dk (borgediget.toke.dk. [85.204.121.218]) by smtp.gmail.com with ESMTPSA id c42sm4266331eda.63.2019.03.10.16.56.53 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 10 Mar 2019 16:56:54 -0700 (PDT) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 4B029180454; Mon, 11 Mar 2019 00:56:51 +0100 (CET) From: Toke =?utf-8?Q?H=C3=B8iland-J=C3=B8rgensen?= To: Kevin Darbyshire-Bryant Cc: "cake\@lists.bufferbloat.net" In-Reply-To: References: <875zsw110r.fsf@toke.dk> <6B530473-971A-4265-B94B-3595D39D57AF@darbyshire-bryant.me.uk> <87r2bjyoyn.fsf@toke.dk> <4505E3A0-6AE2-4C0B-960D-B1EDB616F0CA@darbyshire-bryant.me.uk> <878sxq1t3e.fsf@toke.dk> <00E839ED-7FA4-4577-838F-775EC9A90608@darbyshire-bryant.me.uk> <871s3h7ghi.fsf@toke.dk> <7FFADD0C-591A-4BB4-B96C-C0157963E1EB@darbyshire-bryant.me.uk> <87ef7g5eec.fsf@toke.dk> X-Clacks-Overhead: GNU Terry Pratchett Date: Mon, 11 Mar 2019 00:56:51 +0100 Message-ID: <87imwq4724.fsf@toke.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Cake] act_connmark + dscp X-BeenThere: cake@lists.bufferbloat.net X-Mailman-Version: 2.1.20 Precedence: list List-Id: Cake - FQ_codel the next generation List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Mar 2019 23:56:56 -0000 Kevin Darbyshire-Bryant writes: >> On 9 Mar 2019, at 14:08, Toke H=C3=B8iland-J=C3=B8rgensen wrote: >>=20 >> Kevin Darbyshire-Bryant writes: >>=20 >>> OK, what I am trying to do is classify incoming connections into >>> relevant cake tins to impose some bandwidth fairness. e.g. classify >>> bittorrent & things that are downloads into the Bulk tin, and >>> prioritise stuff that is streaming video into the Video tin. Incoming >>> DSCP has a) been washed and b) is unreliable anyway so is unhelpful in >>> this case. iptables runs too late, so having rules to classify >>> incoming stuff is pointless. >>=20 >> Right, I see. >>=20 >> [... snip .. ] >>=20 >>> Then I recently discovered act_connmark >>> (http://linux-ip.net/gl/tc-filters/tc-filters-node2.html) - the >>> thinking being I could use iptables on egress to set fwmarks to >>> classify a connection and have the ingress packets magically follow. >>> This worked but still required 3 tc filter actions to cope with 4 >>> tins: >>>=20 >>> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x01 fw acti= on skbedit priority ${MAJOR}1 >>> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x03 fw acti= on skbedit priority ${MAJOR}3 >>> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x04 fw acti= on skbedit priority ${MAJOR}4 >>=20 >> Right, so this can be replaced with the fwmark action we already added >> (and that I just pushed an update to so it supports masking the value >> before selecting a tin). > > Yes. I=E2=80=99d point out the (hopefully obvious) that the flag mask ne= eds > to be one bit bigger than you might immediately think. e.g. diffserv4 > needs to store 5 values (0-4), 3 bits. 0 is being used as an implied > =E2=80=99tin is not set, fall back to DSCP=E2=80=99. One could store DSC= P+1 of course > and use the same logic. Yeah, or one can just squat on a whole byte like I do in the example ;) >>=20 >>> The overriding (if required) of DSCP could be done in iptables and to >>> avoid going through the iptables DSCP decision/mangling for every >>> packet I could use a flag within the fwmark to indicate the decision >>> had previously been made and stored for this connection. >>=20 >> [ ... ] >>=20 >>> I=E2=80=99m doing 2 things. >>>=20 >>> 1) Classifying traffic into tins on ingress based on the egress DSCP >>> values contained in fwmarks. >>>=20 >>> 2) Basing the fwmark contained DSCP on the initial packet of the >>> connection, possibly after being modified once by iptables rules. >>=20 >> So I tried prototyping what it would actually look like to do all this >> in iptables. The result is below (in iptables-restore format). I haven't >> tested it, but I believe something along the lines of this will work, >> when used along with the CAKE fwmark support (setting a mask of 0xFF >> when configuring CAKE). >>=20 >> Now, the obvious eyesore on this is the need to replicate CAKEs diffserv >> mappings in iptables rules (21 rules in this case, for the diffserv4 >> mapping). As long as this only runs once per connection I don't actually >> think it's much of a performance issue for normal use, but obviously >> there could be pathological cases, and it's also annoying to have to do >> that. >>=20 >> So, first question becomes: Do you agree that the firewall rules below >> would solve your use case (ignoring the ugliness of having to replicate >> the diffserv parsing in iptables)? Or am I missing something? > > I=E2=80=99ve had a quick look over it and think it would work. > > The ugliness of doing the diffserv parsing is what prompted the idea > of storing the DSCP directly and I felt the stored tin selection was > effectively abstracting the diffserv field anyway. Right, but that means that the CAKE interpretation of the fwmark would have to change from something that selects the tin, to something that is treated as a DSCP mark. I think this was the part that I was missing before. I don't think this is a good idea, as that means we tie the marks to one particular use case. > Storing the DSCP is more compatible with differing egress v ingress > mappings (eg. egress diffserv4, ingress diffserv3 though I can=E2=80=99t > really think of a use case for that) I think that if someone wants to do something like that, we are way out of "simple use case that we want to actively support" territory, and can legitimately ask people to go write a BPF filter or something :) > Of course using fwmark as tin number selector in cake doesn=E2=80=99t pre= clude > some other mechanism of storing & recovering DSCP to/from firewall > mark e.g. the previously discussed act-connmark+dscp which would help > anyone who wanted to do such =E2=80=98link traversing=E2=80=99 DSCP shena= nigans. That > of course makes you happier since cake doesn=E2=80=99t embed itself furth= er > into conntrack. Yeah, I definitely don't think CAKE has any business writing DSCP values into the mark. However, as I said before, there may be a case for adding an option to write the tin selection back to conntrack. Something like the patch below would do it (with an option to control it, of course), but it does incur a dependency on another conntrack header, so I'm not sure if it will be acceptable to upstream. Also, we would need to figure out how the option should work. The alternative would be to use another mechanism; the iptables rules replication is one example. Another could be adding a conntrack helper to eBPF... -Toke diff --git a/sch_cake.c b/sch_cake.c index a8fa224..c6b7dd9 100644 --- a/sch_cake.c +++ b/sch_cake.c @@ -78,6 +78,7 @@ =20 #if IS_REACHABLE(CONFIG_NF_CONNTRACK) #include +#include #include #include #endif @@ -1646,6 +1647,27 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, = u16 wash) } } =20 +static void cake_set_tin_connmark(struct cake_sched_data *q, + struct sk_buff *skb, u32 tin) +{ +#if IS_REACHABLE(CONFIG_NF_CONNTRACK) + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; + u32 newmark; + + ct =3D nf_ct_get(skb, &ctinfo); + if (ct) { + newmark =3D (ct->mark & ~q->fwmark_mask); + newmark ^=3D (tin << q->fwmark_shft) & q->fwmark_mask; + + if (ct->mark !=3D newmark) { + ct->mark =3D newmark; + nf_conntrack_event_cache(IPCT_MARK, ct); + } + } +#endif +} + static struct cake_tin_data *cake_select_tin(struct Qdisc *sch, struct sk_buff *skb) { @@ -1678,6 +1700,8 @@ static struct cake_tin_data *cake_select_tin(struct Q= disc *sch, tin =3D 0; } =20 + cake_set_tin_connmark(q, skb, tin); + return &q->tins[tin]; } =20