From: Toke Høiland-Jørgensen
To: netdev@vger.kernel.org
Cc: cake@lists.bufferbloat.net
Date: Wed, 02 May 2018 17:11:18 +0200
Message-ID: <152527387828.14936.13082657065697267030.stgit@alrua-kau>
In-Reply-To: <152527385803.14936.8396262019181995139.stgit@alrua-kau>
References: <152527385803.14936.8396262019181995139.stgit@alrua-kau>
Subject: [Cake] [PATCH net-next v7 4/7] sch_cake: Add NAT awareness to packet classifier

When CAKE is deployed on a gateway that also performs NAT (which is a
common deployment mode), the host fairness mechanism cannot distinguish
internal hosts from each other, and so fails to work correctly.

To fix this, we add an optional NAT awareness mode, which will query the
kernel conntrack mechanism to obtain the pre-NAT addresses for each packet
and use that in the flow and host hashing. When the shaper is enabled and
the host is already performing NAT, the cost of this lookup is negligible.
However, in unlimited mode with no NAT being performed, there is a
significant CPU cost at higher bandwidths. For this reason, the feature is
turned off by default.
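As a reading aid only (not code from this patch): the sketch below shows, in
simplified form, what the hashing path does once the series is applied --
dissect the packet, then, if the NAT flag is set in flow_mode, rewrite the
dissected addresses and ports with the pre-NAT conntrack tuple before
hashing. The function name example_nat_aware_hash() is made up for
illustration; cake_update_flowkeys() is added by this patch,
skb_flow_dissect_flow_keys() and flow_hash_from_keys() are existing kernel
helpers, and CAKE_FLOW_NAT_FLAG comes from the uapi header earlier in the
series.

/* Illustration only -- simplified view of the NAT-aware hashing path. */
#include <linux/skbuff.h>
#include <net/flow_dissector.h>

static u32 example_nat_aware_hash(const struct sk_buff *skb, int flow_mode)
{
	struct flow_keys keys;

	skb_flow_dissect_flow_keys(skb, &keys,
				   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);

	/* Substitute the pre-NAT addresses/ports so that internal hosts
	 * behind the NAT remain distinguishable for host fairness.
	 */
	if (flow_mode & CAKE_FLOW_NAT_FLAG)
		cake_update_flowkeys(&keys, skb);

	return flow_hash_from_keys(&keys);
}

The real logic is in the cake_hash() hunk of the diff below; the sketch is
only meant to summarise it.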
Signed-off-by: Toke Høiland-Jørgensen
---
 net/sched/sch_cake.c |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index a412db9b647e..38f1275dd83d 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -70,6 +70,12 @@
 #include <net/tcp.h>
 #include <net/flow_dissector.h>
 
+#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+#include <net/netfilter/nf_conntrack.h>
+#endif
+
 #define CAKE_SET_WAYS (8)
 #define CAKE_MAX_TINS (8)
 #define CAKE_QUEUES (1024)
@@ -520,6 +526,61 @@ static bool cobalt_should_drop(struct cobalt_vars *vars,
 	return drop;
 }
 
+#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
+
+static inline void cake_update_flowkeys(struct flow_keys *keys,
+					const struct sk_buff *skb)
+{
+	enum ip_conntrack_info ctinfo;
+	bool rev = false;
+
+	struct nf_conn *ct;
+	const struct nf_conntrack_tuple *tuple;
+
+	if (tc_skb_protocol(skb) != htons(ETH_P_IP))
+		return;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct) {
+		tuple = nf_ct_tuple(ct, CTINFO2DIR(ctinfo));
+	} else {
+		const struct nf_conntrack_tuple_hash *hash;
+		struct nf_conntrack_tuple srctuple;
+
+		if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+				       NFPROTO_IPV4, dev_net(skb->dev),
+				       &srctuple))
+			return;
+
+		hash = nf_conntrack_find_get(dev_net(skb->dev),
+					     &nf_ct_zone_dflt,
+					     &srctuple);
+		if (!hash)
+			return;
+
+		rev = true;
+		ct = nf_ct_tuplehash_to_ctrack(hash);
+		tuple = nf_ct_tuple(ct, !hash->tuple.dst.dir);
+	}
+
+	keys->addrs.v4addrs.src = rev ? tuple->dst.u3.ip : tuple->src.u3.ip;
+	keys->addrs.v4addrs.dst = rev ? tuple->src.u3.ip : tuple->dst.u3.ip;
+
+	if (keys->ports.ports) {
+		keys->ports.src = rev ? tuple->dst.u.all : tuple->src.u.all;
+		keys->ports.dst = rev ? tuple->src.u.all : tuple->dst.u.all;
+	}
+	if (rev)
+		nf_ct_put(ct);
+}
+#else
+static inline void cake_update_flowkeys(struct flow_keys *keys,
+					const struct sk_buff *skb)
+{
+	/* There is nothing we can do here without CONNTRACK */
+}
+#endif
+
 /* Cake has several subtle multiple bit settings. In these cases you
  * would be matching triple isolate mode as well.
  */
@@ -547,6 +608,9 @@ cake_hash(struct cake_tin_data *q, const struct sk_buff *skb, int flow_mode)
 	skb_flow_dissect_flow_keys(skb, &keys,
 				   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
 
+	if (flow_mode & CAKE_FLOW_NAT_FLAG)
+		cake_update_flowkeys(&keys, skb);
+
 	/* flow_hash_from_keys() sorts the addresses by value, so we have
 	 * to preserve their order in a separate data structure to treat
 	 * src and dst host addresses as independently selectable. */
@@ -1775,6 +1839,12 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
 		q->flow_mode = (nla_get_u32(tb[TCA_CAKE_FLOW_MODE]) &
 				CAKE_FLOW_MASK);
 
+	if (tb[TCA_CAKE_NAT]) {
+		q->flow_mode &= ~CAKE_FLOW_NAT_FLAG;
+		q->flow_mode |= CAKE_FLOW_NAT_FLAG *
+			!!nla_get_u32(tb[TCA_CAKE_NAT]);
+	}
+
 	if (tb[TCA_CAKE_RTT]) {
 		q->interval = nla_get_u32(tb[TCA_CAKE_RTT]);
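A small note on the cake_change() hunk above (illustration only, not part of
the patch): the multiply-by-boolean expression is simply a branchless way of
setting or clearing the NAT flag from the u32 attribute value. An
equivalent, more explicit form would be:

	/* Equivalent form of the TCA_CAKE_NAT handling above: a non-zero
	 * attribute value sets CAKE_FLOW_NAT_FLAG in q->flow_mode, zero
	 * clears it.
	 */
	if (tb[TCA_CAKE_NAT]) {
		if (nla_get_u32(tb[TCA_CAKE_NAT]))
			q->flow_mode |= CAKE_FLOW_NAT_FLAG;
		else
			q->flow_mode &= ~CAKE_FLOW_NAT_FLAG;
	}

With the matching iproute2 support (not part of this kernel patch), this is
the attribute the tc "nat" / "nonat" keywords are expected to set.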