From: Cong Wang
Date: Wed, 16 May 2018 13:56:59 -0700
To: Toke Høiland-Jørgensen
Cc: Linux Kernel Network Developers, Cake List
Subject: Re: [Cake] [PATCH net-next v12 4/7] sch_cake: Add NAT awareness to packet classifier

On Wed, May 16, 2018 at 1:29 PM, Toke Høiland-Jørgensen wrote:
> When CAKE is deployed on a gateway that also performs NAT (which is a
> common deployment mode), the host fairness mechanism cannot distinguish
> internal hosts from each other, and so fails to work correctly.
>
> To fix this, we add an optional NAT awareness mode, which will query the
> kernel conntrack mechanism to obtain the pre-NAT addresses for each packet
> and use that in the flow and host hashing.
>
> When the shaper is enabled and the host is already performing NAT, the cost
> of this lookup is negligible. However, in unlimited mode with no NAT being
> performed, there is a significant CPU cost at higher bandwidths. For this
> reason, the feature is turned off by default.
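The rewrite described above amounts to overwriting the dissected flow keys with the addresses and ports from a conntrack tuple, swapping source and destination when the tuple describes the opposite direction to the packet (the `rev` case in the patch). A minimal userspace sketch of just that step, assuming hypothetical simplified structs (`sketch_tuple`, `sketch_keys`) in place of `struct nf_conntrack_tuple` and `struct flow_keys`; it leaves out the conntrack lookup and reference counting entirely:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel's
 * struct nf_conntrack_tuple and struct flow_keys. */
struct sketch_tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

struct sketch_keys {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	bool has_ports;		/* models the keys->ports.ports check */
};

/* Overwrite the keys with the tuple's addresses (and ports, if the
 * packet has them). When rev is true the tuple describes the opposite
 * direction of the packet, so source and destination are swapped. */
static void sketch_update_flowkeys(struct sketch_keys *keys,
				   const struct sketch_tuple *tuple, bool rev)
{
	keys->src_ip = rev ? tuple->dst_ip : tuple->src_ip;
	keys->dst_ip = rev ? tuple->src_ip : tuple->dst_ip;

	if (keys->has_ports) {
		keys->src_port = rev ? tuple->dst_port : tuple->src_port;
		keys->dst_port = rev ? tuple->src_port : tuple->dst_port;
	}
}
```

For instance, for a reply packet from 203.0.113.5:80 to the gateway's public address, passing the original-direction tuple (192.168.1.10:5000 to 203.0.113.5:80) with `rev` set yields keys whose destination is the internal host 192.168.1.10:5000. That appears to be the situation the patch's else-branch handles, where no conntrack entry is attached to the skb and the lookup matches the reply tuple.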
>
> Signed-off-by: Toke Høiland-Jørgensen
> ---
>  net/sched/sch_cake.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
>
> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
> index 65439b643c92..e1038a7b6686 100644
> --- a/net/sched/sch_cake.c
> +++ b/net/sched/sch_cake.c
> @@ -71,6 +71,12 @@
>  #include
>  #include
>
> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
> +#include
> +#include
> +#include
> +#endif
> +
>  #define CAKE_SET_WAYS (8)
>  #define CAKE_MAX_TINS (8)
>  #define CAKE_QUEUES (1024)
> @@ -514,6 +520,60 @@ static bool cobalt_should_drop(struct cobalt_vars *vars,
>  	return drop;
>  }
>
> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
> +
> +static void cake_update_flowkeys(struct flow_keys *keys,
> +				 const struct sk_buff *skb)
> +{
> +	const struct nf_conntrack_tuple *tuple;
> +	enum ip_conntrack_info ctinfo;
> +	struct nf_conn *ct;
> +	bool rev = false;
> +
> +	if (tc_skb_protocol(skb) != htons(ETH_P_IP))
> +		return;
> +
> +	ct = nf_ct_get(skb, &ctinfo);
> +	if (ct) {
> +		tuple = nf_ct_tuple(ct, CTINFO2DIR(ctinfo));
> +	} else {
> +		const struct nf_conntrack_tuple_hash *hash;
> +		struct nf_conntrack_tuple srctuple;
> +
> +		if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
> +				       NFPROTO_IPV4, dev_net(skb->dev),
> +				       &srctuple))
> +			return;
> +
> +		hash = nf_conntrack_find_get(dev_net(skb->dev),
> +					     &nf_ct_zone_dflt,
> +					     &srctuple);
> +		if (!hash)
> +			return;
> +
> +		rev = true;
> +		ct = nf_ct_tuplehash_to_ctrack(hash);
> +		tuple = nf_ct_tuple(ct, !hash->tuple.dst.dir);
> +	}
> +
> +	keys->addrs.v4addrs.src = rev ? tuple->dst.u3.ip : tuple->src.u3.ip;
> +	keys->addrs.v4addrs.dst = rev ? tuple->src.u3.ip : tuple->dst.u3.ip;
> +
> +	if (keys->ports.ports) {
> +		keys->ports.src = rev ? tuple->dst.u.all : tuple->src.u.all;
> +		keys->ports.dst = rev ? tuple->src.u.all : tuple->dst.u.all;
> +	}
> +	if (rev)
> +		nf_ct_put(ct);
> +}
> +#else
> +static void cake_update_flowkeys(struct flow_keys *keys,
> +				 const struct sk_buff *skb)
> +{
> +	/* There is nothing we can do here without CONNTRACK */
> +}
> +#endif
> +
>  /* Cake has several subtle multiple bit settings. In these cases you
>   * would be matching triple isolate mode as well.
>   */
> @@ -541,6 +601,9 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
>  	skb_flow_dissect_flow_keys(skb, &keys,
>  				   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
>
> +	if (flow_mode & CAKE_FLOW_NAT_FLAG)
> +		cake_update_flowkeys(&keys, skb);
> +
>  	/* flow_hash_from_keys() sorts the addresses by value, so we have
>  	 * to preserve their order in a separate data structure to treat
>  	 * src and dst host addresses as independently selectable.
> @@ -1727,6 +1790,12 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
>  		q->flow_mode = (nla_get_u32(tb[TCA_CAKE_FLOW_MODE]) &
>  				CAKE_FLOW_MASK);
>
> +	if (tb[TCA_CAKE_NAT]) {
> +		q->flow_mode &= ~CAKE_FLOW_NAT_FLAG;
> +		q->flow_mode |= CAKE_FLOW_NAT_FLAG *
> +			!!nla_get_u32(tb[TCA_CAKE_NAT]);
> +	}

I think it would be better to return -EOPNOTSUPP here when
CONFIG_NF_CONNTRACK is not enabled; otherwise the NAT option is
silently accepted even though cake_update_flowkeys() is a no-op.

> +
>  	if (tb[TCA_CAKE_RTT]) {
>  		q->interval = nla_get_u32(tb[TCA_CAKE_RTT]);
>
> @@ -1892,6 +1961,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
>  	if (nla_put_u32(skb, TCA_CAKE_ACK_FILTER, q->ack_filter))
>  		goto nla_put_failure;
>
> +	if (nla_put_u32(skb, TCA_CAKE_NAT,
> +			!!(q->flow_mode & CAKE_FLOW_NAT_FLAG)))
> +		goto nla_put_failure;
> +
>  	return nla_nest_end(skb, opts);
>
>  nla_put_failure:
>
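The -EOPNOTSUPP suggestion for cake_change() could look roughly like the following. This is a userspace model, not kernel code: `sketch_set_nat()` and `SKETCH_FLOW_NAT_FLAG` are hypothetical names, the flag value 64 is an assumption, and the `conntrack_available` parameter stands in for what would be a compile-time `IS_REACHABLE(CONFIG_NF_CONNTRACK)` check in the kernel. It reuses the patch's branchless set/clear idiom for the flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value; the real CAKE_FLOW_NAT_FLAG may differ. */
#define SKETCH_FLOW_NAT_FLAG 64u

/* Apply a TCA_CAKE_NAT-style option to flow_mode.
 * Returns 0 on success, or -EOPNOTSUPP when NAT awareness is requested
 * but conntrack is unavailable, leaving flow_mode unchanged. */
static int sketch_set_nat(uint32_t *flow_mode, uint32_t nat_requested,
			  bool conntrack_available)
{
	if (nat_requested && !conntrack_available)
		return -EOPNOTSUPP;

	/* Clear the flag, then set it only if the request is non-zero:
	 * FLAG * !!x is FLAG when x != 0 and 0 otherwise. */
	*flow_mode &= ~SKETCH_FLOW_NAT_FLAG;
	*flow_mode |= SKETCH_FLOW_NAT_FLAG * !!nat_requested;
	return 0;
}
```

This sketch only rejects requests to enable NAT mode, so explicitly disabling it still succeeds without conntrack; rejecting the TCA_CAKE_NAT attribute whenever it is present is an equally plausible reading of the suggestion.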