From: Daniel Borkmann <daniel@iogearbox.net>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: davem@davemloft.net, netdev@vger.kernel.org, bpf@vger.kernel.org,
cake@lists.bufferbloat.net, Davide Caratti <dcaratti@redhat.com>,
Jiri Pirko <jiri@resnulli.us>,
Jamal Hadi Salim <jhs@mojatatu.com>,
Cong Wang <xiyou.wangcong@gmail.com>,
Roman Mashak <mrv@mojatatu.com>, Lawrence Brakmo <brakmo@fb.com>,
Ilya Ponetayev <i.ponetaev@ndmsystems.com>,
kafai@fb.com, alexei.starovoitov@gmail.com, edumazet@google.com
Subject: Re: [Cake] [PATCH net v3] sched: consistently handle layer3 header accesses in the presence of VLANs
Date: Sat, 4 Jul 2020 00:17:07 +0200
Message-ID: <003ff65d-fc24-cd25-9e46-95e7ca2aa31f@iogearbox.net>
In-Reply-To: <20200703202643.12919-1-toke@redhat.com>

On 7/3/20 10:26 PM, Toke Høiland-Jørgensen wrote:
> There are a couple of places in net/sched/ that check skb->protocol and act
> on the value there. However, in the presence of VLAN tags, the value stored
> in skb->protocol can be inconsistent based on whether VLAN acceleration is
> enabled. The commit quoted in the Fixes tag below fixed the users of
> skb->protocol to use a helper that will always see the VLAN ethertype.
>
> However, most of the callers don't actually handle the VLAN ethertype, but
> expect to find the IP header type in the protocol field. This means that
> things like changing the ECN field, or parsing diffserv values, stop
> working if there's a VLAN tag, or if there are multiple nested VLAN
> tags (QinQ).
>
> To fix this, change the helper to take an argument that indicates whether
> the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
> make sure to skip all of them, so behaviour is consistent even in QinQ
> mode.
>
> To make the helper usable from the ECN code, move it to if_vlan.h instead
> of pkt_sched.h.
>
> v3:
> - Remove empty lines
> - Move vlan variable definitions inside loop in skb_protocol()
> - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
> bpf_skb_ecn_set_ce()
>
> v2:
> - Use eth_type_vlan() helper in skb_protocol()
> - Also fix code that reads skb->protocol directly
> - Change a couple of 'if/else if' statements to switch constructs to avoid
> calling the helper twice
>
> Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
> Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
> include/linux/if_vlan.h | 28 ++++++++++++++++++++++++++++
> include/net/inet_ecn.h | 25 +++++++++++++++++--------
> include/net/pkt_sched.h | 11 -----------
> net/core/filter.c | 10 +++++++---
> net/sched/act_connmark.c | 9 ++++++---
> net/sched/act_csum.c | 2 +-
> net/sched/act_ct.c | 9 ++++-----
> net/sched/act_ctinfo.c | 9 ++++++---
> net/sched/act_mpls.c | 2 +-
> net/sched/act_skbedit.c | 2 +-
> net/sched/cls_api.c | 2 +-
> net/sched/cls_flow.c | 8 ++++----
> net/sched/cls_flower.c | 2 +-
> net/sched/em_ipset.c | 2 +-
> net/sched/em_ipt.c | 2 +-
> net/sched/em_meta.c | 2 +-
> net/sched/sch_cake.c | 4 ++--
> net/sched/sch_dsmark.c | 6 +++---
> net/sched/sch_teql.c | 2 +-
> 19 files changed, 86 insertions(+), 51 deletions(-)
>
> diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
> index b05e855f1ddd..427a5b8597c2 100644
> --- a/include/linux/if_vlan.h
> +++ b/include/linux/if_vlan.h
> @@ -308,6 +308,34 @@ static inline bool eth_type_vlan(__be16 ethertype)
> }
> }
>
> +/* A getter for the SKB protocol field which will handle VLAN tags consistently
> + * whether VLAN acceleration is enabled or not.
> + */
> +static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
> +{
> +	unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
> +	__be16 proto = skb->protocol;
> +
> +	if (!skip_vlan)
> +		/* VLAN acceleration strips the VLAN header from the skb and
> +		 * moves it to skb->vlan_proto
> +		 */
> +		return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;
> +
> +	while (eth_type_vlan(proto)) {
> +		struct vlan_hdr vhdr, *vh;
> +
> +		vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
> +		if (!vh)
> +			break;
> +
> +		proto = vh->h_vlan_encapsulated_proto;
> +		offset += sizeof(vhdr);
> +	}

Hm, why is the while loop 'unbounded'? Does it even make sense to have a packet with
hundreds of vlan hdrs in there that you'd end up walking? What if an attacker crafts
a max-sized packet containing only vlan_hdrs, forcing exorbitant looping in the
fast-path here (e.g. via af_packet)?

Did you validate that skb_mac_offset() is always valid for the call-sites you converted?
(We have a skb_mac_header_was_set() test to probe for whether skb->mac_header is set
to ~0.)

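
To illustrate the first concern, here is a minimal userspace sketch of a bounded walk over VLAN tags. It is not kernel code and not part of the patch: frame_l3_proto(), read_be16() and MAX_VLAN_DEPTH are hypothetical names, and the cap of 8 is an arbitrary assumption chosen for the example. The point is only that the loop gives up after a fixed number of tags instead of walking whatever the sender put in the packet.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN       14      /* dst MAC + src MAC + ethertype */
#define VLAN_HLEN      4       /* TCI + encapsulated ethertype */
#define ETH_P_IP       0x0800
#define ETH_P_8021Q    0x8100
#define ETH_P_8021AD   0x88A8
#define MAX_VLAN_DEPTH 8       /* assumption for this sketch, not from the patch */

static int is_vlan_type(uint16_t proto)
{
	return proto == ETH_P_8021Q || proto == ETH_P_8021AD;
}

/* Read a big-endian 16-bit value at 'off'; returns -1 if out of bounds. */
static int read_be16(const uint8_t *buf, size_t len, size_t off, uint16_t *out)
{
	if (off > len || len - off < 2)
		return -1;
	*out = (uint16_t)((buf[off] << 8) | buf[off + 1]);
	return 0;
}

/* Returns the first non-VLAN ethertype in host byte order, or 0 if the
 * frame is truncated or carries more than MAX_VLAN_DEPTH tags.
 */
static uint16_t frame_l3_proto(const uint8_t *frame, size_t len)
{
	size_t off = ETH_HLEN;  /* first byte after the Ethernet header */
	uint16_t proto;
	int depth = 0;

	if (read_be16(frame, len, ETH_HLEN - 2, &proto) < 0)
		return 0;

	while (is_vlan_type(proto)) {
		if (++depth > MAX_VLAN_DEPTH)
			return 0;       /* refuse to walk further */
		/* the encapsulated ethertype follows the 2-byte TCI */
		if (read_be16(frame, len, off + 2, &proto) < 0)
			return 0;
		off += VLAN_HLEN;
	}
	return proto;
}
```

Whether such a helper should return 0, the last VLAN ethertype seen, or something else when it hits the cap is a design choice; the sketch returns 0 so that callers switching on IPv4/IPv6 fall through to their default case.
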
> +	return proto;
> +}
> +
> static inline bool vlan_hw_offload_capable(netdev_features_t features,
> __be16 proto)
> {
[...]
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 73395384afe2..82e1b5b06167 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5853,12 +5853,16 @@ BPF_CALL_1(bpf_skb_ecn_set_ce, struct sk_buff *, skb)
> {
>  	unsigned int iphdr_len;
>
> -	if (skb->protocol == cpu_to_be16(ETH_P_IP))
> +	switch (skb_protocol(skb, true)) {
> +	case cpu_to_be16(ETH_P_IP):
>  		iphdr_len = sizeof(struct iphdr);
> -	else if (skb->protocol == cpu_to_be16(ETH_P_IPV6))
> +		break;
> +	case cpu_to_be16(ETH_P_IPV6):
>  		iphdr_len = sizeof(struct ipv6hdr);
> -	else
> +		break;
> +	default:
>  		return 0;
> +	}
>
>  	if (skb_headlen(skb) < iphdr_len)
>  		return 0;
[...]
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index faa78b7dd962..e62beec0d844 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -1538,7 +1538,7 @@ static inline int __tcf_classify(struct sk_buff *skb,
> reclassify:
> #endif
>  	for (; tp; tp = rcu_dereference_bh(tp->next)) {
> -		__be16 protocol = tc_skb_protocol(skb);
> +		__be16 protocol = skb_protocol(skb, false);
>  		int err;
>
>  		if (tp->protocol != protocol &&
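
To spell out what the skip_vlan argument changes for the converted call-sites, here is a small userspace model of the helper as quoted above. struct model_skb and model_skb_protocol() are hypothetical stand-ins, not kernel types; ethertypes are kept in host byte order for simplicity, and the in-payload tags are reduced to a plain array of encapsulated ethertypes. It is a sketch of the semantics only.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP     0x0800
#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8

/* Hypothetical stand-in for the few sk_buff fields the helper reads. */
struct model_skb {
	uint16_t protocol;      /* models skb->protocol */
	int vlan_tag_present;   /* models skb_vlan_tag_present() */
	uint16_t vlan_proto;    /* models skb->vlan_proto (accelerated tag) */
	const uint16_t *encap;  /* encapsulated ethertypes of in-payload tags */
	size_t n_encap;
};

static int model_is_vlan(uint16_t proto)
{
	return proto == ETH_P_8021Q || proto == ETH_P_8021AD;
}

static uint16_t model_skb_protocol(const struct model_skb *skb, int skip_vlan)
{
	uint16_t proto = skb->protocol;
	size_t i = 0;

	/* skip_vlan == false: report the VLAN ethertype if there is one,
	 * regardless of whether the tag was moved out of the payload.
	 */
	if (!skip_vlan)
		return skb->vlan_tag_present ? skb->vlan_proto : proto;

	/* skip_vlan == true: walk past every in-payload tag; running out
	 * of headers ends the walk, like a failed skb_header_pointer().
	 */
	while (model_is_vlan(proto) && i < skb->n_encap)
		proto = skb->encap[i++];
	return proto;
}
```

With an accelerated tag (protocol = ETH_P_IP, vlan_proto = ETH_P_8021Q) and with the same tag left in the payload (protocol = ETH_P_8021Q, one encapsulated ETH_P_IP), the model returns ETH_P_8021Q for skip_vlan == false and ETH_P_IP for skip_vlan == true in both cases, which is the consistency the commit message is after.
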