Cake - FQ_codel the next generation
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: davem@davemloft.net, netdev@vger.kernel.org, bpf@vger.kernel.org,
	cake@lists.bufferbloat.net, Davide Caratti <dcaratti@redhat.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Roman Mashak <mrv@mojatatu.com>, Lawrence Brakmo <brakmo@fb.com>,
	Ilya Ponetayev <i.ponetaev@ndmsystems.com>,
	kafai@fb.com, alexei.starovoitov@gmail.com, edumazet@google.com
Subject: Re: [Cake] [PATCH net v3] sched: consistently handle layer3 header accesses in the presence of VLANs
Date: Sat, 04 Jul 2020 13:28:40 +0200	[thread overview]
Message-ID: <87blkvmsd3.fsf@toke.dk> (raw)
In-Reply-To: <003ff65d-fc24-cd25-9e46-95e7ca2aa31f@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 7/3/20 10:26 PM, Toke Høiland-Jørgensen wrote:
>> There are a couple of places in net/sched/ that check skb->protocol and act
>> on the value there. However, in the presence of VLAN tags, the value stored
>> in skb->protocol can be inconsistent based on whether VLAN acceleration is
>> enabled. The commit quoted in the Fixes tag below fixed the users of
>> skb->protocol to use a helper that will always see the VLAN ethertype.
>> 
>> However, most of the callers don't actually handle the VLAN ethertype, but
>> expect to find the IP header type in the protocol field. This means that
>> things like changing the ECN field, or parsing diffserv values, stops
>> working if there's a VLAN tag, or if there are multiple nested VLAN
>> tags (QinQ).
>> 
>> To fix this, change the helper to take an argument that indicates whether
>> the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
>> make sure to skip all of them, so behaviour is consistent even in QinQ
>> mode.
>> 
>> To make the helper usable from the ECN code, move it to if_vlan.h instead
>> of pkt_sched.h.
>> 
>> v3:
>> - Remove empty lines
>> - Move vlan variable definitions inside loop in skb_protocol()
>> - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
>>    bpf_skb_ecn_set_ce()
>> 
>> v2:
>> - Use eth_type_vlan() helper in skb_protocol()
>> - Also fix code that reads skb->protocol directly
>> - Change a couple of 'if/else if' statements to switch constructs to avoid
>>    calling the helper twice
>> 
>> Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
>> Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>>   include/linux/if_vlan.h  | 28 ++++++++++++++++++++++++++++
>>   include/net/inet_ecn.h   | 25 +++++++++++++++++--------
>>   include/net/pkt_sched.h  | 11 -----------
>>   net/core/filter.c        | 10 +++++++---
>>   net/sched/act_connmark.c |  9 ++++++---
>>   net/sched/act_csum.c     |  2 +-
>>   net/sched/act_ct.c       |  9 ++++-----
>>   net/sched/act_ctinfo.c   |  9 ++++++---
>>   net/sched/act_mpls.c     |  2 +-
>>   net/sched/act_skbedit.c  |  2 +-
>>   net/sched/cls_api.c      |  2 +-
>>   net/sched/cls_flow.c     |  8 ++++----
>>   net/sched/cls_flower.c   |  2 +-
>>   net/sched/em_ipset.c     |  2 +-
>>   net/sched/em_ipt.c       |  2 +-
>>   net/sched/em_meta.c      |  2 +-
>>   net/sched/sch_cake.c     |  4 ++--
>>   net/sched/sch_dsmark.c   |  6 +++---
>>   net/sched/sch_teql.c     |  2 +-
>>   19 files changed, 86 insertions(+), 51 deletions(-)
>> 
>> diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
>> index b05e855f1ddd..427a5b8597c2 100644
>> --- a/include/linux/if_vlan.h
>> +++ b/include/linux/if_vlan.h
>> @@ -308,6 +308,34 @@ static inline bool eth_type_vlan(__be16 ethertype)
>>   	}
>>   }
>>   
>> +/* A getter for the SKB protocol field which will handle VLAN tags consistently
>> + * whether VLAN acceleration is enabled or not.
>> + */
>> +static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
>> +{
>> +	unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
>> +	__be16 proto = skb->protocol;
>> +
>> +	if (!skip_vlan)
>> +		/* VLAN acceleration strips the VLAN header from the skb and
>> +		 * moves it to skb->vlan_proto
>> +		 */
>> +		return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;
>> +
>> +	while (eth_type_vlan(proto)) {
>> +		struct vlan_hdr vhdr, *vh;
>> +
>> +		vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
>> +		if (!vh)
>> +			break;
>> +
>> +		proto = vh->h_vlan_encapsulated_proto;
>> +		offset += sizeof(vhdr);
>> +	}
>
> Hm, why is the while loop 'unbounded'? Does it even make sense to have
> a packet with hundreds of vlan hdrs in there what you'd end up
> walking? What if an attacker crafts a max sized packet with only
> vlan_hdr forcing exorbitant looping in fast-path here (e.g. via
> af_packet)?

Hmm, I guess you're right that could theoretically happen. But on the
other hand, a lot of drivers seem to be cheerfully calling
vlan_get_protocol() on incoming packets, which doesn't have a limit on
the depth either.

I guess I could add a depth limit, but in that case I suppose that
should also be added to vlan_get_protocol() (or the two should be
consolidated). WDYT?

> Did you validate that skb_mac_offset() is always valid for the
> call-sites you converted? (We have a skb_mac_header_was_set() test to
> probe for whether skb->mac_header is set to ~0.)

Not extensively; I kinda assumed it would always be valid at those call
sites, since the callers go on to call ip_hdr() or something similar
right afterwards.

I guess Toshiaki's suggestion to use vlan_get_protocol() could be a way
around this, as that seems to deal with skb->mac_len being 0.

-Toke


  reply	other threads:[~2020-07-04 11:28 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-03 20:26 Toke Høiland-Jørgensen
2020-07-03 21:35 ` David Miller
2020-07-03 22:17 ` Daniel Borkmann
2020-07-04 11:28   ` Toke Høiland-Jørgensen [this message]
2020-07-04  3:24 ` Toshiaki Makita
2020-07-04 11:33   ` Toke Høiland-Jørgensen
2020-07-06  4:24     ` Toshiaki Makita
2020-07-06 10:53       ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.bufferbloat.net/postorius/lists/cake.lists.bufferbloat.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87blkvmsd3.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=brakmo@fb.com \
    --cc=cake@lists.bufferbloat.net \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dcaratti@redhat.com \
    --cc=edumazet@google.com \
    --cc=i.ponetaev@ndmsystems.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=kafai@fb.com \
    --cc=mrv@mojatatu.com \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox