From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-x22c.google.com (mail-oi0-x22c.google.com [IPv6:2607:f8b0:4003:c06::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.bufferbloat.net (Postfix) with ESMTPS id 755CC3B2B0 for ; Sun, 12 Jun 2016 16:40:31 -0400 (EDT) Received: by mail-oi0-x22c.google.com with SMTP id p204so182021816oih.3 for ; Sun, 12 Jun 2016 13:40:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=re1nQuVPXBBgnYs0gC5UXmG2WGVuQSPObX4hW6IAhx0=; b=ThhQ0zEpHArxBKFdPjWJtRKZ4oWMiw4iZDbBCnFCOqjC97g7zOxxZ+yF6rh9QBmoDP FqBV/M4Z5vBrdmyZIU07mbmEw/yG8f2d0ZPYw3giF4bdDoGvYXsGwPIpUchN0rcVLEUk ydn5mQMfeH3LZlrVFR4Decy8exBoeRTPXHdlHN8vSJgTWylDt6t5WpcFXxh0kQQpuWeJ BQULRgQ7H4MN/dQgCQ01yerG/EI/EQT6q9w1kHyy+WuBzSPvZpyj/ZB52ViaFAO79pZe 9jIyxjkF3xeG4N/DPMmh6q6WjIKL7TrKRDjvzdfZpmmG2zgyrDEUmAdQ7TXSWafW6zez rqFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=re1nQuVPXBBgnYs0gC5UXmG2WGVuQSPObX4hW6IAhx0=; b=ESIHBuwxrvTqUqhhRF9pJUht2q7j602yNLfduY7H4dM7yd0Tc0BqT6lNAxhdx8p9/f WOzptvxlfL3sNW/v0iESYvy/BTEbjhJfDmtevyZPuZQwa9mfBbnDxDW5JJRMwE2vKnOJ E4o5YM0pgzgqJ2vUYbbmT3QOSNNEsFglEzNMoXILS7ypa9zFrk4vPozgeNHOHIELGhJo uPVhtZ8L1jxtXGxX3u3/5QBH4W8zH9yUJrqVjbXjKZkl2uM2aSV54Wvevben9TQAVgvo LWBzDk2B7zHC3Pi8UyexJyjPO+tWZYSjiP5hizX8W0dHTfftyDoy+6AP2DjhB+UvJrr9 kiLg== X-Gm-Message-State: ALyK8tKJvcvpQM5pRk9LxFNIyMsq0eEDQ6cEa3pcwlsoMBXU3WXS1VYLVvwXlgzk7tFXxNfegqfxO1ezwjck2A== MIME-Version: 1.0 X-Received: by 10.202.108.19 with SMTP id h19mr6225257oic.69.1465764030868; Sun, 12 Jun 2016 13:40:30 -0700 (PDT) Received: by 10.157.16.115 with HTTP; Sun, 12 Jun 2016 13:40:30 -0700 (PDT) Received: by 10.157.16.115 with HTTP; Sun, 12 Jun 2016 13:40:30 -0700 (PDT) In-Reply-To: <1465758256.7945.125.camel@edumazet-glaptop3.roam.corp.google.com> References: <5756ADEC.9070305@darbyshire-bryant.me.uk> <8EE35B4F-5C28-41C7-8795-93A6F606B3A8@gmail.com> <5756E2CA.2020700@darbyshire-bryant.me.uk> <575BD5D0.4060702@darbyshire-bryant.me.uk> <98AFD8C3-2343-4B8B-BFB0-6F877161A039@gmail.com> <1465749085.7945.81.camel@edumazet-glaptop3.roam.corp.google.com> <1465758256.7945.125.camel@edumazet-glaptop3.roam.corp.google.com> Date: Sun, 12 Jun 2016 15:40:30 -0500 Message-ID: From: Benjamin Cronce To: Eric Dumazet Cc: cake@lists.bufferbloat.net, Jonathan Morton Content-Type: multipart/alternative; boundary=001a1142e14a6270fb05351ac8e4 Subject: Re: [Cake] Possible BUG - parent backlog incorrectly updated in case of NET_XMIT_CN X-BeenThere: cake@lists.bufferbloat.net X-Mailman-Version: 2.1.20 Precedence: list List-Id: Cake - FQ_codel the next generation List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 12 Jun 2016 20:40:31 -0000 --001a1142e14a6270fb05351ac8e4 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable 4GiB back log is several orders of magnitude more than what you want for a 100Gb link. Unless you think this algorithm will be used for 1Zb/s links. 64bit is a bit overkill On Jun 12, 2016 2:04 PM, "Eric Dumazet" wrote: > On Sun, 2016-06-12 at 20:59 +0300, Jonathan Morton wrote: > > > Not so - unless you are very sure that q->backlog is the same size as > > quantity. > > Obviously this is the case in all linux kernels so far. > > If someone plans to change this at one point, then yes, he/she has to > fix this. > > > In an increasingly 64-bit world, that is by no means assured in the > > future. I don=E2=80=99t like relying on wraparound behaviour without m= aking > > that assumption explicit, which is precisely what the signed types in > > C are for. > > So why don't you send an official patch, since apparently you plan to > extend a qdisc backlog beyond 4,294,967,296 bytes ? > > I am puzzled you have this idea, but why not, some people thought "640KB > out to be enough for anybody" ... > > > > >> IMHO the NET_XMIT_CN semantics are broken. It might be better to > > drop > > >> support for it, since it should rarely be triggered. > > > > > > What exact part is broken ? > > > Semantic looks good to me. > > > > It=E2=80=99s broken in that a negative correction may be required in th= e first > > place. > > But you do realize NET_XMIT_CN was there _before_ Cong Wang added > backlog tracking in 2ccccf5fb43f ("net_sched: update hierarchical > backlog too") > > The fact that Cong used a 'unsigned' instead of a 'signed' here is a > very minor 'bug', since generated code is exactly the _same_. > > Claiming the 'semantics' are broken, while the reality is that the > 'implementation has a potential bug _if_ we extend to 64bit' is quite > different. > > > It places additional burden on every producer of the CN signal who > > isn=E2=80=99t a tail-dropper. I can only assume that the behaviour was > > designed with only tail-drop in mind - and as we both know, that is > > not the only option any more. > > > Nothing implied that only tail drops could happen. > > Where have you seen this in the code exactly ? > > > > It appears to me that there are four things than enqueue() may want to > > tell its caller: > > > > 1: That the packet was enqueued successfully (NET_XMIT_SUCCESS). > > > > 2: That the packet was enqueued successfully, but some other relevant > > packet had to be dropped due to link congestion. > > > > 3: That the packet was *not* enqueued, and this was due to link > > congestion. This is potentially useful for tail-dropping queues, > > including AQMs. > > > > 4: That the packet was *not* enqueued, for some reason other than link > > congestion; the =E2=80=9Cerror case". > > > > Currently NET_XMIT_CN appears to cover case 3, whereas Cake normally > > wants cases 1 & 2 (and will only signal case 4 if a GSO aggregate > > couldn=E2=80=99t be segmented). > > > Not true. NET_XMIT_CN also covers 2. (not sure what 'relevant' means for > you) > > > For the time being, I have changed my development branch of Cake to > > always signal case 1 except for the error case. > > Feel free to change it, it is your code. > > But do not claim NET_XMIT_CN should be removed, just because you missed > the point (being focused on router workloads I guess) > > It can be used by higher level protocol, even if TCP with Small Queues > no longer can hit it with normal settings. > > We have per socket counter of dropped rx packets (sk->sk_drops), we > could very easily add a counter of drops or congestions at tx. > This might allow UDP applications to reduce their rate or 'fast' > retransmit without waiting an extra RTT. > > > > _______________________________________________ > Cake mailing list > Cake@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/cake > --001a1142e14a6270fb05351ac8e4 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

4GiB back log is several orders of magnitude more than what = you want for a 100Gb link. Unless you think this algorithm will be used for= 1Zb/s links. 64bit is a bit overkill

On Jun 12, 2016 2:04 PM, "Eric Dumazet"= ; <eric.dumazet@gmail.com&= gt; wrote:
On Sun, = 2016-06-12 at 20:59 +0300, Jonathan Morton wrote:

> Not so - unless you are very sure that q->backlog is the same size = as
> quantity.

Obviously this is the case in all linux kernels so far.

If someone plans to change this at one point, then yes, he/she has to
fix this.

>=C2=A0 =C2=A0In an increasingly 64-bit world, that is by no means assur= ed in the
> future.=C2=A0 I don=E2=80=99t like relying on wraparound behaviour wit= hout making
> that assumption explicit, which is precisely what the signed types in<= br> > C are for.

So why don't you send an official patch, since apparently you plan to extend a qdisc backlog beyond 4,294,967,296 bytes ?

I am puzzled you have this idea, but why not, some people thought "640= KB
out to be enough for anybody" ...
>
> >> IMHO the NET_XMIT_CN semantics are broken.=C2=A0 It might be = better to
> drop
> >> support for it, since it should rarely be triggered.
> >
> > What exact part is broken ?
> > Semantic looks good to me.
>
> It=E2=80=99s broken in that a negative correction may be required in t= he first
> place.

But you do realize NET_XMIT_CN was there _before_ Cong Wang added
backlog tracking in 2ccccf5fb43f ("net_sched: update hierarchical
backlog too")

The fact that Cong used a 'unsigned' instead of a 'signed' = here is a
very minor 'bug', since generated code is exactly the _same_.

Claiming the 'semantics' are broken, while the reality is that the<= br> 'implementation has a potential bug _if_ we extend to 64bit' is qui= te
different.

>=C2=A0 =C2=A0It places additional burden on every producer of the CN si= gnal who
> isn=E2=80=99t a tail-dropper.=C2=A0 I can only assume that the behavio= ur was
> designed with only tail-drop in mind - and as we both know, that is > not the only option any more.
>
Nothing implied that only tail drops could happen.

Where have you seen this in the code exactly ?


> It appears to me that there are four things than enqueue() may want to=
> tell its caller:
>
> 1: That the packet was enqueued successfully (NET_XMIT_SUCCESS).
>
> 2: That the packet was enqueued successfully, but some other relevant<= br> > packet had to be dropped due to link congestion.
>
> 3: That the packet was *not* enqueued, and this was due to link
> congestion.=C2=A0 This is potentially useful for tail-dropping queues,=
> including AQMs.
>
> 4: That the packet was *not* enqueued, for some reason other than link=
> congestion; the =E2=80=9Cerror case".
>
> Currently NET_XMIT_CN appears to cover case 3, whereas Cake normally > wants cases 1 & 2 (and will only signal case 4 if a GSO aggregate<= br> > couldn=E2=80=99t be segmented).
>
Not true. NET_XMIT_CN also covers 2. (not sure what 'relevant' mean= s for
you)

> For the time being, I have changed my development branch of Cake to > always signal case 1 except for the error case.

Feel free to change it, it is your code.

But do not claim NET_XMIT_CN should be removed, just because you missed
the point (being focused on router workloads I guess)

It can be used by higher level protocol, even if TCP with Small Queues
no longer can hit it with normal settings.

We have per socket counter of dropped rx packets (sk->sk_drops), we
could very easily add a counter of drops or congestions at tx.
This might allow UDP applications to reduce their rate or 'fast' retransmit without waiting an extra RTT.



_______________________________________________
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake
--001a1142e14a6270fb05351ac8e4--