From: Hans-Kristian Bakke
Date: Fri, 27 Jan 2017 20:57:02 +0100
To: bloat <bloat@lists.bufferbloat.net>
Subject: [Bloat] Fwd: Recommendations for fq_codel and tso/gso in 2017

On 27 January 2017 at 15:40, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-01-26 at 23:55 -0800, Dave Täht wrote:
> >
> > On 1/26/17 11:21 PM, Hans-Kristian Bakke wrote:
> > > Hi
> > >
> > > After having had some issues with inconsistent tso/gso configuration
> > > causing performance issues for sch_fq with pacing in one of my systems,
> > > I wonder if it is still recommended to disable gso/tso for interfaces
> > > used with fq_codel qdiscs and shaping using HTB etc.
> >
> > At lower bandwidths GRO can do terrible things. Say you have a 1 Mbit
> > uplink, and IW10. (At least one device (mvneta) will synthesise 64k of
> > GRO packets.)
> >
> > A single IW10 burst from one flow injects 130 ms of latency.
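
(For reference: the offloads in question are standard per-interface
ethtool features. A minimal sketch of checking and toggling them,
assuming an interface named eth0:)

    # show the current offload state for TSO/GSO/GRO
    ethtool -k eth0 | grep -E 'tcp-segmentation|generic-segmentation|generic-receive'

    # disable TSO/GSO/GRO on eth0; use "on" to re-enable
    ethtool -K eth0 tso off gso off gro off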

> That is simply a sign of something bad happening at the source.
>
> The router will spend too much time trying to fix the TCP sender by
> smoothing things.
>
> Let's fix the root cause, instead of making everything slow or burning
> megawatts.
>
> GRO aggregates trains of packets for the same flow within a
> sub-millisecond window.
>
> Why? Because GRO cannot predict the future: it cannot know when the next
> interrupt might come from the device saying "here are some additional
> packets". Maybe the next packet is coming in 5 seconds.
>
> Take a look at napi_poll():
>
> 1) If the device driver called napi_complete(), all packets are flushed
> (given) to the upper stack. No packet will wait in GRO for additional
> segments.
>
> 2) Under flood (we exhausted the NAPI budget and did not call
> napi_complete()), we make sure no packet can sit in GRO for more than
> 1 ms.
>
> Only when the device is under flood and the CPU cannot drain the RX
> queue fast enough does GRO aggregate packets more aggressively, and the
> size of GRO packets exactly fits the CPU budget.
>
> In a nutshell, GRO is exactly the mechanism that adapts packet sizes to
> the available CPU power.
>
> If your CPU is really fast, then it will dequeue one packet at a time
> and GRO won't kick in.
>
> So the real problem here is that some device drivers implemented poor
> interrupt mitigation logic, inherited from other OSes that had no GRO
> and _had_ to implement their own crap, hurting latencies.
>
> Make sure you disable interrupt mitigation, and leave GRO enabled.
>
> e1000e is notoriously bad at interrupt mitigation.
>
> At Google, we let the NIC send its RX interrupt ASAP.

Interesting. Do I understand you correctly that you basically recommend
loading the e1000e module with InterruptThrottleRate set to 0, or is
interrupt mitigation something else?

options e1000e InterruptThrottleRate=0(,0,0,0...)

https://www.kernel.org/doc/Documentation/networking/e1000e.txt

I haven't fiddled with InterruptThrottleRate since before I even heard
of bufferbloat.
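
(A driver-independent way to do the same thing is ethtool's interrupt
coalescing interface, on NICs that support it; a sketch, assuming eth0,
with illustrative values:)

    # show current interrupt coalescing settings
    ethtool -c eth0

    # fire RX interrupts as soon as possible (per-driver support varies)
    ethtool -C eth0 rx-usecs 0 rx-frames 1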



> Every usec matters.
>
> So the model for us is very clear: use GRO and TSO as much as we can,
> but make sure the producers (TCP senders) are smart and control their
> burst sizes.
>
> Think about 50 Gbit and 100 Gbit, and really the question of having
> TSO and GRO or not is simply moot.
>
> Even at 1 Gbit, GRO helps to reduce CPU cycles and thus reduce
> latencies.
>
> Adding a sysctl to limit GRO max size would be trivial; I already
> mentioned that, but nobody cared enough to send a patch.
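
(Such a GRO sysctl did not exist at the time. The nearest per-device
knob in iproute2 caps transmit-side aggregation instead; a sketch,
assuming eth0, and note this limits GSO, not GRO:)

    # cap GSO super-packets handed to eth0 at 16 KB (default is 65536)
    ip link set dev eth0 gso_max_size 16384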

> >
> > > If there is a trade-off, at which bandwidth does it generally make
> > > more sense to enable tso/gso than to have it disabled when doing
> > > HTB-shaped fq_codel qdiscs?
> >
> > I stopped caring about tuning params at > 40 Mbit, < 10 Gbit, or
> > rather, trying to get below 200 usec of jitter|latency. (Others care.)
> >
> > And: my expectation was generally that people would ignore our
> > recommendations on disabling offloads!
> >
> > Yes, we should revise the sample sqm code and recommendations for a
> > post-gigabit era to not bother with changing network offloads. Were
> > you modifying the old debloat script?
> >
> > TBF & sch_cake do peeling of gro/tso/gso back into packets, and then
> > interleave their scheduling, so GRO is both helpful (transiting the
> > stack faster) and harmless, at all bandwidths.
> >
> > HTB doesn't peel. We just ripped out hfsc for sqm-scripts (too buggy),
> > also. Leaving: tbf + fq_codel, htb + fq_codel, and cake models there.
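
(A minimal htb + fq_codel egress shaper in the spirit of what
sqm-scripts builds; a sketch only, where eth0 and the 100 Mbit rate are
placeholders:)

    # one HTB class shaping all traffic, with fq_codel as its leaf qdisc
    tc qdisc add dev eth0 root handle 1: htb default 1
    tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit ceil 100mbit
    tc qdisc add dev eth0 parent 1:1 fq_codel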



> > ...
> >
> > Cake is coming along nicely. I'd love a test in your 2 Gbit bonding
> > scenario, particularly in a per-host fairness test, at line or shaped
> > rates. We recently got cake working well with nat.
> >
> > http://blog.cerowrt.org/flent/steam/down_working.svg (ignore the
> > latency figure, the 6 flows were to spots all over the world)
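
(The equivalent single-qdisc setup with cake, using its integral shaper
and the nat option mentioned above; a sketch, where eth0 and the rate
are placeholders:)

    # cake shapes, schedules per flow, and does per-host fairness behind NAT
    tc qdisc add dev eth0 root cake bandwidth 100mbit nat
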
> >
> > > Regards,
> > > Hans-Kristian