[Bloat] Fwd: Recommendations for fq_codel and tso/gso in 2017

From: Hans-Kristian Bakke <hkbakke@gmail.com>
To: bloat <bloat@lists.bufferbloat.net>
Subject: [Bloat] Fwd: Recommendations for fq_codel and tso/gso in 2017
Date: Fri, 27 Jan 2017 20:57:02 +0100
Message-ID: <CAD_cGvE_iWwB--2gM5m2zmtHaYzBGJW+amQYcrJxMCYTgx2_dA@mail.gmail.com>
In-Reply-To: <CAD_cGvErzbNiP+5ADhboWpGj8Q-rQrqaRYvFZ4U8CjxEregZ4A@mail.gmail.com>

On 27 January 2017 at 15:40, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2017-01-26 at 23:55 -0800, Dave Täht wrote:
> >
> > On 1/26/17 11:21 PM, Hans-Kristian Bakke wrote:
> > > Hi
> > >
> > > After having had some issues with inconsistent tso/gso configuration
> > > causing performance issues for sch_fq with pacing in one of my systems,
> > > I wonder if it is still recommended to disable gso/tso for interfaces
> > > used with fq_codel qdiscs and shaping using HTB etc.
> >
> > At lower bandwidths gro can do terrible things. Say you have a 1Mbit
> > uplink, and IW10. (At least one device (mvneta) will synthesise 64k of
> > gro packets)
> >
> > A single IW10 burst from one flow injects 130ms of latency.
>
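
As a rough check on the figures quoted above, here is a small Python sketch of the serialization delay of a burst on a slow uplink. The 1500-byte packet size and the helper name serialization_delay_ms are assumptions made for illustration and do not come from the thread. Link-layer overhead is ignored, which may account for part of the gap to the 130ms figure.

```python
def serialization_delay_ms(payload_bytes: int, link_bps: int) -> float:
    """Time to clock payload_bytes onto a link of link_bps, in milliseconds."""
    return payload_bytes * 8 * 1000 / link_bps


PACKET_BYTES = 1500       # assumed packet size, not stated in the thread
IW10_PACKETS = 10         # initial window of 10 segments
UPLINK_BPS = 1_000_000    # the 1Mbit uplink from the example

iw10_ms = serialization_delay_ms(IW10_PACKETS * PACKET_BYTES, UPLINK_BPS)
gro_64k_ms = serialization_delay_ms(64 * 1024, UPLINK_BPS)

print(f"IW10 burst:     {iw10_ms:.1f} ms")     # 120.0 ms
print(f"64k GRO packet: {gro_64k_ms:.1f} ms")  # 524.3 ms
```

Under these assumptions a 10-packet burst occupies the uplink for about 120 ms, and a single 64k aggregate for over half a second.
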
> That is simply a sign of something bad happening from the source.
>
> The router will spend too much time trying to fix the TCP sender by
> smoothing things.
>
> Let's fix the root cause, instead of making everything slow or burning
> megawatts.
>
> GRO aggregates trains of packets for the same flow, in a sub-ms window.
>
> Why? Because GRO cannot predict the future: it cannot know when the next
> interrupt might come from the device saying "here are some additional
> packets". Maybe the next packet is coming in 5 seconds.
>
> Take a look at napi_poll()
>
> 1) If the device driver called napi_complete(), all packets are flushed
> (given) to the upper stack. No packet will wait in GRO for additional
> segments.
>
> 2) Under flood (we exhausted the napi budget and did not call
> napi_complete()), we make sure no packet can sit in GRO for more than 1
> ms.
>
> Only when the device is under flood and the CPU cannot drain the RX
> queue fast enough can GRO aggregate packets more aggressively, and the
> size of GRO packets exactly fits the CPU budget.
>
> In a nutshell, GRO is exactly the mechanism that adapts the packet sizes
> to available cpu power.
>
> If your CPU is really fast, then it will dequeue one packet at a time
> and GRO won't kick in.
>
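
The two napi_poll() cases described above can be illustrated with a toy model in Python. This is only a sketch of the flush rule as stated in the message, not kernel code: the class ToyGro, its method names and the millisecond timestamps are all hypothetical, and the model only tracks how long packets are held, not how they are merged.

```python
from dataclasses import dataclass, field

MAX_HOLD_MS = 1.0  # "no packet can sit in GRO for more than 1 ms"


@dataclass
class ToyGro:
    """Hypothetical stand-in for the per-NAPI GRO hold list."""
    held: list = field(default_factory=list)  # (arrival_ms, flow) tuples

    def receive(self, arrival_ms: float, flow: str) -> None:
        self.held.append((arrival_ms, flow))

    def end_of_poll(self, now_ms: float, budget_exhausted: bool) -> list:
        """Return the packets handed to the upper stack after one poll."""
        if not budget_exhausted:
            # Case 1: the driver called napi_complete(), flush everything.
            flushed, self.held = self.held, []
            return flushed
        # Case 2: flood, only packets held for 1 ms or longer are flushed.
        flushed = [p for p in self.held if now_ms - p[0] >= MAX_HOLD_MS]
        self.held = [p for p in self.held if now_ms - p[0] < MAX_HOLD_MS]
        return flushed


gro = ToyGro()
gro.receive(0.0, "flow-a")
gro.receive(0.9, "flow-a")
print(gro.end_of_poll(now_ms=1.0, budget_exhausted=True))   # old packet only
print(gro.end_of_poll(now_ms=1.1, budget_exhausted=False))  # everything left
```

In this model a packet is never held across an idle poll, and under load it waits at most about one millisecond, which is the bound the message describes.
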
> So the real problem here is that some device drivers implemented poor
> interrupt mitigation logic, inherited from other OSes that had no GRO and
> _had_ to implement their own crap, hurting latencies.
>
> Make sure you disable interrupt mitigation, and leave GRO enabled.
>
> e1000e is notoriously bad for interrupt mitigation.
>
> At Google, we let the NIC send its RX interrupt ASAP.
>

Interesting. Do I understand you correctly that you basically recommend
loading the e1000e module with InterruptThrottleRate set to 0, or is
interrupt mitigation something else?

options e1000e InterruptThrottleRate=0(,0,0,0...)

https://www.kernel.org/doc/Documentation/networking/e1000e.txt

I haven't fiddled with InterruptThrottleRate since before I even heard of
bufferbloat.
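
To spell out the "(,0,0,0...)" shorthand: as far as I understand the e1000e documentation linked above, the parameter takes one comma-separated value per port, and 0 turns interrupt moderation off. A small Python sketch that builds the line for a given port count follows. The helper name e1000e_options_line is hypothetical, and where the line ends up (for example a file under /etc/modprobe.d/) is an assumption about the distribution.

```python
def e1000e_options_line(num_ports: int, itr: int = 0) -> str:
    """Build a modprobe options line with one ITR value per e1000e port.

    itr=0 is the "interrupt moderation off" setting asked about above.
    """
    if num_ports < 1:
        raise ValueError("need at least one port")
    values = ",".join(str(itr) for _ in range(num_ports))
    return f"options e1000e InterruptThrottleRate={values}"


# Example: a machine with four e1000e ports.
print(e1000e_options_line(4))
# options e1000e InterruptThrottleRate=0,0,0,0
```

The module has to be reloaded (or the machine rebooted) before a changed options line takes effect.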




>
> Every usec matters.
>
> So the model for us is very clear : Use GRO and TSO as much as we can,
> but make sure the producers (TCP senders) are smart and control their
> burst sizes.
>
> Think about 50Gbit and 100Gbit, and really the question of whether or
> not to have TSO and GRO is simply moot.
>
>
> Even at 1Gbit, GRO is helping to reduce cpu cycles and thus reduce
> latencies.
>
> Adding a sysctl to limit GRO max size would be trivial, I already
> mentioned that, but nobody cared enough to send a patch.
>
> >
> > >
> > > If there is a trade off, at which bandwidth does it generally make more
> > > sense to enable tso/gso than to have it disabled when doing HTB shaped
> > > fq_codel qdiscs?
> >
> > I stopped caring about tuning params at > 40Mbit. < 10 gbit, or rather,
> > trying to get below 200usec of jitter|latency. (Others care)
> >
> > And: My expectation was generally that people would ignore our
> > recommendations on disabling offloads!
> >
> > Yes, we should revise the sample sqm code and recommendations for a post
> > gigabit era to not bother with changing network offloads. Were you
> > modifying the old debloat script?
> >
> > TBF & sch_Cake do peeling of gro/tso/gso back into packets, and then
> > interleave their scheduling, so GRO is both helpful (transiting the
> > stack faster) and harmless, at all bandwidths.
> >
> > HTB doesn't peel. We just ripped out hfsc for sqm-scripts (too buggy),
> > also. Leaving: tbf + fq_codel, htb+fq_codel, and cake models there.
> >
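
To make "peeling" concrete, here is a toy Python sketch of the idea: split an aggregated packet back into wire-sized segments, then interleave segments from different flows. This is not how TBF or cake are implemented. The function names peel and interleave, the 1448-byte segment size and the simple round-robin order are all assumptions chosen for illustration.

```python
from itertools import zip_longest


def peel(aggregate_bytes: int, mss: int = 1448) -> list[int]:
    """Split one aggregated packet into segment sizes no larger than mss."""
    full, rest = divmod(aggregate_bytes, mss)
    return [mss] * full + ([rest] if rest else [])


def interleave(flows: dict[str, list[int]]) -> list[tuple[str, int]]:
    """Round-robin one segment per flow per pass (a stand-in for fair queueing)."""
    order = []
    for row in zip_longest(*flows.values()):
        for name, segment in zip(flows, row):
            if segment is not None:
                order.append((name, segment))
    return order


# One bulk flow arriving as a 45-segment aggregate, one small packet behind it.
flows = {"bulk": peel(45 * 1448), "voip": peel(200)}
schedule = interleave(flows)
print(schedule[:3])
# [('bulk', 1448), ('voip', 200), ('bulk', 1448)]
```

Without peeling, the small packet would sit behind the whole aggregate. With it, the small packet leaves after a single bulk segment.
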
>
>
>
> > ...
> >
> > Cake is coming along nicely. I'd love a test in your 2Gbit bonding
> > scenario, particularly in a per host fairness test, at line or shaped
> > rates. We recently got cake working well with nat.
> >
> > http://blog.cerowrt.org/flent/steam/down_working.svg (ignore the latency
> > figure, the 6 flows were to spots all over the world)
> >
> > > Regards,
> > > Hans-Kristian
> > >
> > >
> > > _______________________________________________
> > > Bloat mailing list
> > > Bloat@lists.bufferbloat.net
> > > https://lists.bufferbloat.net/listinfo/bloat
> > >
