From: Eric Dumazet <eric.dumazet@gmail.com>
To: Dave Taht <dave.taht@gmail.com>
Cc: "Steinar H. Gunderson" <sesse@samfundet.no>,
bloat <bloat@lists.bufferbloat.net>
Subject: Re: [Bloat] Replacing pfifo_fast? (and using sch_fq + hystart fixes)
Date: Mon, 24 Mar 2014 10:41:27 -0700
Message-ID: <1395682887.12610.62.camel@edumazet-glaptop2.roam.corp.google.com>
In-Reply-To: <CAA93jw41HM19HjYM3Ny7NLm9XtFpscc+1kFPhcG89Kx1KOrJ6A@mail.gmail.com>
On Mon, 2014-03-24 at 10:09 -0700, Dave Taht wrote:
>
> It has long been my hope that conventional distros would start
> selecting sch_fq and sch_fq_codel in safe scenarios.
>
> 1) Can an appropriate clocksource be detected from userspace?
>
> if [ have_good_clocksources ]
> then
>     if [ i am a router ]
>     then
>         sysctl -w something=fq_codel # or is it an entry in proc?
>     else
>         sysctl -w something=sch_fq
>     fi
> fi
>
Sure, you can do all this from user space.
That's policy, and it does not belong in the kernel.
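sysctl -w net.core.default_qdisc=fq
# Force an add/delete of the root qdisc so that devices which are
# already up pick up the new default qdisc.
# Assumption: an interface with a /sys/class/net/<name>/device entry is
# treated as non-virtual; adjust this filter for your system.
for DEV in /sys/class/net/*
do
    [ -e "$DEV/device" ] || continue
    ETH=${DEV##*/}
    tc qdisc add dev "$ETH" root pfifo 2>/dev/null
    tc qdisc del dev "$ETH" root 2>/dev/null
done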
> How early in boot would this have to be to take effect?
It doesn't matter if you force a load/unload of the qdisc.
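The policy sketched in the quoted pseudocode could be written in user space roughly as follows. This is only a sketch under stated assumptions: choose_qdisc is a hypothetical helper, the set of "good" clocksources (tsc, arch_sys_counter) is a guess rather than anything stated in this thread, and "router" is approximated by net.ipv4.ip_forward=1.

```shell
#!/bin/sh
# Hypothetical helper: pick a default qdisc from two inputs.
#   $1 = clocksource name, e.g. the contents of
#        /sys/devices/system/clocksource/clocksource0/current_clocksource
#   $2 = value of net.ipv4.ip_forward (1 is treated as "I am a router")
# ASSUMPTION: only tsc and arch_sys_counter count as good clocksources.
choose_qdisc() {
    case "$1" in
        tsc|arch_sys_counter) ;;
        *) return 1 ;;      # unknown clocksource: leave the default alone
    esac
    if [ "$2" = "1" ]; then
        echo fq_codel
    else
        echo fq
    fi
}

# Usage (needs root; commented out so the sketch has no side effects):
# cs=$(cat /sys/devices/system/clocksource/clocksource0/current_clocksource)
# fwd=$(sysctl -n net.ipv4.ip_forward)
# q=$(choose_qdisc "$cs" "$fwd") && sysctl -w net.core.default_qdisc="$q"
```

Taking the two inputs as arguments keeps the decision separate from the system probing, so the policy can be changed or tested without touching sysctl.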
>
> 2) In the case of a server machine providing vms, and meeting the
> above precondition(s),
> what would be a more right qdisc, sch_fq or sch_codel?
sch_fq 'works' only for locally generated traffic, because it reads the
per-socket rate from skb->sk->sk_pacing_rate. A hypervisor (or a router
2 hops away) has no way to access the original socket without hacks.
If your Linux VM needs TCP pacing, then it also needs the fq packet
scheduler inside the VM.
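Since fq has to run where the sockets live, one thing worth checking inside the guest is which qdisc actually sits at the root of its interface. A minimal sketch, assuming the usual "qdisc KIND HANDLE: root ..." line layout printed by tc qdisc show; root_qdisc_kind is a hypothetical helper and eth0 is a placeholder device name.

```shell
#!/bin/sh
# Hypothetical helper: read `tc qdisc show dev <dev>` output on stdin and
# print the kind of the first qdisc attached at the root.
# ASSUMPTION: lines look like "qdisc fq 8001: root refcnt 2 ...".
root_qdisc_kind() {
    awk '$1 == "qdisc" {
             for (i = 3; i <= NF; i++)
                 if ($i == "root") { print $2; exit }
         }'
}

# Usage inside the VM (commented out; eth0 is a placeholder):
# tc qdisc show dev eth0 | root_qdisc_kind
```

If this prints something other than fq in the guest, the guest's sk_pacing_rate is not being honored by any scheduler, whatever the host runs.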
>
> 3) Containers?
>
> 4) The machine in the vm going through the virtual ethernet interface?
>
> (I don't understand to what extent tracking the exit of packets from tcp through
> the stack and vm happens - I imagine a TSO is preserved all the way through,
> and also imagine that tcp small queues doesn't survive transit through the vm,
> but I am known to have a fevered imagination.)
Small Queues controls the host queues, not the queues on external
routers. Think of a hypervisor as a router.
>
>
> > Another issue is TCP CUBIC Hystart 'ACK TRAIN' detection that triggers
> > early, since goal of TSO autosizing + FQ/pacing is to get ACK clocking
> > every ms. By design, it tends to get ACK trains, way before the cwnd
> > might reach BDP.
>
> Fascinating! Push on one thing, break another. As best I recall hystart had a
> string of issues like this in its early deployment.
>
> /me looks forward to one day escaping 3.10-land and observing this for himself
>
> so some sort of bidirectional awareness of the underlying qdisc would be needed
> to retune hystart properly.
>
> Is ms resolution the best possible at this point?
Nope. Hystart ACK train detection is very lazy, and the current algorithm
is kind of a hack. With finer resolution you run into problems caused by
ACK jitter on the reverse path. Really, looking only at the delay between
2 ACKs is not generic enough; we need something else, or we should just
disable ACK TRAIN detection, as it is not that useful. Delay detection is
less noisy.
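For anyone who wants to try disabling ACK train detection while keeping delay detection, a sketch under assumptions: as far as I recall, tcp_cubic exposes a hystart_detect module parameter that is a bitmask, with 1 for packet-train detection and 2 for delay detection. hystart_without_ack_train is a hypothetical helper that only does the bit arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: clear the ACK-train bit (0x1) from a hystart_detect
# bitmask and leave the delay bit (0x2) untouched.
# ASSUMPTION: bit values as recalled from tcp_cubic (1 = train, 2 = delay).
hystart_without_ack_train() {
    echo $(( $1 & ~1 ))
}

# Usage (needs root and a loaded tcp_cubic module; commented out):
# p=/sys/module/tcp_cubic/parameters/hystart_detect
# cur=$(cat "$p")
# hystart_without_ack_train "$cur" > "$p"
```

With the assumed default of 3 (both detectors) this yields 2, which leaves only the delay-based detection active.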