Date: Wed, 28 May 2014 11:33:42 -0400 (EDT)
From: dpreed@reed.com
To: "Dave Taht"
References:
<1401048053.664331760@apps.rackspace.com>
Message-ID: <1401291222.288942@apps.rackspace.com>
Cc: "cerowrt-devel@lists.bufferbloat.net"
, bloat
Subject: Re: [Cerowrt-devel] Ubiquiti QOS
Same concern I mentioned with Jim's message. I was not clear what I meant by
"pacing" in the context of optimization of latency while preserving
throughput. It is NOT just a matter of spreading packets out in time that I
was talking about. It is a matter of doing so without reducing throughput.
That means transmitting as *early* as possible while avoiding congestion.
Building a "backlog" and then artificially spreading it out by "add-on
pacing" will definitely reduce throughput below the flow's fair share of the
bottleneck resource.

It is pretty clear to me that you can't get to a minimal latency, optimal
throughput control algorithm by a series of "add ons" in LART. It requires
rethinking of the control discipline, and changes to get more information
about congestion earlier, without ever allowing a buffer queue to build up in
intermediate nodes - since that destroys latency by definition.

As long as you require buffers to grow at bottleneck links in order to get
measurements of congestion, you probably are stuck with long-time-constant
control loops, and as long as you encourage buffering at OS send stacks you
are even worse off at the application layer.

The problem is in the assumption that buffer queueing is the only possible
answer. The "pacing" being included in Linux is just another way to build
bigger buffers (on the sending host), by taking control away from the TCP
control loop.
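For reference, the rate-based pacing under discussion works roughly like the
sketch below (Python; the function names and the 2x gain are assumptions for
illustration, not the kernel's actual code): derive a rate from the
congestion window and the SRTT estimate, then space departures at that rate.

def pacing_rate_bps(cwnd_pkts, mss_bytes, srtt_s, gain=2.0):
    """Pace so roughly one cwnd of data leaves per SRTT, scaled by a gain."""
    return gain * cwnd_pkts * mss_bytes * 8 / srtt_s

def inter_packet_gap_s(mss_bytes, rate_bps):
    """Time between packet departures at the paced rate."""
    return mss_bytes * 8 / rate_bps

if __name__ == "__main__":
    # e.g. an IW10 flow with a 40 ms SRTT estimate
    rate = pacing_rate_bps(cwnd_pkts=10, mss_bytes=1448, srtt_s=0.040)
    gap = inter_packet_gap_s(1448, rate)
    print("pacing rate ~%.1f Mbit/s, gap ~%.2f ms" % (rate / 1e6, gap * 1e3))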
t" said:=0A=0A=0A=0A> This has been a good thread, an=
d I'm sorry it was mostly on=0A> cerowrt-devel rather than the main list...=
=0A> =0A> It is not clear from observing google's deployment that pacing of=
the=0A> IW is not in use. I see=0A> clear 1ms boundaries for individual fl=
ows on much lower than iw10=0A> boundaries. (e.g. I see 1-4=0A> packets at =
a time arrive at 1ms intervals - but this could be an=0A> artifact of the c=
apture, intermediate=0A> devices, etc)=0A> =0A> sch_fq comes with explicit =
support for spreading out the initial=0A> window, (by default it allows a f=
ull iw10 burst however) and tcp small=0A> queues and pacing-aware tcps and =
the tso fixes and stuff we don't know=0A> about all are collaborating to re=
duce the web burst size...=0A> =0A> sch_fq_codel used as the host/router qd=
isc basically does spread out=0A> any flow if there is a bottleneck on the =
link. The pacing stuff=0A> spreads flow delivery out across an estimate of =
srtt by clock tick...=0A> =0A> It makes tremendous sense to pace out a flow=
if you are hitting the=0A> wire at 10gbit and know you are stepping down t=
o 100mbit or less on=0A> the end device - that 100x difference in rate is m=
eaningful... and at=0A> the same time to get full throughput out of 10gbit =
some level of tso=0A> offloads is needed... and the initial guess=0A> at th=
e right pace is hard to get right before a couple RTTs go by.=0A> =0A> I lo=
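For concreteness, the step-down arithmetic looks roughly like this (Python;
assuming full-size 1500-byte packets and an iw10 burst, at the two rates
mentioned above):

PKT_BITS = 1500 * 8          # bits per full-size packet
burst_bits = 10 * PKT_BITS   # an unpaced iw10 burst

for name, rate_bps in [("10 Gbit/s", 10e9), ("100 Mbit/s", 100e6)]:
    ms = burst_bits / rate_bps * 1e3
    print("%s: iw10 burst serializes in %.3f ms" % (name, ms))

# The burst arrives in ~0.012 ms at 10 Gbit/s but needs ~1.2 ms to drain at
# 100 Mbit/s, so without pacing it sits in the step-down device's queue.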
> I look forward to learning what's up.
>
> On Tue, May 27, 2014 at 8:23 AM, Jim Gettys <jg@freedesktop.org> wrote:
> >
> > On Sun, May 25, 2014 at 4:00 PM, <dpreed@reed.com> wrote:
> >>
> >> Not that it is directly relevant, but there is no essential reason to
> >> require 50 ms. of buffering. That might be true of some particular
> >> QOS-related router algorithm. 50 ms. is about all one can tolerate in any
> >> router between source and destination for today's networks - an
> >> upper-bound rather than a minimum.
> >>
> >> The optimum buffer state for throughput is 1-2 packets worth - in other
> >> words, if we have an MTU of 1500, 1500 - 3000 bytes. Only the bottleneck
> >> buffer (the input queue to the lowest speed link along the path) should
> >> have this much actually buffered. Buffering more than this increases
> >> end-to-end latency beyond its optimal state. Increased end-to-end latency
> >> reduces the effectiveness of control loops, creating more congestion.
>
> This misses an important facet of modern macs (wifi, wireless, cable, and
> gpon), which can aggregate 32k or more in packets.
>
> So the ideal size in those cases is much larger than a MTU, and has
> additional factors governing the ideal - such as the probability of a
> packet loss inducing a retransmit....
>
> Ethernet, sure.
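Rough numbers behind that point (illustrative only; 32 KB is the aggregate
size quoted above, and real aggregation limits vary by MAC):

MTU = 1500                       # bytes per full-size packet
aggregate_bytes = 32 * 1024      # one aggregate as quoted above
print("%d B aggregate ~= %d MTU-sized packets"
      % (aggregate_bytes, aggregate_bytes // MTU))
# ~21 packets, so "1-2 aggregates" rather than "1-2 MTUs" of buffering is
# needed to fill a single transmit opportunity on an aggregating link.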
> >> The rationale for having 50 ms. of buffering is probably to avoid
> >> disruption of bursty mixed flows where the bursts might persist for
> >> 50 ms. and then die. One reason for this is that source nodes run
> >> operating systems that tend to release packets in bursts. That's a whole
> >> other discussion - in an ideal world, source nodes would avoid bursty
> >> packet releases by letting the control by the receiver window be "tight"
> >> timing-wise. That is, to transmit a packet immediately at the instant an
> >> ACK arrives increasing the window. This would pace the flow - current
> >> OS's tend (due to scheduling mismatches) to send bursts of packets,
> >> "catching up" on sending that could have been spaced out and done
> >> earlier if the feedback from the receiver's window advancing were
> >> heeded.
>
> This loop has got ever tighter since linux 3.3, to where it's really as
> tight as a modern cpu scheduler can get it. (or so I keep thinking -
> but successive improvements in linux tcp keep proving me wrong. :)
>
> I am really in awe of linux tcp these days. Recently I was benchmarking
> windows and macos. Windows only got 60% of the throughput linux tcp
> did at gigE speeds, and osx had a lot of issues at 10mbit and below
> (stretch acks and holding the window too high for the path).
>
> I keep hoping better ethernet hardware will arrive that can mix flows
> even more.
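A toy model of the contrast being described - the "tight" ACK clock versus a
sender that wakes on a coarse timer and catches up in a burst (Python; the
timings and names are made up for illustration, this is not any real stack):

def ack_clocked(ack_times, mss=1448):
    # one segment leaves at the instant each window-opening ACK arrives
    return [(t, mss) for t in ack_times]

def timer_batched(ack_times, tick=0.010, mss=1448):
    # window openings accumulate until the next scheduler tick, then the
    # backlog leaves as one burst
    sends = {}
    for t in ack_times:
        slot = round((int(t / tick) + 1) * tick, 6)
        sends[slot] = sends.get(slot, 0) + mss
    return sorted(sends.items())

acks = [i * 0.002 for i in range(10)]         # an ACK every 2 ms
print("ack-clocked  :", ack_clocked(acks))    # ten single-MSS departures
print("timer-batched:", timer_batched(acks))  # two 5-MSS bursts, 10 ms apart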
> >> That is, endpoint network stacks (TCP implementations) can worsen
> >> congestion by "dallying". The ideal end-to-end flows occupying a
> >> congested router would have their packets paced so that the packets end
> >> up being sent in the least bursty manner that an application can
> >> support. The effect of this pacing is to move the "backlog" for each
> >> flow quickly into the source node for that flow, which then provides
> >> back pressure on the application driving the flow, which ultimately is
> >> necessary to stanch congestion. The ideal congestion control mechanism
> >> slows the sender part of the application to a pace that can go through
> >> the network without contributing to buffering.
> >
> > Pacing is in Linux 3.12(?). How long it will take to see widespread
> > deployment is another question, and as for other operating systems, who
> > knows.
> >
> > See: https://lwn.net/Articles/564978/
>
> Steinar drove some of this with persistence and results...
>
> http://www.linux-support.com/cms/steinar-h-gunderson-paced-tcp-and-the-fq-scheduler/
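One per-socket knob that goes with that fq pacing work is SO_MAX_PACING_RATE,
which caps a socket's send rate in bytes per second. A hedged sketch (Linux
only; the numeric fallback 47 is the asm-generic value, and the rate chosen
here is just an example):

import socket

SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)

def cap_pacing_rate(sock, bytes_per_sec):
    # ask the kernel (enforced by the fq qdisc) not to exceed this send rate
    sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, bytes_per_sec)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cap_pacing_rate(s, 12500000)   # ~100 Mbit/s
s.close()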
=0A> >>=0A> >>=0A> >> Current network stacks (including Linux's) don't achi=
eve that goal -=0A> their=0A> >> pushback on application sources is minimal=
- instead they accumulate=0A> >> buffering internal to the network impleme=
ntation.=0A> >=0A> >=0A> > This is much, much less true than it once was. =
There have been substantial=0A> > changes in the Linux TCP stack in the las=
t year or two, to avoid generating=0A> > packets before necessary. Again, =
how long it will take for people to deploy=0A> > this on Linux (and impleme=
nt on other OS's) is a question.=0A> =0A> The data centers I'm in (linode, =
isc, google cloud) seem to be=0A> tracking modern kernels pretty good...=0A=
> =0A> >>=0A> >> This contributes to end-to-end latency as well. But if yo=
u think about=0A> >> it, this is almost as bad as switch-level bufferbloat =
in terms of=0A> degrading=0A> >> user experience. The reason I say "almost=
" is that there are tools,=0A> rarely=0A> >> used in practice, that allow a=
n application to specify that buffering=0A> should=0A> >> not build up in t=
he network stack (in the kernel or wherever it is). =0A> But=0A> >> the def=
ault is not to use those APIs, and to buffer way too much.=0A> >>=0A> >>=0A=
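Two examples of the kind of knobs being alluded to on Linux (the text above
doesn't name them, so treat the choice as an assumption) are SO_SNDBUF, which
bounds how much a socket may queue in the kernel, and TCP_NOTSENT_LOWAT,
which keeps not-yet-sent data in the application rather than the stack. A
hedged sketch - the numeric fallback 25 is the value in linux/tcp.h, and the
thresholds are arbitrary:

import socket

TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)  # Linux-specific

def limit_send_side_buffering(sock, sndbuf_bytes=64 * 1024,
                              notsent_lowat=16 * 1024):
    # cap how far the kernel send buffer may grow for this socket
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)
    # only report the socket writable once unsent data falls below this
    # threshold, so the backlog stays in the application, not the stack
    sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, notsent_lowat)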
> >>
> >> Remember, the network send stack can act similarly to a congested switch
> >> (it is a switch among all the user applications running on that node). IF
> >> there is a heavy file transfer, the file transfer's buffering acts to
> >> increase latency for all other networked communications on that machine.
> >>
> >> Traditionally this problem has been thought of only as a within-node
> >> fairness issue, but in fact it has a big effect on the switches in
> >> between source and destination due to the lack of dispersed pacing of
> >> the packets at the source - in other words, the current design does
> >> nothing to stem the "burst groups" from a single source mentioned above.
> >>
> >> So we do need the source nodes to implement less "bursty" sending
> >> stacks. This is especially true for multiplexed source nodes, such as
> >> web servers implementing thousands of flows.
> >>
> >> A combination of codel-style switch-level buffer management and the
> >> stack at the sender being implemented to spread packets in a particular
> >> TCP flow out over time would improve things a lot. To achieve best
> >> throughput, the optimal way to spread packets out on an end-to-end basis
> >> is to update the receive window (sending ACK) at the receive end as
> >> quickly as possible, and to respond to the updated receive window as
> >> quickly as possible when it increases.
> >>
> >> Just like the "bufferbloat" issue, the problem is caused by applications
> >> like streaming video, file transfers and big web pages that the
> >> application programmer sees as not having a latency requirement within
> >> the flow, so the application programmer does not have an incentive to
> >> control pacing. Thus the operating system has got to push back on the
> >> applications' flow somehow, so that the flow ends up paced once it
> >> enters the Internet itself. So there's no real problem caused by large
> >> buffering in the network stack at the endpoint, as long as the stack's
> >> delivery to the Internet is paced by some mechanism, e.g. tight
> >> management of receive window control on an end-to-end basis.
> >>
> >> I don't think this can be fixed by cerowrt, so this is out of place
> >> here. It's partially ameliorated by cerowrt, if it aggressively drops
> >> packets from flows that burst without pacing. fq_codel does this, if the
> >> buffer size it aims for is small - but the problem is that the OS stacks
> >> don't respond by pacing... they tend to respond by bursting, not because
> >> TCP doesn't provide the mechanisms for pacing, but because the OS stack
> >> doesn't transmit as soon as it is allowed to - thus building up a burst
> >> unnecessarily.
> >>
> >> Bursts on a flow are thus bad in general. They make congestion happen
> >> when it need not.
> >
> > By far the biggest headache is what the Web does to the network. It has
> > turned the web into a burst generator.
> >
> > A typical web page may have 10 (or even more) images. See the
> > "connections per page" plot in the link below.
> >
> > A browser downloads the base page, and then, over N connections,
> > essentially simultaneously downloads those embedded objects. Many/most
> > of them are small in size (4-10 packets). You never even get near slow
> > start.
> >
> > So you get an IW amount of data/TCP connection, with no pacing, and no
> > congestion avoidance. It is easy to observe 50-100 packets (or more)
> > back to back at the bottleneck.
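Illustrative arithmetic for that burst (the per-page connection count and the
bottleneck rate here are assumptions; real pages and links vary widely):

IW = 10                 # initial window per connection, packets (IW10)
MTU = 1500              # bytes per packet, roughly
connections = 8         # assumed number of simultaneous connections

burst_pkts = connections * IW
burst_bits = burst_pkts * MTU * 8
drain_ms = burst_bits / 20e6 * 1e3    # drain time at an assumed 20 Mbit/s link

print("%d packets (~%d kB) can arrive back to back;"
      % (burst_pkts, burst_pkts * MTU // 1000))
print("at 20 Mbit/s that is ~%.0f ms of queue unless something paces or drops"
      % drain_ms)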
> >
> > This is (in practice) the amount you have to buffer today: that burst of
> > packets from a web page. Without flow queuing, you are screwed. With it,
> > it's annoying, but can be tolerated.
> >
> > I go over this in detail in:
> >
> > http://gettys.wordpress.com/2013/07/10/low-latency-requires-smart-queuing-traditional-aqm-is-not-enough/
> >
> > So far, I don't believe anyone has tried pacing the IW burst of packets.
> > I'd certainly like to see that, but pacing needs to be across TCP
> > connections (host pairs) to be possibly effective to outwit the gaming
> > the web has done to the network.
> >                                                    - Jim
> >
> >> On Sunday, May 25, 2014 11:42am, "Mikael Abrahamsson"
> >> <swmike@swm.pp.se> said:
> >>
> >> > On Sun, 25 May 2014, Dane Medic wrote:
> >> >
> >> > > Is it true that devices with less than 64 MB can't handle QOS? ->
> >> > > https://lists.chambana.net/pipermail/commotion-dev/2014-May/001816.html
> >> >
> >> > At gig speeds you need around 50ms worth of buffering. 1 gigabit/s =
> >> > 125 megabyte/s meaning for 50ms you need 6.25 megabyte of buffer.
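For the record, that arithmetic checks out (Python):

rate_bps = 1e9                       # 1 gigabit/s
delay_s = 0.050                      # 50 ms of buffering
print("%.2f megabytes" % (rate_bps * delay_s / 8 / 1e6))   # 6.25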
> >> >
> >> > I also don't see why performance and memory size would be relevant,
> >> > I'd say forwarding performance has more to do with CPU speed than
> >> > anything else.
> >> >
> >> > --
> >> > Mikael Abrahamsson    email: swmike@swm.pp.se
> >>
> >
>
> --
> Dave Täht
>
> NSFW:
> https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article