From: "David P. Reed" <dpreed@reed.com>
Date: Sat, 11 Oct 2014 00:20:43 -0400
To: David Lang <david@lang.hm>
Cc: cerowrt-devel@lists.bufferbloat.net, Jesper Dangaard Brouer
Subject: Re: [Cerowrt-devel] bulk packet transmission

I do know that. I would say that benchmarks rarely match the real-world
problems of real systems - they come from sources like academia and
technical marketing departments. My job for the last few years has been
looking at systems with dozens of processors across 2 and 4 sockets and
multiple 10 GigE adapters.

There are few benchmarks that look like real workloads. And even smaller
systems do very poorly compared to what is possible. Linux is slowly
getting better, but not so much in the network area at scale. That would
take a plan and a rethinking, beyond incremental tweaks. My opinion ...
ymmv.

On Oct 10, 2014, David Lang <david@lang.hm> wrote:
> I've been watching Linux kernel development for a long time, and they add
> locks only when benchmarks show that a lock is causing a bottleneck. They
> don't just add them because they can.
>
> They do also spend a lot of time working to avoid locks.
>
> One thing that you are missing is that you are thinking of the TCP/IP
> system as a single thread of execution, but there's far more going on than
> that, especially when you have multiple NICs and cores and have lots of
> interrupts going on.
>
> Each TCP/IP stream is not a separate queue of packets in the kernel;
> instead, the details of what threads exist are just a table of
> information. The packets are all put in a small number of queues to be
> sent out, and the low-level driver picks the next packet to send from
> these queues without caring about what TCP/IP stream it's from.
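(To make that queue structure concrete: a minimal, made-up sketch in C of
the shape being described - not actual kernel code. Every stream's send
path lands in the same FIFO, and the driver pops whatever is first without
ever learning which stream it came from.)

    struct pkt {
        struct pkt *next;
        /* headers, payload, ... */
    };

    struct txq {                  /* init: head = NULL; tailp = &head */
        struct pkt *head;
        struct pkt **tailp;       /* points at the last 'next' slot */
    };

    /* Called from any stream's send path; stream identity is not kept. */
    static void txq_enqueue(struct txq *q, struct pkt *p)
    {
        p->next = NULL;
        *q->tailp = p;
        q->tailp = &p->next;
    }

    /* The low-level driver just takes whatever is first in line. */
    static struct pkt *txq_dequeue(struct txq *q)
    {
        struct pkt *p = q->head;

        if (p && !(q->head = p->next))
            q->tailp = &q->head;  /* queue went empty; reset the tail */
        return p;
    }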
> David Lang
>
> On Fri, 10 Oct 2014, dpreed@reed.com wrote:
>
>> The best approach to dealing with "locking overhead" is to stop thinking
>> that if locks are good, more locking (finer-grained locking) is better.
>> OS designers (and Linux designers in particular) are still putting in way
>> too much locking. I deal with this in my day job (we support systems with
>> very large numbers of CPUs, and because of the "fine grained" locking
>> obsession, the parallelized capacity is limited). If you do a thoughtful
>> design of your network code, you don't need lots of locking - because
>> TCP/IP streams don't have to interact much - they are quite independent.
>> But instead OS designers spend all their time thinking about doing "one
>> thing at a time".
>>
>> There are some really good ideas out there (e.g. RCU), but you have to
>> think about the big picture of networking to understand how to use them.
>> I'm not impressed with the folks who do the Linux networking stacks.
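(A sketch of what RCU buys in this picture, with invented names and in
kernel-style C - not from any real stack. Lookups on the packet hot path
take no lock at all, so independent streams never contend; only the rare
writers serialize among themselves:)

    #include <linux/rcupdate.h>
    #include <linux/spinlock.h>
    #include <linux/types.h>

    struct conn {
        u32 id;
        struct conn __rcu *next;
    };

    static struct conn __rcu *conn_table[256]; /* hypothetical hash table */
    static DEFINE_SPINLOCK(conn_write_lock);   /* taken by writers only */

    /* Hot path: no lock, no write to any shared cacheline, so it scales
     * with core count. The entry may only be used inside the read-side
     * critical section. */
    static bool conn_exists(u32 id)
    {
        struct conn *c;
        bool found = false;

        rcu_read_lock();
        for (c = rcu_dereference(conn_table[id & 255]); c;
             c = rcu_dereference(c->next)) {
            if (c->id == id) {
                found = true;
                break;
            }
        }
        rcu_read_unlock();
        return found;
    }

    /* Rare path: writers serialize with each other, never with readers. */
    static void conn_insert(struct conn *c)
    {
        spin_lock(&conn_write_lock);
        /* c is not yet visible to readers; a plain write is fine here */
        c->next = conn_table[c->id & 255];
        rcu_assign_pointer(conn_table[c->id & 255], c);
        spin_unlock(&conn_write_lock);
    }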
>> On Thursday, October 9, 2014 3:48pm, "Dave Taht" <dave.taht@gmail.com> said:
>>
>>> I have some hope that the skb->xmit_more API could be used to make
>>> aggregating packets in wifi on an AP saner. (My vision for it was that
>>> the overlying qdisc would set xmit_more while it still had packets
>>> queued up for a given station, and then stop and switch to the next.
>>> But the rest of the infrastructure ended up pretty closely tied to
>>> BQL....)
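(The driver half of that API is tiny - roughly the following, where the
mydrv_* names are made up and stand in for a real driver's ring handling.
Posting a descriptor is a cheap memory write; skb->xmit_more just says
"more is coming", so the expensive doorbell MMIO waits for the end of the
burst:)

    static netdev_tx_t mydrv_start_xmit(struct sk_buff *skb,
                                        struct net_device *dev)
    {
        struct mydrv_ring *ring = netdev_priv(dev); /* hypothetical ring */

        mydrv_post_descriptor(ring, skb);  /* cheap: write into DMA ring */

        /* Ring the (expensive, serializing) doorbell only when the stack
         * says the burst is over, or the ring is about to fill up. */
        if (!skb->xmit_more || mydrv_ring_nearly_full(ring))
            mydrv_ring_doorbell(ring);

        return NETDEV_TX_OK;
    }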
>>>
>>> Jesper just wrote a nice piece about it also:
>>> http://netoptimizer.blogspot.com/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html
>>>
>>> It was nice to fool around at 10GigE for a while! And netperf-wrapper
>>> scales to this speed also! :wow:
>>>
>>> I do worry that once sch_fq and fq_codel support is added there will be
>>> side effects. I would really like - now that there are all these people
>>> profiling things at this level - to see profiles including those qdiscs.
>>>
>>> /me goes grumbling back to thinking about wifi.
>>>
>>> On Thu, Oct 9, 2014 at 12:40 PM, David Lang <david@lang.hm> wrote:
>>> > lwn.net has an article about a set of new patches that avoid some
>>> > locking overhead by transmitting multiple packets at once.
>>> >
>>> > It doesn't work for things with multiple queues (like fq_codel) in its
>>> > current iteration, but it sounds like something that should be looked
>>> > at and watched for latency-related issues.
>>> >
>>> > http://lwn.net/Articles/615238/
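(On the stack side, those patches amount to roughly this shape - again
with invented helper names standing in for the real qdisc hooks: pull a
small burst off the queue, and mark every packet but the last with
xmit_more so the driver above holds its doorbell until the burst is done:)

    /* my_qdisc_dequeue()/my_qdisc_peek() are hypothetical stand-ins for
     * the real qdisc entry points; return-value handling is omitted. */
    static void qdisc_xmit_burst(struct Qdisc *q, struct net_device *dev,
                                 int budget)
    {
        struct sk_buff *skb;

        while (budget-- > 0 && (skb = my_qdisc_dequeue(q)) != NULL) {
            /* more left in this burst? let the driver defer its doorbell */
            skb->xmit_more = (budget > 0 && my_qdisc_peek(q) != NULL);
            dev->netdev_ops->ndo_start_xmit(skb, dev);
        }
    }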
>>> >
>>> > David Lang
>>> > _______________________________________________
>>> > Cerowrt-devel mailing list
>>> > Cerowrt-devel@lists.bufferbloat.net
>>> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>
>>> --
>>> Dave Täht
>>>
>>> https://www.bufferbloat.net/projects/make-wifi-fast
>>> _______________________________________________
>>> Cerowrt-devel mailing list
>>> Cerowrt-devel@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel

--
Sent from my Android device with K-@ Mail. Please excuse my brevity.