* [Cerowrt-devel] bulk packet transmission
@ 2014-10-09 19:40 David Lang
2014-10-09 19:48 ` Dave Taht
0 siblings, 1 reply; 8+ messages in thread
From: David Lang @ 2014-10-09 19:40 UTC (permalink / raw)
To: cerowrt-devel
lwn.net has an article about a set of new patches that avoid some locking
overhead by transmitting multiple packets at once.
It doesn't work for things with multiple queues (like fq_codel) in its current
iteration, but it sounds like something that should be looked at and watched for
latency-related issues.
http://lwn.net/Articles/615238/
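From skimming the article: the patches add an xmit_more flag to the skb, and
a driver that honors it can queue up several descriptors before doing the
expensive hardware notification. Very roughly, in a hypothetical driver (the
foo_* names here are made up, only skb->xmit_more is the real API):

    static netdev_tx_t foo_start_xmit(struct sk_buff *skb,
                                      struct net_device *dev)
    {
            struct foo_priv *priv = netdev_priv(dev);

            foo_queue_descriptor(priv, skb);  /* fill a slot in the TX ring */

            /* skip the expensive part (an MMIO doorbell write) while the
               stack says more packets are right behind this one */
            if (!skb->xmit_more)
                    foo_ring_doorbell(priv);

            return NETDEV_TX_OK;
    }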
David Lang
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Cerowrt-devel] bulk packet transmission
2014-10-09 19:40 [Cerowrt-devel] bulk packet transmission David Lang
@ 2014-10-09 19:48 ` Dave Taht
2014-10-11 0:52 ` dpreed
0 siblings, 1 reply; 8+ messages in thread
From: Dave Taht @ 2014-10-09 19:48 UTC (permalink / raw)
To: David Lang, Jesper Dangaard Brouer; +Cc: cerowrt-devel
I have some hope that the skb->xmit_more API could be used to make
aggregating packets in wifi on an AP saner. (my vision for it was that
the overlying qdisc would set xmit_more while it still had packets
queued up for a given station and then stop and switch to the next.
But the rest of the infrastructure ended up pretty closely tied to
BQL....)
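In other words, something like this (pure hand-waving pseudocode, nothing in
the tree works this way today - sta and the station_queue_*/hw_xmit helpers
are imaginary):

    /* drain one station as a batch; keep xmit_more set until its
       queue runs dry so the driver can build a single aggregate */
    while ((skb = station_queue_dequeue(sta)) != NULL) {
            skb->xmit_more = !station_queue_empty(sta);
            hw_xmit(skb);
    }
    /* ...then stop, and switch to the next station */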
Jesper just wrote a nice piece about it also.
http://netoptimizer.blogspot.com/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html
It was nice to fool around at 10GigE for a while! And netperf-wrapper
scales to this speed also! :wow:
I do worry that once sch_fq and fq_codel support is added that there
will be side effects. I would really like - now that there are all
these people profiling things at this level - to see profiles including
those qdiscs.
/me goes grumbling back to thinking about wifi.
On Thu, Oct 9, 2014 at 12:40 PM, David Lang <david@lang.hm> wrote:
> lwn.net has an article about a set of new patches that avoid some locking
> overhead by transmitting multiple packets at once.
>
> It doesn't work for things with multiple queues (like fq_codel) in its
> current iteration, but it sounds like something that should be looked at and
> watched for latency-related issues.
>
> http://lwn.net/Articles/615238/
>
> David Lang
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
--
Dave Täht
https://www.bufferbloat.net/projects/make-wifi-fast
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Cerowrt-devel] bulk packet transmission
2014-10-09 19:48 ` Dave Taht
@ 2014-10-11 0:52 ` dpreed
2014-10-11 3:15 ` David Lang
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: dpreed @ 2014-10-11 0:52 UTC (permalink / raw)
To: Dave Taht; +Cc: cerowrt-devel, Jesper Dangaard Brouer
The best approach to dealing with "locking overhead" is to stop thinking that if locks are good, more locking (finer grained locking) is better. OS designers (and Linux designers in particular) are still putting in way too much locking. I deal with this in my day job (we support systems with very large numbers of cpus and because of the "fine grained" locking obsession, the parallelized capacity is limited). If you do a thoughtful design of your network code, you don't need lots of locking - because TCP/IP streams don't have to interact much - they are quite independent. But instead OS designers spend all their time thinking about doing "one thing at a time".
There are some really good ideas out there (e.g. RCU) but you have to think about the big picture of networking to understand how to use them. I'm not impressed with the folks who do the Linux networking stacks.
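To make the RCU point concrete: a whole connection-lookup read path can run
without acquiring anything, so readers on different cpus never touch a shared
lock word. A minimal sketch of the standard kernel idiom (conn, conn_table
and TABLE_SIZE are invented names for illustration):

    struct conn *conn_lookup(u32 key)
    {
            struct conn *c;

            rcu_read_lock();  /* readers never spin, never block writers */
            hlist_for_each_entry_rcu(c, &conn_table[key % TABLE_SIZE], node) {
                    if (c->key == key && atomic_inc_not_zero(&c->refcnt)) {
                            rcu_read_unlock();
                            return c;  /* caller now holds a reference */
                    }
            }
            rcu_read_unlock();
            return NULL;      /* only writers serialize, among themselves */
    }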
On Thursday, October 9, 2014 3:48pm, "Dave Taht" <dave.taht@gmail.com> said:
> I have some hope that the skb->xmit_more API could be used to make
> aggregating packets in wifi on an AP saner. (my vision for it was that
> the overlying qdisc would set xmit_more while it still had packets
> queued up for a given station and then stop and switch to the next.
> But the rest of the infrastructure ended up pretty closely tied to
> BQL....)
>
> Jesper just wrote a nice piece about it also.
> http://netoptimizer.blogspot.com/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html
>
> It was nice to fool around at 10GigE for a while! And netperf-wrapper
> scales to this speed also! :wow:
>
> I do worry that once sch_fq and fq_codel support is added that there
> will be side effects. I would really like - now that there are all
> these people profiling things at this level - to see profiles including
> those qdiscs.
>
> /me goes grumbling back to thinking about wifi.
>
> On Thu, Oct 9, 2014 at 12:40 PM, David Lang <david@lang.hm> wrote:
> > lwn.net has an article about a set of new patches that avoid some locking
> > overhead by transmitting multiple packets at once.
> >
> > It doesn't work for things with multiple queues (like fq_codel) in its
> > current iteration, but it sounds like something that should be looked at and
> > watched for latency-related issues.
> >
> > http://lwn.net/Articles/615238/
> >
> > David Lang
> > _______________________________________________
> > Cerowrt-devel mailing list
> > Cerowrt-devel@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
>
>
> --
> Dave Täht
>
> https://www.bufferbloat.net/projects/make-wifi-fast
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Cerowrt-devel] bulk packet transmission
2014-10-11 0:52 ` dpreed
@ 2014-10-11 3:15 ` David Lang
2014-10-11 4:20 ` David P. Reed
2014-10-13 22:11 ` Dave Taht
2014-10-15 19:49 ` Wes Felter
2 siblings, 1 reply; 8+ messages in thread
From: David Lang @ 2014-10-11 3:15 UTC (permalink / raw)
To: dpreed; +Cc: cerowrt-devel, Jesper Dangaard Brouer
I've been watching Linux kernel development for a long time and they add locks
only when benchmarks show that a lock is causing a bottleneck. They don't just
add them because they can.
They do also spend a lot of time working to avoid locks.
One thing that you are missing is that you are thinking of the TCP/IP system as
a single thread of execution, but there's far more going on than that,
especially when you have multiple NICs and cores and have lots of interrupts
going on.
Each TCP/IP stream is not a separate queue of packets in the kernel; instead,
the details of which streams exist are just a table of information. The packets
are all put in a small number of queues to be sent out, and the low-level driver
picks the next packet to send from these queues without caring about which TCP/IP
stream it's from.
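In rough outline (not actual kernel code - stream_table_lookup, txq_dequeue
and driver_xmit are invented names for illustration):

    /* per-stream state is just an entry in a lookup table */
    struct sock *sk = stream_table_lookup(saddr, daddr, sport, dport);

    /* on transmit, every stream's packets funnel into a few shared
       queues, and the driver drains them stream-blind */
    for (;;) {
            struct sk_buff *skb = txq_dequeue(dev);
            if (!skb)
                    break;
            driver_xmit(dev, skb);
    }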
David Lang
On Fri, 10 Oct 2014, dpreed@reed.com wrote:
> The best approach to dealing with "locking overhead" is to stop thinking that
> if locks are good, more locking (finer grained locking) is better. OS
> designers (and Linux designers in particular) are still putting in way too
> much locking. I deal with this in my day job (we support systems with very
> large numbers of cpus and because of the "fine grained" locking obsession, the
> parallelized capacity is limited). If you do a thoughtful design of your
> network code, you don't need lots of locking - because TCP/IP streams don't
> have to interact much - they are quite independent. But instead OS designers
> spend all their time thinking about doing "one thing at a time".
>
> There are some really good ideas out there (e.g. RCU) but you have to think
> about the big picture of networking to understand how to use them. I'm not
> impressed with the folks who do the Linux networking stacks.
>
>
> On Thursday, October 9, 2014 3:48pm, "Dave Taht" <dave.taht@gmail.com> said:
>
>
>
>> I have some hope that the skb->xmit_more API could be used to make
>> aggregating packets in wifi on an AP saner. (my vision for it was that
>> the overlying qdisc would set xmit_more while it still had packets
>> queued up for a given station and then stop and switch to the next.
>> But the rest of the infrastructure ended up pretty closely tied to
>> BQL....)
>>
>> Jesper just wrote a nice piece about it also.
>> http://netoptimizer.blogspot.com/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html
>>
>> It was nice to fool around at 10GigE for a while! And netperf-wrapper
>> scales to this speed also! :wow:
>>
>> I do worry that once sch_fq and fq_codel support is added that there
>> will be side effects. I would really like - now that there are all
>> these people profiling things at this level - to see profiles including
>> those qdiscs.
>>
>> /me goes grumbling back to thinking about wifi.
>>
>> On Thu, Oct 9, 2014 at 12:40 PM, David Lang <david@lang.hm> wrote:
>> > lwn.net has an article about a set of new patches that avoid some locking
>> > overhead by transmitting multiple packets at once.
>> >
>> > It doesn't work for things with multiple queues (like fq_codel) in its
>> > current iteration, but it sounds like something that should be looked at and
>> > watched for latency-related issues.
>> >
>> > http://lwn.net/Articles/615238/
>> >
>> > David Lang
>> > _______________________________________________
>> > Cerowrt-devel mailing list
>> > Cerowrt-devel@lists.bufferbloat.net
>> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>
>>
>>
>> --
>> Dave Täht
>>
>> https://www.bufferbloat.net/projects/make-wifi-fast
>> _______________________________________________
>> Cerowrt-devel mailing list
>> Cerowrt-devel@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Cerowrt-devel] bulk packet transmission
2014-10-11 3:15 ` David Lang
@ 2014-10-11 4:20 ` David P. Reed
0 siblings, 0 replies; 8+ messages in thread
From: David P. Reed @ 2014-10-11 4:20 UTC (permalink / raw)
To: David Lang; +Cc: cerowrt-devel, Jesper Dangaard Brouer
I do know that. I would say that benchmarks rarely match real-world problems of real systems - they come from sources like academia and technical marketing depts. My job for the last few years has been looking at systems with dozens of processors across 2 and 4 sockets and multiple 10 GigE adapters.
There are few benchmarks that look like real workloads. And even smaller systems do very poorly compared to what is possible. Linux is slowly getting better, but not so much in the network area at scale. That would take a plan and a rethinking, beyond incremental tweaks. My opinion ... ymmv.
On Oct 10, 2014, David Lang <david@lang.hm> wrote:
>I've been watching Linux kernel development for a long time and they
>add locks only when benchmarks show that a lock is causing a
>bottleneck. They don't just add them because they can.
>
>They do also spend a lot of time working to avoid locks.
>
>One thing that you are missing is that you are thinking of the TCP/IP
>system as a single thread of execution, but there's far more going on
>than that, especially when you have multiple NICs and cores and have
>lots of interrupts going on.
>
>Each TCP/IP stream is not a separate queue of packets in the kernel;
>instead, the details of which streams exist are just a table of
>information. The packets are all put in a small number of queues to be
>sent out, and the low-level driver picks the next packet to send from
>these queues without caring about which TCP/IP stream it's from.
>
>David Lang
>
>On Fri, 10 Oct 2014, dpreed@reed.com wrote:
>
>> The best approach to dealing with "locking overhead" is to stop
>> thinking that if locks are good, more locking (finer grained locking)
>> is better. OS designers (and Linux designers in particular) are still
>> putting in way too much locking. I deal with this in my day job (we
>> support systems with very large numbers of cpus and because of the
>> "fine grained" locking obsession, the parallelized capacity is
>> limited). If you do a thoughtful design of your network code, you
>> don't need lots of locking - because TCP/IP streams don't have to
>> interact much - they are quite independent. But instead OS designers
>> spend all their time thinking about doing "one thing at a time".
>>
>> There are some really good ideas out there (e.g. RCU) but you have to
>> think about the big picture of networking to understand how to use
>> them. I'm not impressed with the folks who do the Linux networking
>> stacks.
>>
>>
>> On Thursday, October 9, 2014 3:48pm, "Dave Taht" <dave.taht@gmail.com> said:
>>
>>
>>
>>> I have some hope that the skb->xmit_more API could be used to make
>>> aggregating packets in wifi on an AP saner. (my vision for it was
>>> that the overlying qdisc would set xmit_more while it still had
>>> packets queued up for a given station and then stop and switch to
>>> the next. But the rest of the infrastructure ended up pretty closely
>>> tied to BQL....)
>>>
>>> Jesper just wrote a nice piece about it also.
>>> http://netoptimizer.blogspot.com/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html
>>>
>>> It was nice to fool around at 10GigE for a while! And netperf-wrapper
>>> scales to this speed also! :wow:
>>>
>>> I do worry that once sch_fq and fq_codel support is added that there
>>> will be side effects. I would really like - now that there are all
>>> these people profiling things at this level - to see profiles
>>> including those qdiscs.
>>>
>>> /me goes grumbling back to thinking about wifi.
>>>
>>> On Thu, Oct 9, 2014 at 12:40 PM, David Lang <david@lang.hm> wrote:
>>> > lwn.net has an article about a set of new patches that avoid some
>>> > locking overhead by transmitting multiple packets at once.
>>> >
>>> > It doesn't work for things with multiple queues (like fq_codel) in
>>> > its current iteration, but it sounds like something that should be
>>> > looked at and watched for latency-related issues.
>>> >
>>> > http://lwn.net/Articles/615238/
>>> >
>>> > David Lang
>>> > _______________________________________________
>>> > Cerowrt-devel mailing list
>>> > Cerowrt-devel@lists.bufferbloat.net
>>> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>
>>>
>>>
>>> --
>>> Dave Täht
>>>
>>> https://www.bufferbloat.net/projects/make-wifi-fast
>>> _______________________________________________
>>> Cerowrt-devel mailing list
>>> Cerowrt-devel@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>
-- Sent from my Android device with K-@ Mail. Please excuse my brevity.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Cerowrt-devel] bulk packet transmission
2014-10-11 0:52 ` dpreed
2014-10-11 3:15 ` David Lang
@ 2014-10-13 22:11 ` Dave Taht
2014-10-15 19:49 ` Wes Felter
2 siblings, 0 replies; 8+ messages in thread
From: Dave Taht @ 2014-10-13 22:11 UTC (permalink / raw)
To: David Reed; +Cc: cerowrt-devel, Jesper Dangaard Brouer
In the end, I weighed in on this thread on netdev:
http://www.spinics.net/lists/netdev/msg300590.html
On Fri, Oct 10, 2014 at 5:52 PM, <dpreed@reed.com> wrote:
> The best approach to dealing with "locking overhead" is to stop thinking
> that if locks are good, more locking (finer grained locking) is better. OS
> designers (and Linux designers in particular) are still putting in way too
> much locking. I deal with this in my day job (we support systems with very
> large numbers of cpus and because of the "fine grained" locking obsession,
> the parallelized capacity is limited). If you do a thoughtful design of
I'd certainly like to see you load up the new code under your workloads.
> your network code, you don't need lots of locking - because TCP/IP streams
> don't have to interact much - they are quite independent. But instead OS
> designers spend all their time thinking about doing "one thing at a time".
Well, it's an engineering trait to focus on doing one thing at a time. I'd
like it if more CS folk had some EE influence and vice versa. Certainly
thinking about the system as a whole, as you must in circuit design, helps.
I really regret the shift towards specialization that has happened. When
I was a kid, programmers could design a circuit and solder it up. And in
many cases, had to. Thankfully the maker movement seems to be bringing
these two fields back together again, and I look forward to the day when
I can look j random programmer in the eye and ask "What would you do
with a billion transistors", and get back a reasonable answer.
> There are some really good ideas out there (e.g. RCU) but you have to think
> about the big picture of networking to understand how to use them.
An RCU conversion is actually part of the xmit_more stuff. The end
results of all this work are being presented this week at Linux
Plumbers (the site with the preso with the pretty graphs is down
right now).
When people complain about slow progress in the network stack, or how
it's overly complex somewhere, and how much easier it would be to do
something clean and/or move everything into userspace, I tend to point
them at this skbuff structure and explain how each and every field is
needed in some circumstance:
http://lxr.free-electrons.com/source/include/linux/skbuff.h#L417
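Even a tiny excerpt makes the point; these four fields are real, the
comments are mine:

    struct sk_buff {
            ...
            struct net_device  *dev;      /* device this skb arrived on
                                             or leaves by */
            char               cb[48];    /* private scratch space for
                                             whichever layer holds it */
            unsigned int       len;       /* total bytes, linear + paged */
            __be16             protocol;  /* L3 protocol, for demux */
            ...
    };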
I am ALL in favor of moving packet processing to userspace,
but so far, aside from toy prototypes, I haven't seen anything
genuinely useful that covers the extreme range of link layer
technologies and speeds and devices that linux does.
I do think that netmap has some potential, as does stuff layered on
the dpdk, but I haven't played with either. And certainly I'd like to
see network hardware gain completion rings, be able to deliver packets
out of order, and thus be made fq_codel-capable.
>
>
> On Thursday, October 9, 2014 3:48pm, "Dave Taht" <dave.taht@gmail.com> said:
>
>> I have some hope that the skb->xmit_more API could be used to make
>> aggregating packets in wifi on an AP saner. (my vision for it was that
>> the overlying qdisc would set xmit_more while it still had packets
>> queued up for a given station and then stop and switch to the next.
>> But the rest of the infrastructure ended up pretty closely tied to
>> BQL....)
>>
>> Jesper just wrote a nice piece about it also.
>>
>> http://netoptimizer.blogspot.com/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html
>>
>> It was nice to fool around at 10GigE for a while! And netperf-wrapper
>> scales to this speed also! :wow:
>>
>> I do worry that once sch_fq and fq_codel support is added that there
>> will be side effects. I would really like - now that there are all
>> these people profiling things at this level - to see profiles including
>> those qdiscs.
>>
>> /me goes grumbling back to thinking about wifi.
>>
>> On Thu, Oct 9, 2014 at 12:40 PM, David Lang <david@lang.hm> wrote:
>> > lwn.net has an article about a set of new patches that avoid some
>> > locking
>> > overhead by transmitting multiple packets at once.
>> >
>> > It doesn't work for things with multiple queues (like fq_codel) in its
>> > current iteration, but it sounds like something that should be looked at
>> > and watched for latency-related issues.
>> >
>> > http://lwn.net/Articles/615238/
>> >
>> > David Lang
>> > _______________________________________________
>> > Cerowrt-devel mailing list
>> > Cerowrt-devel@lists.bufferbloat.net
>> > https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>
>>
>>
>> --
>> Dave Täht
>>
>> https://www.bufferbloat.net/projects/make-wifi-fast
>> _______________________________________________
>> Cerowrt-devel mailing list
>> Cerowrt-devel@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>
--
Dave Täht
https://www.bufferbloat.net/projects/make-wifi-fast
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Cerowrt-devel] bulk packet transmission
2014-10-11 0:52 ` dpreed
2014-10-11 3:15 ` David Lang
2014-10-13 22:11 ` Dave Taht
@ 2014-10-15 19:49 ` Wes Felter
2014-10-15 22:41 ` dpreed
2 siblings, 1 reply; 8+ messages in thread
From: Wes Felter @ 2014-10-15 19:49 UTC (permalink / raw)
To: cerowrt-devel
On 10/10/14, 7:52 PM, dpreed@reed.com wrote:
> The best approach to dealing with "locking overhead" is to stop thinking
> that if locks are good, more locking (finer grained locking) is better.
> OS designers (and Linux designers in particular) are still putting in
> way too much locking. I deal with this in my day job (we support
> systems with very large numbers of cpus and because of the "fine
> grained" locking obsession, the parallelized capacity is limited). If
> you do a thoughtful design of your network code, you don't need lots of
> locking - because TCP/IP streams don't have to interact much - they are
> quite independent. But instead OS designers spend all their time
> thinking about doing "one thing at a time".
The IX project looks like a promising step in that direction, although
it still doesn't support sub-core granularity like Linux does.
https://www.usenix.org/conference/osdi14/technical-sessions/presentation/belay
--
Wes Felter
IBM Research - Austin
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Cerowrt-devel] bulk packet transmission
2014-10-15 19:49 ` Wes Felter
@ 2014-10-15 22:41 ` dpreed
0 siblings, 0 replies; 8+ messages in thread
From: dpreed @ 2014-10-15 22:41 UTC (permalink / raw)
To: Wes Felter; +Cc: cerowrt-devel
I just read the first page of the paper so far, but it sounds like it is heading in a good direction.
It would be interesting to apply also to home access-point/switches, especially since they are now pushing 1 Gb/sec over the air.
I will put it on my very interesting stack.
On Wednesday, October 15, 2014 3:49pm, "Wes Felter" <wmf@felter.org> said:
> On 10/10/14, 7:52 PM, dpreed@reed.com wrote:
> > The best approach to dealing with "locking overhead" is to stop thinking
> > that if locks are good, more locking (finer grained locking) is better.
> > OS designers (and Linux designers in particular) are still putting in
> > way too much locking. I deal with this in my day job (we support
> > systems with very large numbers of cpus and because of the "fine
> > grained" locking obsession, the parallelized capacity is limited). If
> > you do a thoughtful design of your network code, you don't need lots of
> > locking - because TCP/IP streams don't have to interact much - they are
> > quite independent. But instead OS designers spend all their time
> > thinking about doing "one thing at a time".
>
> The IX project looks like a promising step in that direction, although
> it still doesn't support sub-core granularity like Linux does.
>
> https://www.usenix.org/conference/osdi14/technical-sessions/presentation/belay
>
> --
> Wes Felter
> IBM Research - Austin
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-10-15 22:41 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-09 19:40 [Cerowrt-devel] bulk packet transmission David Lang
2014-10-09 19:48 ` Dave Taht
2014-10-11 0:52 ` dpreed
2014-10-11 3:15 ` David Lang
2014-10-11 4:20 ` David P. Reed
2014-10-13 22:11 ` Dave Taht
2014-10-15 19:49 ` Wes Felter
2014-10-15 22:41 ` dpreed