* [Bloat] Questions About Switch Buffering
@ 2016-06-12 17:44 Noah Causin
2016-06-12 17:48 ` Steinar H. Gunderson
2016-06-12 18:25 ` Benjamin Cronce
0 siblings, 2 replies; 9+ messages in thread
From: Noah Causin @ 2016-06-12 17:44 UTC (permalink / raw)
To: bloat
I have some questions about switch buffering.
Are there any good switches that have modern AQMs in them like fq_codel?
Also, if a home router has a built-in switch, is the buffering
controlled by the AQM, or do the switches have their own internal
buffering that takes precedence?
The scenario I am thinking of is two ports trying to feed data out of a
single port on a switch.
* Re: [Bloat] Questions About Switch Buffering
2016-06-12 17:44 [Bloat] Questions About Switch Buffering Noah Causin
@ 2016-06-12 17:48 ` Steinar H. Gunderson
2016-06-12 18:25 ` Benjamin Cronce
1 sibling, 0 replies; 9+ messages in thread
From: Steinar H. Gunderson @ 2016-06-12 17:48 UTC (permalink / raw)
To: bloat
On Sun, Jun 12, 2016 at 01:44:52PM -0400, Noah Causin wrote:
> Are there any good switches that have modern AQMs in them like fq_codel?
No.
> Also, if a home router has a built-in switch, is the buffering controlled by
> the AQM, or do the switches have their own internal buffering that takes
> precedence?
The latter.
> The scenario I am thinking of is two ports trying to feed data out of a
> single port on a switch.
Yes, this is an unsolved problem. :-)
/* Steinar */
--
Homepage: https://www.sesse.net/
* Re: [Bloat] Questions About Switch Buffering
2016-06-12 17:44 [Bloat] Questions About Switch Buffering Noah Causin
2016-06-12 17:48 ` Steinar H. Gunderson
@ 2016-06-12 18:25 ` Benjamin Cronce
2016-06-12 21:24 ` Steinar H. Gunderson
1 sibling, 1 reply; 9+ messages in thread
From: Benjamin Cronce @ 2016-06-12 18:25 UTC (permalink / raw)
To: Noah Causin; +Cc: bloat
AQMs are common on routers and firewalls because they mostly deal with
a high to low bandwidth transition from LAN to WAN. Internal networks
rarely have bandwidth issues and congestion only happens when you don't
have enough bandwidth. LAN bandwidth is relatively easy to increase,
either by bonding ports or by purchasing 10Gb or 40Gb links. If you have
a 1Gb link from your LAN to your router and it's getting bloat issues, I
recommend purchasing a 10Gb uplink to your switch. AQMs are difficult to do
at line rate, even in hardware, and it will almost always be the case that
a faster port is cheaper and better than implementing an AQM in the switch.
One paper that I read said that link bandwidth is increasing much
faster than our ability to process packets. Moving data is relatively easy
compared to doing branching logic against the data. The paper went on to
review several 400Gb ports, most of which were actually capable of
near line rate 400Gb, but even the best one nose-dived to 150Gb once you
enabled a simple strict priority QoS. It's becoming a physics
issue. In order to do complex logic, you need more transistors, and that is
at odds with moving the data faster through the system.
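To put rough numbers on the per-packet processing problem, here is a
minimal sketch of the time budget a port has for each frame. It assumes
the worst case of minimum-size 64-byte Ethernet frames plus 20 bytes of
preamble and inter-frame gap on the wire; the function names are
illustrative and not taken from the paper mentioned above.

```python
# Per-packet time budget at a given line rate, worst case.
# Assumption: 64-byte frames plus 20 bytes of per-frame wire overhead.

MIN_FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def packets_per_second(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Maximum number of frames per second the link can carry."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def budget_ns(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Time available to handle one frame, in nanoseconds."""
    return 1e9 / packets_per_second(link_bps, frame_bytes)


if __name__ == "__main__":
    for gbps in (1, 10, 100, 400):
        bps = gbps * 1e9
        print(f"{gbps:>4} Gb/s: {packets_per_second(bps) / 1e6:8.2f} Mpps, "
              f"{budget_ns(bps):7.2f} ns per packet")
```

Under these assumptions a 1Gb port has 672 ns per minimum-size frame,
while a 400Gb port has under 2 ns, which leaves very little room for
any per-packet decision logic.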
At high link speeds in the future, think 1Tb+, QoS may have to go away
unless we find some sort of photonic processing breakthrough. The good news
is it seems like there's no ceiling on the amount of bandwidth we can push
over fiber.
On Sun, Jun 12, 2016 at 12:44 PM, Noah Causin <n0manletter@gmail.com> wrote:
> I have some questions about switch buffering.
>
> Are there any good switches that have modern AQMs in them like fq_codel?
>
> Also, if a home router has a built-in switch, is the buffering controlled
> by the AQM, or do the switches have their own internal buffering that takes
> precedence?
>
> The scenario I am thinking of is two ports trying to feed data out of a
> single port on a switch.
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
>
* Re: [Bloat] Questions About Switch Buffering
2016-06-12 18:25 ` Benjamin Cronce
@ 2016-06-12 21:24 ` Steinar H. Gunderson
2016-06-12 22:01 ` Jesper Louis Andersen
0 siblings, 1 reply; 9+ messages in thread
From: Steinar H. Gunderson @ 2016-06-12 21:24 UTC (permalink / raw)
To: bloat
On Sun, Jun 12, 2016 at 01:25:17PM -0500, Benjamin Cronce wrote:
> Internal networks rarely have bandwidth issues and congestion only happens
> when you don't have enough bandwidth.
I don't think this is true. You might not have an aggregate bandwidth issue,
but given the burstiness of TCP and the typical switch buffer depth
(64 frames is a typical number), it's very very easy to lose packets in your
switch even on a relatively quiet network with no downconversion. (Witness
the rise of DCTCP, made especially for internal traffic on this kind of
network.)
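As a back-of-the-envelope illustration of how quickly such a buffer
fills, the sketch below computes the time until a FIFO egress buffer
overflows when several ingress ports burst at line rate towards one
egress port of the same speed. The 64-frame depth is the figure quoted
above; the 1500-byte frame size and the function name are assumptions
made for the example.

```python
def time_to_overflow_us(buffer_frames, frame_bytes, link_bps, senders):
    """Microseconds until the egress buffer is full.

    Assumes every sender transmits back to back at line rate and the
    egress port drains at the same line rate, so the queue grows at
    (senders - 1) * link_bps.
    """
    if senders < 2:
        raise ValueError("the queue only grows with two or more senders")
    buffer_bits = buffer_frames * frame_bytes * 8
    growth_bps = (senders - 1) * link_bps  # arrivals minus departures
    return buffer_bits / growth_bps * 1e6


if __name__ == "__main__":
    for n in (2, 4, 11):
        t = time_to_overflow_us(64, 1500, 1e9, n)
        print(f"{n:2d} senders at 1 Gb/s: buffer full after {t:6.1f} us")
```

With two senders on gigabit ports, the buffer is full after roughly
768 microseconds, that is, after each sender has sent only about 64
full-size frames, even though the average load on the network may be
tiny.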
/* Steinar */
--
Homepage: https://www.sesse.net/
* Re: [Bloat] Questions About Switch Buffering
2016-06-12 21:24 ` Steinar H. Gunderson
@ 2016-06-12 22:01 ` Jesper Louis Andersen
2016-06-13 1:07 ` Benjamin Cronce
0 siblings, 1 reply; 9+ messages in thread
From: Jesper Louis Andersen @ 2016-06-12 22:01 UTC (permalink / raw)
To: Steinar H. Gunderson; +Cc: bloat
This *is* commonly a problem. Look up "TCP incast".
The scenario is exactly as you describe. A distributed database makes
queries over the same switch to K other nodes in order to verify the
integrity of the answer. Data is served from memory and thus access times
are roughly the same on all the K nodes. If the data response is sizable,
then the switch output port is overwhelmed with traffic, and it drops
packets. TCP's congestion control then comes into play.
It is almost like resonance in engineering. At the wrong "frequency", the
bridge/switch will resonate and make everything go haywire.
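The incast pattern can be sketched with a toy discrete-time model. This
is only an illustration under simplifying assumptions: all K senders
start at the same instant, each delivers one frame per frame time, the
egress port drains one frame per frame time, and there is no
retransmission or congestion control. The function name and the
64-frame default are hypothetical choices for the example.

```python
def incast_drops(senders, frames_per_response, buffer_frames=64):
    """Count frames tail-dropped at one egress port during an incast burst."""
    queue = 0
    dropped = 0
    for _ in range(frames_per_response):
        # During one frame time every sender delivers one frame ...
        for _ in range(senders):
            if queue < buffer_frames:
                queue += 1
            else:
                dropped += 1
        # ... and the egress port transmits one frame.
        if queue > 0:
            queue -= 1
    return dropped


if __name__ == "__main__":
    for k in (2, 4, 8, 16):
        offered = k * 32
        lost = incast_drops(k, 32)
        print(f"K={k:2d}: offered {offered:3d} frames, dropped {lost:3d}")
```

In this model two synchronized senders with 32-frame responses fit in
the buffer, while eight senders with the same response size lose more
than half of the offered frames.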
--
J.
* Re: [Bloat] Questions About Switch Buffering
2016-06-12 22:01 ` Jesper Louis Andersen
@ 2016-06-13 1:07 ` Benjamin Cronce
2016-06-13 1:50 ` Jonathan Morton
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Benjamin Cronce @ 2016-06-13 1:07 UTC (permalink / raw)
To: Jesper Louis Andersen; +Cc: Steinar H. Gunderson, bloat
I'm not arguing that an AQM isn't useful, just that you can't have your cake
and eat it too. I wouldn't spend much time going down this path without
first talking to someone with a strong background in ASICs for network
switches and asking if it's even feasible. Everything (very little) I know
about ASICs and buffers is that buffering is a very hard problem that eats
up a lot of transistors, and more transistors means slower ASICs. Almost
always a trade-off between performance and buffer size. CoDel and especially
Cake sound like they would not be ASIC friendly.
* Re: [Bloat] Questions About Switch Buffering
2016-06-13 1:07 ` Benjamin Cronce
@ 2016-06-13 1:50 ` Jonathan Morton
2016-06-13 4:34 ` Dave Taht
2016-06-13 8:23 ` Steinar H. Gunderson
2 siblings, 0 replies; 9+ messages in thread
From: Jonathan Morton @ 2016-06-13 1:50 UTC (permalink / raw)
To: Benjamin Cronce; +Cc: Jesper Louis Andersen, bloat
I think at switch level we should be thinking of simple AQM rather than the
flow isolating type. Just a simple way to inform the endpoints that some
congestion is occurring so please flood sparingly. As previously noted,
WRED is considerably better than nothing, even if it's not very good in
absolute terms.
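For reference, the core of a RED-style marker is small. The sketch
below shows the classic linear drop/mark probability between two queue
thresholds, with a separate profile per traffic class as WRED does. The
threshold values, class names and function names are assumptions chosen
for illustration, not settings from any particular switch.

```python
from dataclasses import dataclass


@dataclass
class RedProfile:
    min_th: float  # average queue length (frames) where marking starts
    max_th: float  # average queue length where every frame is marked
    max_p: float   # marking probability just below max_th


def update_average(avg, sample, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg + weight * sample


def mark_probability(avg_queue, profile):
    """Probability of dropping or ECN-marking an arriving frame."""
    if avg_queue < profile.min_th:
        return 0.0
    if avg_queue >= profile.max_th:
        return 1.0
    span = profile.max_th - profile.min_th
    return profile.max_p * (avg_queue - profile.min_th) / span


# Hypothetical per-class profiles: best effort is marked earlier and harder.
PROFILES = {
    "best_effort": RedProfile(min_th=20, max_th=60, max_p=0.10),
    "priority": RedProfile(min_th=40, max_th=60, max_p=0.02),
}

if __name__ == "__main__":
    for name, profile in PROFILES.items():
        print(name, [round(mark_probability(q, profile), 3)
                     for q in (10, 30, 50, 60)])
```

The per-frame work is a comparison and a multiply on a running average,
with no per-flow state, which is why this family is often considered
feasible where flow isolation is not.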
- Jonathan Morton
* Re: [Bloat] Questions About Switch Buffering
2016-06-13 1:07 ` Benjamin Cronce
2016-06-13 1:50 ` Jonathan Morton
@ 2016-06-13 4:34 ` Dave Taht
2016-06-13 8:23 ` Steinar H. Gunderson
2 siblings, 0 replies; 9+ messages in thread
From: Dave Taht @ 2016-06-13 4:34 UTC (permalink / raw)
To: Benjamin Cronce; +Cc: Jesper Louis Andersen, bloat
On Sun, Jun 12, 2016 at 6:07 PM, Benjamin Cronce <bcronce@gmail.com> wrote:
> I'm not arguing that an AQM isn't useful, just that you can't have your cake
> and eat it too. I wouldn't spend much time going down this path without first
> talking to someone with a strong background in ASICs for network switches
> and asking if it's even feasible. Everything (very little) I know about
> ASICs and buffers is that buffering is a very hard problem that eats up a lot
> of transistors, and more transistors means slower ASICs. Almost always a
> trade-off between performance and buffer size. CoDel and especially Cake
> sound like they would not be ASIC friendly.
For much of two years I shopped around trying to get a basic proof of
concept done in an FPGA, going so far as to help fund onenetswitch's
softswitch, and trying to line up a university or other lab to tackle
the logic (given my predilection for wanting things to be open
source). Several shops were very enthusiastic for a while... then went
dark.
...
Codel would be very ASIC friendly - all you need to do is prepend a
timestamp to the packet on input, and transport it across the internal
switch fabric. Getting the timestamp on ingress is essentially free.
Making the drop or mark decision is also cheap. Looping, as codel can
do on overload, not so much, but it isn't that necessary either.
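A software sketch of that idea: stamp each packet on ingress, then make
the decision from its sojourn time on egress. This is a deliberately
simplified variant that drops at most one packet per dequeue (no
looping) and omits details of the reference algorithm such as the
drop-count hysteresis. The 5 ms target and 100 ms interval are the
conventional CoDel defaults; the class and method names are
illustrative assumptions.

```python
import math
from collections import deque

TARGET = 0.005    # acceptable standing queue delay, seconds
INTERVAL = 0.100  # delay must stay above TARGET this long before dropping


class CoDelQueue:
    def __init__(self):
        self.q = deque()         # entries are (ingress_timestamp, packet)
        self.first_above = None  # deadline set when delay first exceeds TARGET
        self.dropping = False
        self.count = 0           # drops in the current dropping episode
        self.drop_next = 0.0
        self.drops = 0

    def enqueue(self, packet, now):
        self.q.append((now, packet))  # timestamp on ingress

    def _drop_and_send_next(self):
        # The packet already taken off the queue is discarded; the next
        # one (if any) is sent without a second check, i.e. no looping.
        self.drops += 1
        return self.q.popleft()[1] if self.q else None

    def dequeue(self, now):
        if not self.q:
            self.first_above = None
            self.dropping = False
            return None
        stamp, packet = self.q.popleft()
        if now - stamp < TARGET:
            # Delay is fine again: leave any dropping episode.
            self.first_above = None
            self.dropping = False
            return packet
        if self.first_above is None:
            self.first_above = now + INTERVAL
            return packet
        if not self.dropping and now >= self.first_above:
            self.dropping = True
            self.count = 1
            self.drop_next = now + INTERVAL / math.sqrt(self.count)
            return self._drop_and_send_next()
        if self.dropping and now >= self.drop_next:
            self.count += 1
            self.drop_next += INTERVAL / math.sqrt(self.count)
            return self._drop_and_send_next()
        return packet
```

Per packet, the egress side needs one subtraction, a few comparisons
and, only when a drop happens, one inverse square root, which can be
approximated or tabulated. With ECN, the drop could be replaced by a
mark.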
As for the "fq" portion - proofs of concept already exist for DRR in
the netfpga.org project's Verilog:
https://github.com/NetFPGA/netfpga/wiki/DRRNetFPGA
The DPI required to create the hash would have to end up at the
end of the packet, and thus get bypassed by cut-through switching
(when the output port was empty), and it would need at least one packet
queued up at the destination to be able to slide it into the right
virtual queue.
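For readers unfamiliar with DRR, here is a minimal software sketch of
hashing flows into virtual queues and serving them with a deficit
counter. It is not the NetFPGA design linked above. The CRC32 hash, the
queue count, the quantum, and all names are assumptions for
illustration, and the credit is topped up when a queue's head packet
does not fit, as fq_codel-style schedulers do.

```python
import zlib
from collections import deque


class DrrScheduler:
    def __init__(self, n_queues=8, quantum=1514):
        self.queues = [deque() for _ in range(n_queues)]
        self.deficit = [0] * n_queues
        self.active = deque()  # indices of backlogged queues, service order
        self.quantum = quantum

    def bucket(self, flow_id):
        """Map a flow identifier (bytes, e.g. a packed 5-tuple) to a queue."""
        return zlib.crc32(flow_id) % len(self.queues)

    def enqueue(self, flow_id, size):
        idx = self.bucket(flow_id)
        if not self.queues[idx]:
            self.active.append(idx)  # queue becomes backlogged
        self.queues[idx].append((flow_id, size))

    def dequeue(self):
        """Return (flow_id, size) of the next packet, or None if idle."""
        while self.active:
            idx = self.active[0]
            q = self.queues[idx]
            if self.deficit[idx] < q[0][1]:
                # Not enough credit for the head packet: grant one
                # quantum and move this queue to the back of the round.
                self.deficit[idx] += self.quantum
                self.active.rotate(-1)
                continue
            flow_id, size = q.popleft()
            self.deficit[idx] -= size
            if not q:
                # An emptied queue gives up its leftover credit.
                self.deficit[idx] = 0
                self.active.popleft()
            return flow_id, size
        return None
```

With one flow sending 1500-byte packets and another sending 500-byte
packets in different buckets, each round serves one large packet and
three small ones, so both flows get the same share in bytes rather than
in packets.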
The stumbling blocks were:
A) Most shops were only interested in producing the next 100gbps
or faster chip, rather than trying to address 10GigE and lower, and
were also intimidated by the big players in the market. Even the big
players are intimidated - Broadcom just exited the wifi biz (selling
off their business to Cypress) - as one example.
B) Producing a revolutionary smart "gbit" switch chip, using
technologies that could otherwise just forward packets dumbly at 10gbit,
meant a lot of NRE for little potential revenue, since these chips sell
for well under a few dollars. Everybody has a gbit switch chip and
ethernet chip already paid for....
C) The building blocks for packet processing in hardware are hard to
license together except for a very few players, and what's out there
in open source is basically limited to OpenCores' and NetFPGA's work.
I have no doubt that eventually someone will produce implementations
of codel, fq_codel and/or cake algorithms in gates, tho I lean more
towards something qfq derived than drr derived. Best I could estimate
was that a DRR version would need at least a 3 packet "pipeline" to
hash and route, tho I thought some interesting things could be done by
also having multiple CAMs to route the different flows and be able to
handle cache misses better.
In the interim, I kind of expect stuff derived from QCA's or
Cavium's specialized co-processors to gain some of these ideas. There
are also hugely parallel network processors around the corner based on
ARM architectures in Cavium's and AMD's roadmaps. Cisco has some
insanely parallel processors in their designs.
More than fixing the native DC switch market (where everybody
generally overprovisions anyway) or improving consumer switches, I'd
felt that ISP edge devices (dslams/cable) were where someone would
innovate with a hardware assist to meet their business models (
http://jvimal.github.io/senic/ ), and maybe we'd see some assistance
for inbound rate limiting also arrive in consumer hardware with
offloads, a more limited version of senic perhaps.
But it takes a long time to develop hardware, chip designers are scarce,
and without clear market demand... I dunno, who knows? perhaps what
the OCP folk are doing will start feeding back into actual chip
designs one day.
It was a nice diversion for me to play with the chisel language, the
RISC-V and Mill CPUs, at least.
Anybody up for repurposing some machine learning chips? These look like fun:
https://www.engadget.com/2016/04/28/movidius-fathom-neural-compute-stick/
Oh, yeah, Mellanox's latest programmable ethernet devices look promising also.
http://www.mellanox.com/page/programmable_network_adapters
The ironic thing is that the biggest problem in 10GigE+ is on input,
not output. In fact, on much hardware, even at lower rates, we tend to
be dropping on input more than enough to balance out the potential
bufferbloat problems there. Moving the timestamp and hash and having
parallel memory channels is sort of happening on multiple newer chips
on the rx path...
--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
* Re: [Bloat] Questions About Switch Buffering
2016-06-13 1:07 ` Benjamin Cronce
2016-06-13 1:50 ` Jonathan Morton
2016-06-13 4:34 ` Dave Taht
@ 2016-06-13 8:23 ` Steinar H. Gunderson
2 siblings, 0 replies; 9+ messages in thread
From: Steinar H. Gunderson @ 2016-06-13 8:23 UTC (permalink / raw)
To: Benjamin Cronce; +Cc: Jesper Louis Andersen, bloat
On Sun, Jun 12, 2016 at 08:07:42PM -0500, Benjamin Cronce wrote:
> CoDel and especially Cake sound like they would not be ASIC friendly.
I thought this was what PIE was supposed to be about? But yes, it's a hard
problem.
/* Steinar */
--
Homepage: https://www.sesse.net/