* [Bloat] Questions About Switch Buffering
From: Noah Causin @ 2016-06-12 17:44 UTC
To: bloat

I have some questions about switch buffering.

Are there any good switches that have modern AQMs in them, like fq_codel?

Also, if a home router has a built-in switch, is the buffering controlled by the AQM, or does the switch have its own internal buffering that takes precedence?

The scenario I am thinking of is two ports trying to feed data out of a single port on a switch.
* Re: [Bloat] Questions About Switch Buffering
From: Steinar H. Gunderson @ 2016-06-12 17:48 UTC
To: bloat

On Sun, Jun 12, 2016 at 01:44:52PM -0400, Noah Causin wrote:
> Are there any good switches that have modern AQMs in them like fq_codel?

No.

> Also, If a home router has a built-in switch, is the buffering controlled by
> the AQM, or do the switches have their own internal buffering that takes
> precedence?

The latter.

> The scenario I am thinking of is two ports trying to feed data out of a
> single port on a switch.

Yes, this is an unsolved problem. :-)

/* Steinar */
--
Homepage: https://www.sesse.net/
* Re: [Bloat] Questions About Switch Buffering
From: Benjamin Cronce @ 2016-06-12 18:25 UTC
To: Noah Causin
Cc: bloat

AQMs are common in routers and firewalls because they mostly deal with a high-to-low bandwidth transition from LAN to WAN. Internal networks rarely have bandwidth issues and congestion only happens when you don't have enough bandwidth. Bandwidth on a LAN is relatively easy to increase, either by bonding ports or by purchasing 10Gb or 40Gb links. If you have a 1Gb link from your LAN to your router and it's suffering from bloat, I recommend purchasing a 10Gb uplink to your switch. AQMs are difficult to do at line rate, even in hardware, and a faster port will almost always be cheaper and better than implementing an AQM in the switch.

One paper I read argued that bandwidth is increasing much faster than our ability to process packets. Moving data is relatively easy compared to applying branching logic to the data. The paper went on to review several 400Gb ports, most of which were actually capable of near-line-rate 400Gb, but even the best one nose-dived to 150Gb once a simple strict-priority QoS was enabled. It's becoming a physics issue: complex logic needs more transistors, and that is at odds with moving the data through the system faster. At the link speeds of the future (think 1Tb+), QoS may have to go away unless we find some sort of photonic processing breakthrough. The good news is that there seems to be no ceiling on the amount of bandwidth we can push over fiber.
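Benjamin's point that per-packet processing is falling behind link speed can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes worst-case minimum-size 64-byte Ethernet frames plus the 8-byte preamble/SFD and 12-byte inter-frame gap (84 bytes on the wire per frame), and computes the packet rate a port must sustain and the time left per packet. The function names are hypothetical helpers for illustration only.

```python
# Per-packet time budget at full line rate (illustrative arithmetic).
# Assumption: minimum-size 64-byte frames + 8-byte preamble/SFD
# + 12-byte inter-frame gap = 84 bytes on the wire per frame.
WIRE_BYTES_PER_MIN_FRAME = 64 + 8 + 12


def packets_per_second(link_bps: float,
                       wire_bytes: int = WIRE_BYTES_PER_MIN_FRAME) -> float:
    """Worst-case packet rate a port must handle at line rate."""
    return link_bps / (wire_bytes * 8)


def ns_per_packet(link_bps: float,
                  wire_bytes: int = WIRE_BYTES_PER_MIN_FRAME) -> float:
    """Nanoseconds available for every forwarding/QoS decision on one packet."""
    return 1e9 / packets_per_second(link_bps, wire_bytes)


if __name__ == "__main__":
    for gbps in (1, 10, 40, 100, 400, 1000):
        bps = gbps * 1e9
        print(f"{gbps:>5} Gb/s: {packets_per_second(bps) / 1e6:8.1f} Mpps, "
              f"{ns_per_packet(bps):7.2f} ns per packet")
```

Under these assumptions a 400 Gb/s port sees about 595 Mpps, leaving roughly 1.68 ns per minimum-size packet, so any per-packet QoS or AQM logic has to fit inside that window or be pipelined and parallelized.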
* Re: [Bloat] Questions About Switch Buffering
From: Steinar H. Gunderson @ 2016-06-12 21:24 UTC
To: bloat

On Sun, Jun 12, 2016 at 01:25:17PM -0500, Benjamin Cronce wrote:
> Internal networks rarely have bandwidth issues and congestion only happens
> when you don't have enough bandwidth.

I don't think this is true. You might not have aggregate bandwidth issues, but given the burstiness of TCP and the typical switch buffer depth (64 frames is a typical number), it's very, very easy to lose packets in your switch even on a relatively quiet network with no speed downconversion. (Witness the rise of DCTCP, designed especially for internal traffic on this kind of network.)

/* Steinar */
--
Homepage: https://www.sesse.net/
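How quickly a shallow buffer fills under fan-in can be estimated directly. If N input ports send back-to-back at full line rate toward one output port of the same speed, the queue grows by N - 1 frames per frame time, so a buffer of B frames overflows after about B / (N - 1) frame times. The sketch below assumes identical port speeds, full-size 1500-byte frames, an initially empty buffer, and ignores preamble and inter-frame gap; the 64-frame depth is the typical figure quoted above, not a property of any specific switch, and the function name is hypothetical.

```python
def time_to_overflow_us(n_inputs: int, buffer_frames: int, link_bps: float,
                        frame_bytes: int = 1500) -> float:
    """Microseconds until an output queue of `buffer_frames` overflows when
    `n_inputs` ports each send at `link_bps` into one port of the same speed.

    Simplifying assumptions: identical port speeds, back-to-back full-size
    frames, no preamble/inter-frame gap, buffer initially empty.
    """
    if n_inputs < 2:
        return float("inf")  # 1:1 forwarding never builds a standing queue
    frame_time_s = frame_bytes * 8 / link_bps
    # Each frame time, n_inputs frames arrive and only one departs.
    growth_per_frame_time = n_inputs - 1
    return buffer_frames / growth_per_frame_time * frame_time_s * 1e6


if __name__ == "__main__":
    for gbps in (1, 10):
        t = time_to_overflow_us(2, 64, gbps * 1e9)
        print(f"2:1 fan-in at {gbps} Gb/s, 64-frame buffer: {t:.1f} us")
```

With two 1 Gb/s ports bursting into a third, a 64-frame buffer fills in about 768 µs (about 77 µs at 10 Gb/s) under these assumptions, so even a short synchronized burst from two senders is enough to cause tail drops.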
* Re: [Bloat] Questions About Switch Buffering
From: Jesper Louis Andersen @ 2016-06-12 22:01 UTC
To: Steinar H. Gunderson
Cc: bloat

This *is* commonly a problem. Look up "TCP incast".

The scenario is exactly as you describe. A distributed database sends queries over the same switch to K other nodes in order to verify the integrity of the answer. Data is served from memory, so access times are roughly the same on all K nodes. If the responses are sizable, the switch output port is overwhelmed with traffic and drops packets, and TCP's congestion control comes into play.

It is almost like resonance in engineering: at the wrong "frequency", the bridge/switch will resonate and make everything go haywire.

--
J.
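The incast pattern Jesper describes can be reproduced with a toy, slot-by-slot model: K servers each answer with a burst of W packets at the same instant, every port runs at the same speed (one packet per slot), and the shared output port has a tail-drop buffer of B packets. This is a hypothetical illustration under those assumptions only; it does not model how TCP reacts to the resulting losses.

```python
def incast_drops(k_senders: int, burst_pkts: int, buffer_pkts: int) -> int:
    """Count tail drops at one output port during a synchronized burst.

    Toy model (assumption): time is slotted at one packet per slot per port;
    each of the k senders delivers one packet per slot until its burst is
    done; the output port transmits one queued packet per slot.
    """
    remaining = [burst_pkts] * k_senders
    queue = 0
    drops = 0
    while any(remaining) or queue:
        # Arrivals: every sender with data left delivers one packet this slot.
        for i, left in enumerate(remaining):
            if left:
                remaining[i] -= 1
                if queue < buffer_pkts:
                    queue += 1
                else:
                    drops += 1  # tail drop: buffer is full
        # Departure: the output port drains one packet per slot.
        if queue:
            queue -= 1
    return drops


if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16):
        print(f"{k:>2} senders x 100 pkts into 64-pkt buffer: "
              f"{incast_drops(k, 100, 64)} drops")
```

A single sender never loses anything in this model, but two senders bursting 100 packets each already lose 37 packets once the 64-packet buffer fills, and the losses grow quickly with K.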
* Re: [Bloat] Questions About Switch Buffering
From: Benjamin Cronce @ 2016-06-13 1:07 UTC
To: Jesper Louis Andersen
Cc: Steinar H. Gunderson, bloat

I'm not arguing that an AQM isn't useful, just that you can't have your cake and eat it too. I wouldn't spend much time going down this path without first talking to someone with a strong background in ASICs for network switches and asking whether it's even feasible. Everything I know about ASICs and buffers (which is very little) says that buffering is a very hard problem that eats up a lot of transistors, and more transistors means slower ASICs. It is almost always a trade-off between performance and buffer size. CoDel and especially Cake sound like they would not be ASIC friendly.
* Re: [Bloat] Questions About Switch Buffering
From: Jonathan Morton @ 2016-06-13 1:50 UTC
To: Benjamin Cronce
Cc: Jesper Louis Andersen, bloat

I think at the switch level we should be thinking of simple AQM rather than the flow-isolating type: just a simple way to inform the endpoints that some congestion is occurring, so please flood sparingly.

As previously noted, WRED is considerably better than nothing, even if it's not very good in absolute terms.

- Jonathan Morton
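For a sense of how little logic the kind of AQM Jonathan suggests needs, here is a sketch of the classic RED decision that WRED builds on: an exponentially weighted moving average of the queue length is compared against a minimum and a maximum threshold, and the drop (or mark) probability ramps linearly in between. WRED keeps one average but applies a different threshold profile per traffic class. The thresholds, EWMA weight, and class names are illustrative assumptions, not vendor defaults, and the sketch omits RED's count-based spacing of drops.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RedProfile:
    min_th: float  # average queue (packets) below which nothing is dropped
    max_th: float  # average queue at/above which everything is dropped
    max_p: float   # drop probability reached just below max_th


def red_drop_probability(avg_q: float, p: RedProfile) -> float:
    """Classic RED: linear ramp from 0 at min_th to max_p at max_th."""
    if avg_q < p.min_th:
        return 0.0
    if avg_q >= p.max_th:
        return 1.0
    return p.max_p * (avg_q - p.min_th) / (p.max_th - p.min_th)


class Wred:
    """One shared average queue length, a separate drop profile per class."""

    def __init__(self, profiles: Dict[str, RedProfile], weight: float = 0.002):
        self.profiles = profiles
        self.weight = weight  # EWMA weight (illustrative value)
        self.avg_q = 0.0

    def should_drop(self, cls: str, instantaneous_q: int,
                    rng: Callable[[], float] = random.random) -> bool:
        # Update the moving average on each arrival, then roll the dice.
        self.avg_q += self.weight * (instantaneous_q - self.avg_q)
        return rng() < red_drop_probability(self.avg_q, self.profiles[cls])
```

A switch could, for example, give a hypothetical "bulk" class lower thresholds than a "voice" class, so that bulk senders are signalled first. The per-packet work is one multiply-add for the average and one comparison, which is part of why RED variants have long been common in hardware.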
* Re: [Bloat] Questions About Switch Buffering
From: Dave Taht @ 2016-06-13 4:34 UTC
To: Benjamin Cronce
Cc: Jesper Louis Andersen, bloat

On Sun, Jun 12, 2016 at 6:07 PM, Benjamin Cronce <bcronce@gmail.com> wrote:
> I'm not arguing the an AQM isn't useful, just that you can't have your cake
> and eat it to. I wouldn't spend much time going down this path without first
> talking to someone with a strong background in ASICs for network switches
> and asking if it's even a feasible. Everything(very little) I know about
> ASICs and buffers is buffering is a very hard problem that east up a lot of
> transistors and more transistors means slower ASICs. Almost always a
> trade-off between performance and buffer size. CoDel and especially Cake
> sound like they would not be ASIC friendly.

For much of the past two years I shopped around trying to get a basic proof of concept done in an FPGA, going so far as to help fund onenetswitch's softswitch, and trying to line up a university or other lab to tackle the logic (given my predilection for wanting things to be open source). Several shops were very enthusiastic for a while... then went dark.

...

Codel would be very ASIC friendly: all you need to do is prepend a timestamp to the packet on input and transport it across the internal switch fabric. Getting the timestamp on ingress is essentially free. Making the drop or mark decision is also cheap. Looping, as codel can do on overload, not so much, but it isn't that necessary either.
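Dave's description of why CoDel maps onto hardware can be sketched in software: stamp each packet on ingress, compute its sojourn time on egress, and drop or ECN-mark once the sojourn time has stayed above a target for a full interval, with later drops spaced at interval / sqrt(count). The version below is a simplified, hypothetical rendering modeled on the published CoDel control law (5 ms target, 100 ms interval). Following the remark that looping on overload "isn't that necessary", it makes at most one drop decision per dequeue, and it omits refinements such as the minimum-backlog check and count reuse when re-entering the dropping state.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple

TARGET = 0.005    # 5 ms acceptable standing-queue delay
INTERVAL = 0.100  # 100 ms, on the order of a worst-case RTT


@dataclass
class Packet:
    payload: bytes
    enqueue_ts: float = 0.0  # stamped on ingress


class SimpleCoDel:
    """Simplified CoDel-style queue: timestamp on enqueue, decide on dequeue."""

    def __init__(self) -> None:
        self.q: deque = deque()
        self.first_above_time = 0.0  # deadline after which dropping starts
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0

    def enqueue(self, pkt: Packet, now: float) -> None:
        pkt.enqueue_ts = now  # the only per-packet state this scheme needs
        self.q.append(pkt)

    def _control_law(self, t: float) -> float:
        # Drops get closer together as count grows: interval / sqrt(count).
        return t + INTERVAL / math.sqrt(self.count)

    def dequeue(self, now: float) -> Tuple[Optional[Packet], bool]:
        """Return (packet, drop_flag); a True flag means drop or ECN-mark it.

        Makes at most one drop decision per call (no drop loop).
        """
        if not self.q:
            self.first_above_time = 0.0
            self.dropping = False
            return None, False
        pkt = self.q.popleft()
        sojourn = now - pkt.enqueue_ts
        if sojourn < TARGET:
            # Delay is acceptable: reset and leave the dropping state.
            self.first_above_time = 0.0
            self.dropping = False
            return pkt, False
        if self.first_above_time == 0.0:
            # Delay just went above target: act only if it persists an INTERVAL.
            self.first_above_time = now + INTERVAL
            return pkt, False
        if not self.dropping:
            if now >= self.first_above_time:
                self.dropping = True
                self.count = 1
                self.drop_next = self._control_law(now)
                return pkt, True
            return pkt, False
        if now >= self.drop_next:
            self.count += 1
            self.drop_next = self._control_law(self.drop_next)
            return pkt, True
        return pkt, False
```

All the egress logic needs is a subtraction, a few comparisons, and one inverse square root per drop (which hardware could approximate or table-look-up), consistent with the claim that the decision itself is cheap.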
As for the "fq" portion: proofs of concept for DRR already exist in the netfpga.org project's Verilog:

https://github.com/NetFPGA/netfpga/wiki/DRRNetFPGA

The catch is that the DPI required to create the hash would have to end up at the end of the packet, and thus get bypassed by cut-through switching (when the output port was empty); you would need at least one packet queued up at the destination to be able to slide it into the right virtual queue.

The stumbling blocks were:

A) Most shops were only interested in producing the next 100Gbps-or-faster chip, rather than trying to address 10GigE and lower, and were also intimidated by the big players in the market. Even the big players are intimidated: Broadcom just exited the wifi business (selling it off to Cypress), as one example.

B) Producing a revolutionary smart "gbit" switch chip, when existing technology could already forward packets dumbly at 10gbit, meant a lot of NRE for chips that sell for well under a few dollars, with little potential revenue to show for it. Everybody already has a gbit switch chip and ethernet chip paid for....

C) The building blocks for packet processing in hardware are hard to license together except for a very few players, and what's out there in open source is basically limited to opencores' and netfpga's work.

I have no doubt that eventually someone will produce implementations of the codel, fq_codel and/or cake algorithms in gates, though I lean more towards something qfq-derived than drr-derived. My best estimate was that a DRR version would need at least a 3-packet "pipeline" to hash and route, though I thought some interesting things could be done by also having multiple CAMs to route the different flows and handle cache misses better.

In the interim, I kind of expect stuff derived from QCA's or Cavium's specialized co-processors to pick up some of these ideas. There are also hugely parallel network processors based on ARM architectures around the corner in Cavium's and AMD's roadmaps.
Cisco has some insanely parallel processors in their designs.

More than fixing the native DC switch market (where everybody generally overprovisions anyway) or improving consumer switches, I'd felt that ISP edge devices (dslams/cable) were where someone would innovate with a hardware assist to meet their business models ( http://jvimal.github.io/senic/ ), and maybe we'd see some assistance for inbound rate limiting arrive in consumer hardware with offloads too, a more limited version of senic perhaps. But it takes a long time to develop hardware, chip designers are scarce, and without clear market demand... I dunno, who knows? Perhaps what the OCP folk are doing will start feeding back into actual chip designs one day.

It was a nice diversion for me to play with the chisel language and the risc-v and mill cpus, at least.

Anybody up for repurposing some machine learning chips? These look like fun:

https://www.engadget.com/2016/04/28/movidius-fathom-neural-compute-stick/

Oh, yeah, Mellanox's latest programmable ethernet devices look promising also.

http://www.mellanox.com/page/programmable_network_adapters

The ironic thing is that the biggest problems in 10GigE+ are on input, not output. In fact, on much hardware, even at lower rates, we tend to be dropping on input more than enough to balance out the potential bufferbloat problems there. Moving the timestamp and hash, and having parallel memory channels, is sort of happening on multiple newer chips on the rx path...

--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
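The "fq" half Dave mentions, DRR over hashed flow queues, is also compact to state in software. In the sketch below, packets are hashed on a flow key into a fixed number of queues and served with Deficit Round Robin: each backlogged queue keeps sending while its byte credit covers the head packet, and when it runs short it earns one quantum and yields to the next queue. This is a small variation that grants credit at the end of a turn rather than the start, which gives the same long-run shares. The flow-key format, queue count, CRC32 hash, and 1514-byte quantum are illustrative assumptions; this is not the NetFPGA design linked above, and it ignores the cut-through and hashing-latency issues Dave raises.

```python
import zlib
from collections import deque
from typing import Deque, List, Optional, Tuple

Pkt = Tuple[tuple, int]  # (flow_key, size_in_bytes)


class DrrScheduler:
    """Deficit Round Robin over a fixed set of hashed flow queues (toy model)."""

    def __init__(self, n_queues: int = 64, quantum: int = 1514) -> None:
        self.queues: List[Deque[Pkt]] = [deque() for _ in range(n_queues)]
        self.deficit = [0] * n_queues
        self.quantum = quantum
        self.active: Deque[int] = deque()  # round-robin order of backlogged queues

    def _bucket(self, flow_key: tuple) -> int:
        # CRC32 of the key stands in for whatever hash real hardware would use.
        return zlib.crc32(repr(flow_key).encode()) % len(self.queues)

    def enqueue(self, flow_key: tuple, size: int) -> None:
        b = self._bucket(flow_key)
        if not self.queues[b]:
            self.active.append(b)  # queue becomes backlogged
            self.deficit[b] = 0
        self.queues[b].append((flow_key, size))

    def dequeue(self) -> Optional[Pkt]:
        """Return the next (flow_key, size) to send, or None if all queues are empty."""
        while self.active:
            b = self.active[0]
            q = self.queues[b]
            size = q[0][1]
            if self.deficit[b] >= size:
                # Enough credit: send the head packet and stay at the front.
                self.deficit[b] -= size
                pkt = q.popleft()
                if not q:
                    # Flow went idle: remove it from the round, forget its credit.
                    self.deficit[b] = 0
                    self.active.popleft()
                return pkt
            # Out of credit: earn one quantum and yield to the next queue.
            self.deficit[b] += self.quantum
            self.active.rotate(-1)
        return None
```

When two flows hash to different queues, their full-size packets come out interleaved rather than one flow's whole burst going first; if they collide in the same bucket, they simply share one FIFO, which is the usual trade-off of hashed fair queueing.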
* Re: [Bloat] Questions About Switch Buffering
From: Steinar H. Gunderson @ 2016-06-13 8:23 UTC
To: Benjamin Cronce
Cc: Jesper Louis Andersen, bloat

On Sun, Jun 12, 2016 at 08:07:42PM -0500, Benjamin Cronce wrote:
> CoDel and especially Cake sound like they would not be ASIC friendly.

I thought this was what PIE was supposed to be about? But yes, it's a hard problem.

/* Steinar */
--
Homepage: https://www.sesse.net/
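PIE was designed with lightweight implementation in mind: rather than timestamping every packet, it periodically estimates queueing delay (for example, backlog divided by measured departure rate), updates a drop probability with a proportional-integral controller, and drops randomly at enqueue. The sketch below is a simplified version of that update loop. The 15 ms target, 15 ms update period, and gains alpha = 0.125 and beta = 1.25 are taken to be the defaults suggested in RFC 8033, but the sketch omits the RFC's auto-scaling of the gains at low probabilities, its burst allowance, and other safeguards, so treat it as illustrative rather than a conforming implementation.

```python
import random
from typing import Callable

QDELAY_REF = 0.015  # target queueing delay, seconds
T_UPDATE = 0.015    # controller period, seconds (caller runs update() this often)
ALPHA = 0.125       # gain on deviation from the target delay
BETA = 1.25         # gain on change in delay since the last update


class SimplePie:
    """PIE-style controller: periodic probability update, random drop at enqueue."""

    def __init__(self) -> None:
        self.drop_prob = 0.0
        self.qdelay_old = 0.0

    def update(self, queue_bytes: int, depart_rate_bps: float) -> float:
        """Call once every T_UPDATE seconds; returns the new drop probability."""
        # Estimate delay from backlog and measured drain rate (no timestamps).
        qdelay = queue_bytes * 8 / depart_rate_bps if depart_rate_bps > 0 else 0.0
        p = (self.drop_prob
             + ALPHA * (qdelay - QDELAY_REF)
             + BETA * (qdelay - self.qdelay_old))
        self.drop_prob = min(max(p, 0.0), 1.0)
        self.qdelay_old = qdelay
        return self.drop_prob

    def should_drop_on_enqueue(self, rng: Callable[[], float] = random.random) -> bool:
        return rng() < self.drop_prob
```

The per-packet path is a single random comparison at enqueue; the arithmetic runs only once per update period, off the fast path, which is the property that makes this family attractive for hardware.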