* Re: [Starlink] FQ_Codel
From: David P. Reed @ 2022-06-08 18:47 UTC
To: starlink

I'm just going to remind folks that fixing bufferbloat in Starlink won't be possible with FQ-Codel in the CPE equipment. If that were possible, it could be fixed entirely in a box sitting between the dishy and the user's "home network".

Evidence exists that the bulk of the "bloat" can exist, not just in the dishy, but also in the central "access point" where satellites in a coverage region direct all the traffic from and to the public Internet. This connection from the region becomes bloated if the inbound link and outbound link become "underprovisioned" for the peak rates of all the served dishy terminals.

That public-Internet-to-Starlink access point (is there a more distinct, precise name?) can develop a very long delay queue, for the same reason that bufferbloat always gets designed in - memory is cheap and plentiful, so instead of dropping packets to minimize latency, the link just stores packets until multiple seconds' worth of traffic build up on one or both ends of that link.

This problem can be solved only by dropping packets (with the packet drop rate mitigated by ECN-marking) to match the desired round-trip latency across the entire Internet. Typically, this queue size should max out and start dropping packets at about 2 * cross-Internet desired latency * bit-rate of this link.

Cross-Internet desired latency can be selected these days by using light-speed in fiber between one side of the North American continent and the other - around 15 msec is appropriate (which should be the worst-case end-to-end latency observed using Starlink, and is close to the 20 msec number bandied about by Musk - though he really never understood what end-to-end latency means).

Now it may be that the dishy itself also has such bloat built in, which would make FQ-Codel in the dishy also important.

The problem called bufferbloat occurs whenever ANY router on ANY end-to-end shared path allows such queueing delay to accumulate before shortening the queue.

It really frustrates me that memory keeps being added to router outbound buffers anywhere. And it may be that the reason is that almost nobody who designs packet forwarding systems understands Queueing Theory at all! It certainly doesn't help that "packet drops" (even one or two per second) are considered a failure of the equipment.

FQ-Codel is great, but the reason it works is that it makes the choice of what packet to drop far better (by being fair and a little bit elastic). However, FQ-Codel at one hop doesn't fix system-level bufferbloat.
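A worked example of the "2 * cross-Internet desired latency * bit-rate" rule above, as a small shell sketch. The 15 msec latency target and the 100 Mbit/s link rate are illustrative assumptions, not measured Starlink figures:

    #!/bin/sh
    # Sketch: queue should max out and start dropping at about
    #   2 * desired cross-Internet latency * link bit-rate
    LATENCY_S=0.015      # 15 msec, ~fiber coast-to-coast (assumed target)
    RATE_BPS=100000000   # 100 Mbit/s link (assumed rate)
    awk -v l="$LATENCY_S" -v r="$RATE_BPS" 'BEGIN {
        bits = 2 * l * r
        printf "drop threshold: %.0f bits = %.0f kB of queue\n", bits, bits / 8000
    }'
    # prints: drop threshold: 3000000 bits = 375 kB of queue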
* Re: [Starlink] FQ_Codel
From: warren ponder @ 2022-06-08 19:12 UTC
To: David P. Reed; +Cc: starlink

So this is really helpful. Is it fair to say then that end users with SQM and fq_codel on a Starlink connection should essentially not turn on SQM and just leave it off?

On Wed, Jun 8, 2022, 11:47 AM David P. Reed <dpreed@deepplum.com> wrote:
> [full quote of David P. Reed's message above trimmed]
* Re: [Starlink] FQ_Codel
From: David P. Reed @ 2022-06-08 20:49 UTC
To: warren ponder; +Cc: starlink

No, I don't think so. However, a hidden (not often discussed) aspect of how using FQ-Codel in a home router works is that you have to artificially restrict the total bitrate in both directions used by your home router to talk to the access provider link.

Typically, it is recommended to use 95% of the upload/download speeds of that link as the limit. This forces packets to be dropped when the constraint is exceeded, which in turn forces congestion control signals (dropped packets) to be observed by both ends. (In a cable DOCSIS system, this allows the edge to manage the throughput of the CMTS for the local endpoint, because the CMTS won't drop packets when it should - configuring DOCSIS 3.1 CMTSes is often done in a way that causes bufferbloat in the CMTS, and DOCSIS 2 always had bufferbloat in the CMTS.)

Starlink doesn't sell you a stable "max rate" - instead that rate varies depending on traffic, and can't be easily measured. So to configure the dishy, or an edge router connected to it, correctly, you need to enforce such a limit so that it actually causes FQ-Codel to see dropped packets.

On Wednesday, June 8, 2022 3:12pm, "warren ponder" <wponder11@gmail.com> said:
> So this is really helpful. Is it fair to say then that end users with SQM and fq_codel on a Starlink connection should essentially not turn on SQM and just leave it off?
> [remainder of quoted thread trimmed]
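A minimal sketch of the edge shaping Reed describes, using cake on a Linux router. The interface name and the 20 Mbit/s uplink figure are placeholders - measure your own link and shape to roughly 95% of it:

    # Shape egress to ~95% of a measured 20 Mbit/s uplink so that the
    # queue (and the drop/mark decisions) move into this router:
    tc qdisc replace dev eth0 root cake bandwidth 19mbit
    # Inspect drops/marks to confirm the shaper is actually the bottleneck:
    tc -s qdisc show dev eth0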
* Re: [Starlink] FQ_Codel
From: Dave Taht @ 2022-06-08 21:30 UTC
To: David P. Reed; +Cc: warren ponder, starlink

On Wed, Jun 8, 2022 at 1:49 PM David P. Reed <dpreed@deepplum.com> wrote:
>
> No, I don't think so. However, a hidden (not often discussed) aspect of how using FQ-Codel in a home router works is that you have to artificially restrict the total bitrate in both directions used by your home router to talk to the access provider link.
> [...]
> Starlink doesn't sell you a stable "max rate" - instead that rate varies depending on traffic, and can't be easily measured.

I appreciate david simplifying the problem here, but the details are:

On egress, at line rate, ethernet backpressure is provided via the linux BQL facility ( https://lwn.net/Articles/469652/ ), which basically buffers up one completion interrupt's worth of bytes (packets) and punts the complicated FQ and codel drop/mark decisions to a slightly higher layer. This is typically 3k bytes at 100Mbit, and 40k to 128k (with TSO) at a gbit. There's roughly 1/2 a ms of needed buffering at the lowest layer on intel chips today. Some arm chips are capable of doing interrupts quite a bit faster than intel; 100us is feasible on some.

Codel, being "time in queue" based, also works to drop packets intelligently with ethernet pause frames giving you a "variable rate" link. I'm not big on pause frames, but codel (and pie) can work with them, where RED cannot.

We long ago succeeded at making a plethora of the very variable rate wifi devices work (in the driver) by adopting a motto of "one TXOP in the hardware", "one ready to go" for wifi, leading to a max unmanaged buffering of about 10ms before a smarter qdisc can kick in.

The biggest problem that wifi had (that I hope starlink doesn't!) was that wifi packet aggregation is needed to get even close to the rated bandwidth, and that with a fifo, rather than a per-station queue, performance degrades hugely when flows for more than one station are active.

If only I could get a few million more people to lose 8 minutes of their life to this simple and elegant demo of why wifi goes to hell: https://www.youtube.com/watch?v=Rb-UnHDw02o&t=1550s

Or read up on how well it's solved as of 2016, and use our test suite for multiple stations at the same time: https://www.cs.kau.se/tohojo/airtime-fairness/

In both cases with smarter hardware all that could be eliminated, but I digress. 10ms worst-case latency for a sparse flow is much better than the 7 seconds I first observed on the first wifi router starlink shipped.

I apologize for conflating these things - the principal wireless gear I'm familiar with is wifi, and a few other TDM-style wireless macs.
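The BQL state Dave refers to is visible from userspace on any BQL-capable Linux NIC driver. A quick sketch for inspecting it - the interface and queue names (eth0, tx-0) are illustrative:

    # Current dynamic limit and its ceiling, in bytes, for one tx queue:
    cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit
    cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max
    # Capping limit_max bounds how many bytes can sit below the qdisc,
    # keeping the drop/mark decisions in fq_codel's hands:
    echo 3000 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max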
The principal problem the starlink network has is on the satellite uplink, followed by the satellite downlink; the wifi router problems only show up if you use the router for local things. The wifi solution we came up with seems generalizable to most forms of wireless, including LTE/5G and starlink.

> So to configure the dishy or an edge router connected to it correctly, you need to enforce such a limit such that it actually causes FQ-codel to see dropped packets.

So to reiterate, for egress from the client up to the sat, either:

1) with an interrupt's worth of backpressure from the radio, we just slam cake, fq_codel, or fq_pie on it, and we have a minimum inter-flow latency of however long that takes (under a ms) for sparse flows, and a buffering target of 5ms with a margin of 100ms for fat flows. (My guess is that the radio is actually scheduled on intervals higher than that, btw.) Or:

2) with actual (and accurate) bandwidth stats from the radio as to how much can fit into the next transmit opportunity, we can set a cake rate slightly below that. The 24Mbit up we've seen would have, oh, a 2% cpu impact on these arm chips running cake with all options enabled. With the shaper, call it 7%.

Ingress is harder. Ideally, as I've said, they'd be managing the queue depth and fq at the head end, not in the sky, doing complicated things like per-customer fairness, etc., there, and doing just enough queuing in the sky to cover for 2 tx opportunities. The approach we've long taken with shaping ingress also at the customer router, well, that's WIP for variable rate links...

> On Wednesday, June 8, 2022 3:12pm, "warren ponder" <wponder11@gmail.com> said:
>
> So this is really helpful. Is it fair to say then that end users with SQM and fq_codel on a Starlink connection should essentially not turn on SQM and just leave it off?

Right now, aside from trying to line up more testers of the autorate code, it's not worth doing, yes.

> On Wed, Jun 8, 2022, 11:47 AM David P. Reed <dpreed@deepplum.com> wrote:
>> [full quote of David P. Reed's first message trimmed]
--
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht
CEO, TekLibre, LLC
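The ingress shaping Dave calls "harder" is conventionally done on Linux by redirecting inbound traffic through an IFB pseudo-device so a shaper can see it. A minimal sketch - the interface name and the 90 Mbit figure are assumptions, not Starlink numbers:

    # Redirect eth0 ingress through ifb0 so cake can shape the downlink:
    modprobe ifb
    ip link add ifb0 type ifb
    ip link set ifb0 up
    tc qdisc add dev eth0 handle ffff: ingress
    tc filter add dev eth0 parent ffff: protocol all matchall \
        action mirred egress redirect dev ifb0
    # cake's "ingress" keyword accounts for packets having already
    # crossed the bottleneck before they can be dropped here:
    tc qdisc add dev ifb0 root cake bandwidth 90mbit ingress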
* Re: [Starlink] FQ_Codel
From: Sebastian Moeller @ 2022-06-09 8:58 UTC
To: David P. Reed; +Cc: warren ponder, starlink

Hi David,

> On Jun 8, 2022, at 22:49, David P. Reed <dpreed@deepplum.com> wrote:
>
> No, I don't think so. However, a hidden (not often discussed) way that using FQ-codel in a home router works is that you have to artificially restrict the total bitrate in both directions used by your home router to talk to the access provider link.

I am not sure I fully agree with the "hidden (not often discussed)" qualification. When I explain SQM principles I start by explaining queues* and bottleneck queues and the need to create an artificial bottleneck to get the queueing under our control, where we can employ decent scheduling and AQM to minimize the downsides of excessive queueing. E.g. in https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm-details we write:

"Basic Settings - the details…
SQM is designed to manage the queues of packets waiting to be sent across the slowest (bottleneck) link, which is usually your connection to the Internet. The algorithm cannot automatically adapt to network conditions on DSL, cable modems or GPON without any settings. Since the majority of ISP provided configurations for buffering are broken today, you need take control of the bottleneck link away from the ISP and move it into the router so it can be fixed. You do this by entering link speeds that are a few percent below the actual speeds."

*) As little as I understand them; I do not claim to be an expert on queueing theory.

> [remainder of quoted thread trimmed]
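The sqm-details page Sebastian quotes boils down, on OpenWrt, to a small stanza in /etc/config/sqm. A sketch with placeholder values - the interface device and the rates must come from your own measurements (roughly 95% of measured speeds):

    config queue 'eth1'
            option enabled '1'
            option interface 'eth1'    # your WAN device; e.g. pppoe-wan on PPPoE
            option download '47000'    # kbit/s, ~95% of measured downlink
            option upload '9500'       # kbit/s, ~95% of measured uplink
            option qdisc 'cake'
            option script 'piece_of_cake.qos'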
* Re: [Starlink] FQ_Codel
From: Stuart Cheshire @ 2022-06-09 0:12 UTC
To: warren ponder; +Cc: David P. Reed, starlink

On 8 Jun 2022, at 12:12, warren ponder <wponder11@gmail.com> wrote:

> So this is really helpful. Is it fair to say then that end users with SQM and fq_codel on a Starlink connection should essentially not turn on SQM and just leave it off?

My advice is that people should have SQM (e.g., fq_codel) enabled anywhere it is available. For devices that aren’t the bottleneck hop on a path it won’t make any difference, but it won’t hurt. And if the network topology is such that it does become the bottleneck hop, even briefly, SQM will avoid having a big queue build up there.

One example is Wi-Fi. If you have 50Mb/s Internet service and 802.11ac Wi-Fi in the house, your Wi-Fi is unlikely to be the bottleneck. But if you walk out to the garden and the Wi-Fi rate drops to 40Mb/s, then suddenly bufferbloat in the AP can bite you, leading to bi-modal network usability that abruptly falls off a cliff the moment your Wi-Fi rate drops below your Internet service rate. I think this is a large part of the reason behind the enthusiasm these days for “mesh” Wi-Fi systems -- you need to blanket your home with sufficient density of Wi-Fi access points to ensure that they never become the bottleneck hop and expose their incompetent queue management. If you get 11Mb/s in the garden that should be plenty to stream music, but throw in some egregious bufferbloat and a perfectly good 11Mb/s rate becomes unusably bad. Ironically, if you pay more for faster Internet service then the problem gets worse, not better, because the effective usable range of your bufferbloated Wi-Fi access points shrinks as the rate coming into the house goes up.

Stuart Cheshire
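Acting on Stuart's "enabled anywhere it is available": on a Linux-based router you can check which qdisc an interface is actually running and switch it. A sketch - the interface names are placeholders, and note that recent mac80211 Wi-Fi drivers apply fq_codel internally regardless of the root qdisc:

    # Which qdisc is active, plus its drop/mark/backlog statistics:
    tc -s qdisc show dev eth0
    # Make fq_codel the root qdisc where something worse is the default:
    tc qdisc replace dev eth0 root fq_codel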
* Re: [Starlink] FQ_Codel
From: David Lang @ 2022-06-09 0:21 UTC
To: Stuart Cheshire; +Cc: warren ponder, starlink, David P. Reed

Multiple access points: good. Mesh can make the problem worse.

The combination of hidden transmitters (a station in the middle can hear stations on both ends, but they can't hear each other and so step on each other) and just the extra airtime needed to relay the messages as there are more hops can make the congestion worse. (It is possible that higher data rates could make the transmissions shorter, but since the inter-aggregate gaps and per-aggregate headers are fixed at a low data rate, I doubt that it works that way in practice.)

But get a few additional APs hooked together via wires, and you have a clear win that scales very well. It's what we do at the Scale conf with 100+ APs to support 3k+ geeks.

David Lang

On Wed, 8 Jun 2022, Stuart Cheshire wrote:
> [full quote of Stuart Cheshire's message above trimmed]
* Re: [Starlink] FQ_Codel
From: Dave Taht @ 2022-06-09 1:11 UTC
To: David Lang; +Cc: Stuart Cheshire, starlink, David P. Reed

On Wed, Jun 8, 2022 at 5:21 PM David Lang <david@lang.hm> wrote:
>
> but get a few additional APs hooked together via wires, and you have a clear win that scales very well. It's what we do at the Scale conf with 100+ APs to support 3k+ geeks.

Is there a physical scale conference this year? (It's in LA and a lot of space/film folk go there.)

For those that don't know, david lang has been putting together the fq_codeled APs there for what, 8 years now? Conference feedback on the wifi has generally been uniformly positive.

What APs do you use now?

> [remainder of quoted thread trimmed]
--
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht
CEO, TekLibre, LLC
* Re: [Starlink] FQ_Codel
From: David Lang @ 2022-06-09 2:01 UTC
To: Dave Taht; +Cc: David Lang, Stuart Cheshire, starlink, David P. Reed

On Wed, 8 Jun 2022, Dave Taht wrote:

> Is there a physical scale conference this year? (It's in LA and a lot of space/film folk go there.)

Yes, it got pushed from the beginning of the year and will now be the last weekend of July.

> For those that don't know, david lang has been putting together the fq_codeled APs there for what, 8 years now? Conference feedback on the wifi has generally been uniformly positive.

I've been doing the wifi since 2009 and am now co-chair for the network team.

> What APs do you use now?

We are back in the LAX hilton this year. Our network is ~50 juniper 4200 switches, 100+ wndr3700/3800s, several miles of cable, and a pair of fairly beefy servers to run the VMs to run and monitor the network.

We've been talking the last 4-5 years of replacing the APs with something newer, but a combination of a turnover of most of our tech staff, covid, and the unreliability of the drivers on our first pick has kept us pushing it back year by year. (When you are going to buy 100+ APs, it adds up to quite a price tag, especially for an all-volunteer event.)

Over the last couple of years we've set up a couple of the 3800s hooked to a pi and a couple of relays so every build gets auto-flashed and tested, and we just finished setting up a system that lets us put a hub/switch in place and flash the APs to the current version by the DHCP server detecting them come up on the network.

This year we will be doing an updated version of what I documented here:
https://www.usenix.org/conference/lisa12/technical-sessions/presentation/lang_david_wireless
but with an even higher density of APs on low power (the walls in this hotel are pretty good shielding, and more people have 5GHz-capable gear than did 10 years ago).

If anyone is in the area, stop by and chat. We are always looking for volunteers to help setup and teardown as well ;-) (Setup starts monday, and the guy who was going to be running monitoring just had to back out, if someone wants to jump in the deep end.)

David Lang

> [remainder of quoted thread trimmed]
* Re: [Starlink] FQ_Codel
From: Sebastian Moeller @ 2022-06-09 8:50 UTC
To: David P. Reed; +Cc: starlink

Hi David,

> On Jun 8, 2022, at 20:47, David P. Reed <dpreed@deepplum.com> wrote:
>
> I'm just going to remind folks that fixing bufferbloat in Starlink won't be possible with FQ-Codel in the CPE equipment. If that were possible, it could be fixed entirely in a box sitting between the dishy and the user's "home network".

While we cannot fix it, we can remedy it to some degree.

> Evidence exists that the bulk of the "bloat" can exist, not just in the dishy, but also in the central "access point" where satellites in a coverage region direct all the traffic from and to the public Internet. [...] For the same reason that bufferbloat always gets designed in - memory is cheap and plentiful,

... but CPU cycles are still precious, and it is this combination that results in the "over-sized, but under-managed" buffers that are so atrocious for latency under load/working latency. (Granted, decent management typically means that the queues never really grow to fill large memory buffers, but that also means that reserving large amounts of memory for buffering does not hurt anymore.)

> so instead of dropping packets to minimize latency, the link just stores packets until multiple seconds worth of traffic build up on one or both ends of that link.

As long as they service the remote stations in a somewhat predictable/fair round-robin fashion under load, we should be able to remedy that though.

> [...]
> FQ-codel is great, but why it works is that it makes the choice of what packet to drop far better (by being fair and a little bit elastic). However, the lack of FQ-Codel doesn't fix system-level bufferbloat.
I would have guessed that the FQ scheduler alone already helps a lot, as it restricts the pain from over-committing to the hash bucket housing the offending flow. Sure, selecting the most effective packet(s) to drop also helps, but FQ alone will already help non-capacity-seeking flows (that stay below their capacity share) a lot if competing with capacity-seeking traffic on the same link.

Regards
        Sebastian
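A quick way to see the FQ effect Sebastian describes is to measure latency under load with and without fq_codel on the bottleneck, which is what the bufferbloat project's flent "rrul" test is built for. A usage sketch - the server hostname is a placeholder for a netperf server you control or one of the public flent servers:

    # Realtime Response Under Load: 4 bulk flows each way plus latency
    # probes; run once per qdisc configuration and compare the plots.
    flent rrul -p all_scaled -l 60 -H netperf.example.com \
        -t "fq_codel on bottleneck" -o rrul-fq_codel.png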
* [Starlink] FQ_Codel
From: Warren Ponder @ 2022-06-06 16:20 UTC
To: starlink

I have been reading up on everything, trying to get up to speed on fq_codel. I have Starlink and a router that implements fq_codel. I know implementations can vary. However, has anyone found any general strategies for CPE-side settings that can make any improvement?
* Re: [Starlink] FQ_Codel
From: Dave Taht @ 2022-06-08 16:47 UTC
To: Warren Ponder; +Cc: starlink

On Wed, Jun 8, 2022 at 8:29 AM Warren Ponder <wponderlpp@gmail.com> wrote:
>
> I have been reading up on everything, trying to get up to speed on fq_codel. I have Starlink and a router that implements fq_codel. I know implementations can vary. However, has anyone found any general strategies for CPE-side settings that can make any improvement?

My "strategy" has been to somehow convince 'em to burn a weekend with me on implementing sch_cake on the dishy. With BQL-like backpressure from the radio, making the dishy reliably do low-latency videoconferencing or gaming is straightforward that way. Even SFQ would gain them this. Users would stop complaining so much when the bandwidth was low.

Presently, though, the dishy's userspace code seems to treat linux more like a bootloader - it reads from the radio and outputs to the ethernet port. They haven't done a GPL drop of that, just the router's - running a 10-year-old (version 1), or 6-year-old (version 2), hacked-up, ancient, decrepit, vendor-supported-only version of openwrt "lede". In the olde days, a company entering a market like this would coddle developers, give them hardware and support, not ...

Worse, on the wifi front, they chose a really scarce mediatek chip for that router... probably "locking it up"... and nobody in the openwrt effort (that I'm aware of) has been working on adding in the mainline support for it needed for any other company. We're making huge strides on mediatek in general, getting all their other chipsets to behave like this: https://blog.cerowrt.org/post/fq-codel-unifi6/ with, I hope, us finally fixing the tx power bug that plagued many of the other mediatek wifi implementations: https://github.com/openwrt/mt76/issues/633

We've done enough reverse engineering on their devices for me to conclude that, with statistics from the radio - and perhaps some signalling from the headend - cake could compensate for bufferbloat in both directions (backpressure would be better still there, and on their headends), and there's some json stats that might be helpful in getting a downstream router to compensate also, but lacking a starlink node to hack on, I haven't got anywhere. Mike Puchol has been pulling the stats for his lovely graphing utilities... but I don't know if they are adequate, without feeding 'em into cake.

They could be doing so much better than the attached 300+ms of induced latency on the rrul test. They could be nearly totally flat latency-wise across the board... if they'd give up on 100/20 and being RDOF-compliant, and focus on just providing good low-latency service at lower rates, they could increase their subscriber density and nobody but the speedtest.net devotees would notice.

It's been a frustrating year, not being able to get that weekend of mutual hacking out of starlink. 400k subscribers could have taken advantage of all the innovations we've made in queuing delay in the last decade if they'd just sink that weekend into applying what they already almost have in their codebase. And we - in the bufferbloat effort - would have an exemplary implementation to point to.

Anyway, elsewhere, we've been trying to get starlink users to test these means of active sensing and configuring cake on your own router.
If you could give either of these scripts a shot?
https://forum.openwrt.org/t/cake-w-adaptive-bandwidth/108848

Still, I retain hope that someone over there will end up owning this problem, and fixing it.

--
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht
CEO, TekLibre, LLC

[Attachment: rrul_be_-_2022-05-29_14:17:48.svg - the rrul test plot showing the 300+ms of induced latency referenced above]
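The linked cake-autorate work amounts to a feedback loop: probe latency continuously and walk cake's bandwidth parameter down when load-induced delay appears. A much-simplified toy sketch of that idea - not the actual script; the interface, rates, thresholds, and probe target are all illustrative:

    #!/bin/sh
    # Toy adaptive shaper: shrink the cake rate when RTT inflates,
    # creep back up when the link is clean. Rates are in kbit/s.
    # Assumes cake is already installed as the root qdisc on $IFACE.
    IFACE=wan; RATE=20000; MIN=5000; MAX=40000; REF_MS=40
    while sleep 5; do
        rtt=$(ping -c 3 -q 8.8.8.8 | awk -F/ '/^rtt|^round-trip/ {print int($5)}')
        if [ "${rtt:-999}" -gt "$REF_MS" ]; then
            RATE=$(( RATE * 9 / 10 ))       # back off under bloat
            [ "$RATE" -lt "$MIN" ] && RATE=$MIN
        else
            RATE=$(( RATE * 21 / 20 ))      # probe for more when clean
            [ "$RATE" -gt "$MAX" ] && RATE=$MAX
        fi
        tc qdisc change dev "$IFACE" root cake bandwidth "${RATE}kbit"
    done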