* [Ecn-sane] paper idea: praising smaller packets
From: Dave Taht @ 2021-09-26 20:08 UTC
To: Mohit P. Tahiliani; Cc: ECN-Sane

... an exploration of smaller mss sizes in response to persistent congestion

This is in response to two declarative statements in here that I've
long disagreed with, involving NOT shrinking the mss, and not trying
to do pacing...

https://www.bobbriscoe.net/projects/latency/sub-mss-w.pdf

Otherwise, for a change, I largely agree with Bob.

"No amount of AQM twiddling can fix this. The solution has to fix TCP."

"nearly all TCP implementations cannot operate at less than two packets
per RTT"

--
Fixing Starlink's Latencies: https://www.youtube.com/watch?v=c9gLo6Xrwgw
Dave Täht
CEO, TekLibre, LLC
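A rough sketch of the arithmetic behind that last quote (numbers and the
helper name are illustrative, not taken from the paper): with a
conventional two-segment floor on cwnd, the minimum rate a flow can
sustain is 2*MSS/RTT, so shrinking the MSS is what lowers that floor.

    def rate_floor_bps(mss_bytes, rtt_s, min_cwnd_segments=2):
        # Lowest average rate at which the flow can keep its ACK clock
        # if cwnd cannot fall below min_cwnd_segments full-sized segments.
        return min_cwnd_segments * mss_bytes * 8 / rtt_s

    rtt = 0.020  # 20 ms path, purely illustrative
    for mss in (1448, 536, 216):
        print(f"MSS {mss:>5} B -> floor {rate_floor_bps(mss, rtt) / 1e3:8.1f} kbit/s")

    # MSS 1448 B -> floor ~1158 kbit/s; MSS 216 B -> floor ~173 kbit/s.
    # The floor scales with the MSS, which is the lever the "smaller
    # packets" idea pulls; Bob's sub-MSS approach instead paces out less
    # than one segment per RTT.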
* Re: [Ecn-sane] paper idea: praising smaller packets
From: Bob Briscoe @ 2021-09-27 14:50 UTC
To: Dave Taht, Mohit P. Tahiliani, Asad Sajjad Ahmed; Cc: ECN-Sane

Dave,

On 26/09/2021 21:08, Dave Taht wrote:
> ... an exploration of smaller mss sizes in response to persistent congestion
>
> This is in response to two declarative statements in here that I've
> long disagreed with,
> involving NOT shrinking the mss, and not trying to do pacing...

I would still avoid shrinking the MSS, 'cos you don't know if the
congestion constraint is the CPU, in which case you'll make congestion
worse. But we'll have to differ on that if you disagree.

I don't think that paper said don't do pacing. In fact, it says "...pace
the segments at less than one per round trip..."

Whatever, that paper was the problem statement, with just some ideas on
how we were going to solve it. After that, Asad (added to the distro)
did his whole Masters thesis on this - I suggest you look at his thesis
and code (pointers below).

Also, soon after he'd finished, changes to BBRv2 were introduced to
reduce queuing delay with large numbers of flows. You might want to take
a look at that too:
https://datatracker.ietf.org/meeting/106/materials/slides-106-iccrg-update-on-bbrv2#page=10

>
> https://www.bobbriscoe.net/projects/latency/sub-mss-w.pdf
>
> Otherwise, for a change, I largely agree with Bob.
>
> "No amount of AQM twiddling can fix this. The solution has to fix TCP."
>
> "nearly all TCP implementations cannot operate at less than two packets per RTT"

Back to Asad's Master's thesis: we found that just pacing out the
packets wasn't enough. There's a very brief summary of the 4 things we
found we had to do, in 4 bullets in this section of our write-up for netdev:
https://bobbriscoe.net/projects/latency/tcp-prague-netdev0x13.pdf#subsubsection.3.1.6
And I've highlighted a couple of unexpected things that cropped up below.

Asad's full thesis:
    Ahmed, A., "Extending TCP for Low Round Trip Delay",
    Masters Thesis, Uni Oslo, August 2019,
    <https://www.duo.uio.no/handle/10852/70966>.
Asad's thesis presentation:
    https://bobbriscoe.net/presents/1909submss/present_asadsa.pdf
Code:
    https://bitbucket.org/asadsa/kernel420/src/submss/

Despite significant changes to basic TCP design principles, the diffs
were not that great.

A number of tricky problems came up.

* For instance, simple pacing when <1 ACK per RTT wasn't that simple.
Whenever there were bursts from cross-traffic, the consequent burst in
your own flow kept repeating in subsequent rounds. We realized this was
because you never have a real ACK clock (you always set the next send
time based on previous send times). So we set up the next send time
but then re-adjusted it if/when the next ACK did actually arrive.

* The additive increase of one segment was the other main problem. When
you have such a small window, multiplicative decrease scales fine, but
an additive increase of 1 segment is a huge jump in comparison when
cwnd is a fraction of a segment. "Logarithmically scaled additive
increase" was our solution to that (basically, every time you set
ssthresh, alter the additive increase constant using a formula that
scales logarithmically with ssthresh, so it's still roughly 1 for the
current Internet scale).

What became of Asad's work?
Altho the code finally worked pretty well {1}, we decided not to pursue
it further 'cos a minimum cwnd actually gives a trickle of throughput
protection against unresponsive flows (with the downside that it
increases queuing delay). That's not to say this isn't worth working on
further, but there was more to do to make it bullet-proof, and we were
in two minds how important it was, so it worked its way down our
priority list.

{Note 1: From memory, there was an outstanding problem with one flow
remaining dominant if you had step-ECN marking, which we worked out was
due to the logarithmically scaled additive increase, but we didn't work
on it further to fix it.}


Bob

--
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/
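The "logarithmically scaled additive increase" above is only described
in prose here; the exact formula is in Asad's thesis, not this message.
A minimal sketch of the general shape Bob describes, using an assumed
scaling function (my guess, not the thesis code), might look like:

    import math

    def additive_increase_segments(ssthresh_segments):
        # Assumed shape only: roughly 1 segment per RTT at today's window
        # sizes (>= ~16 segments here), shrinking smoothly as ssthresh
        # drops toward and below one segment. Not Asad's actual formula.
        return min(1.0, math.log2(1.0 + ssthresh_segments) / math.log2(17.0))

    for ssthresh in (64, 16, 2, 0.5, 0.1):
        ai = additive_increase_segments(ssthresh)
        print(f"ssthresh = {ssthresh:>5} seg -> cwnd += {ai:.3f} seg per RTT")

    # At ssthresh >= 16 the increase is the familiar 1 segment/RTT; at a
    # half-segment window it is ~0.14 seg/RTT, so the additive step is no
    # longer a huge jump relative to a sub-MSS cwnd.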
* Re: [Ecn-sane] paper idea: praising smaller packets
From: Dave Taht @ 2021-09-27 15:14 UTC
To: Bob Briscoe; Cc: Mohit P. Tahiliani, Asad Sajjad Ahmed, ECN-Sane

I note that I have long clamped the mss at the router, on slower links,
to values below 600. I've never really got around to analyzing this, but
in the presence of fixed-length packet fifos elsewhere, a smaller ratio
of acks to packets, and in search of better multiplexing for the types
of traffic I care about most, it's made sense since I started doing it
in the 90s. I care not one whit about extra cpu usage elsewhere derived
from these slow links. :)

In striving for a better personal formulation as to how I view the
internet, a statement I might start with today would be "the internet is
a communications network", as my priorities have always been voip,
videoconferencing, gaming, request/response, and at the very bottom,
capacity-seeking bulk traffic.

And, yes, when the patch went by to linux to count CEs better, just now,
I got informed of the progress within bbrv2, which I'm pleased with and
will attempt to test sometime in the coming months.

A note towards flow queuing vs fair queuing: there is essentially no
queueing if you manage to pace single packets below the service time it
takes to deliver a round of all the other flows. If you deliver two
packets back to back, above the quantum, you get that near 0 plus the
time it takes to serve a round, and I've often thought there might be a
way to take advantage of that property in non-tcp transports, notably
videoconferencing.

Thank you for pointing me at these other works below.

On Mon, Sep 27, 2021 at 7:50 AM Bob Briscoe <research@bobbriscoe.net> wrote:
> [...]

--
Fixing Starlink's Latencies: https://www.youtube.com/watch?v=c9gLo6Xrwgw
Dave Täht
CEO, TekLibre, LLC
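A back-of-envelope sketch of the flow-queuing point above (the link
rate, flow count and MTU-sized quantum are assumptions; the helper is
illustrative, not taken from any scheduler's source):

    def fq_round_time_s(link_bps, n_other_flows, quantum_bytes=1514):
        # Time for a DRR-style flow-queuing scheduler to serve one quantum
        # from every other active flow, i.e. one "round" our packet may wait.
        return n_other_flows * quantum_bytes * 8 / link_bps

    link = 20e6        # 20 Mbit/s link, hypothetical
    others = 8         # competing bulk flows, hypothetical
    round_t = fq_round_time_s(link, others)
    print(f"one round ~= {round_t * 1e3:.2f} ms")   # ~4.8 ms here

    # Pacing at most one packet per round_t means each packet finds an
    # (almost) empty per-flow queue: its delay is bounded by one round.
    # Two back-to-back packets (more than one quantum) cost the second
    # packet roughly one extra round, which is the effect described above.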
* Re: [Ecn-sane] paper idea: praising smaller packets
From: David P. Reed @ 2021-09-28 22:15 UTC
To: Bob Briscoe; Cc: Dave Taht, Mohit P. Tahiliani, Asad Sajjad Ahmed, ECN-Sane

Upon thinking about this, here's a radical idea:

the expected time until a bottleneck link clears, that is, 0 packets are
in the queue to be sent on it, must be < t, where t is an Internet-wide
constant corresponding to the time it takes light to circle the earth.

This is a local constraint, one that is required of a router. It can be
achieved in any of a variety of ways (for example, choosing to route
different flows on different paths that don't include the bottleneck
link).

It need not be true at all times - but when I say "expected time", I
mean that the queue's behavior is monitored so that this situation is
quite rare over any interval of ten minutes or more.

If a bottleneck link is continuously full for more than the time it
takes for packets on a fiber (< light speed) to circle the earth, it is
in REALLY bad shape. That must never happen.

Why is this important?

It's a matter of control theory - if the control loop delay gets longer
than its minimum, instability tends to take over no matter what control
discipline is used to manage the system.

Now, it is important as hell to avoid bullshit research programs that
try to "optimize" utilization of link capacity at 100%. Those research
programs focus on the absolute wrong measure - a proxy for "network
capital cost" that is in fact the wrong measure of any real network
operator's cost structure. The cost of media (wires, airtime, ...) is a
tiny fraction of most network operations' cost in any real business or
institution. We don't optimize highways by maximizing the number of cars
on every stretch of highway, for obvious reasons, but also for
non-obvious reasons.

Latency and lack of flexibility or reconfigurability impose real costs
on a system that are far more significant to end-user value than the
cost of the media.

A sustained congestion of a bottleneck link is not a feature, but a very
serious operational engineering error. People should be fired if they
don't prevent that from ever happening, or allow it to persist.

This is why telcos, for example, design networks to handle the expected
maximum traffic with some excess capacity. This is why networks are
constantly being upgraded as load increases, *before* overloads occur.

It's an incredibly dangerous and arrogant assumption that operation in a
congested mode is acceptable.

That's the rationale for the "radical proposal".

Sadly, academic thinkers (even ones who have worked in industry research
labs on minor aspects) get drawn into solving the wrong problem -
optimizing the case that should never happen.

Sure, that's helpful - but only in the same sense that, when designing
systems where accidents need to have fallbacks, one needs to design the
fallback system to work.

Operating at a fully congested state - or designing TCP to essentially
come close to DDoS behavior on a bottleneck to get a publishable paper -
is missing the point.


On Monday, September 27, 2021 10:50am, "Bob Briscoe"
<research@bobbriscoe.net> said:
> [...]
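For concreteness, David's constant t and his monitoring rule can be
sketched roughly as follows. The propagation speed, circumference,
sampling interface and "quite rare" threshold are all assumptions of
mine, not anything specified in the thread:

    C_EARTH_M   = 40_075_000      # equatorial circumference, metres
    V_FIBER_M_S = 2.0e8           # ~2/3 of c in fibre
    T_MAX_S     = C_EARTH_M / V_FIBER_M_S
    print(f"t ~= {T_MAX_S * 1e3:.0f} ms")   # roughly 200 ms

    import time
    from collections import deque

    class ClearanceMonitor:
        # Tracks whether the egress queue drained to empty at least once
        # every T_MAX_S, and how often that failed over a 10-minute window.
        def __init__(self, t_max=T_MAX_S, window_s=600, tolerated=3):
            self.t_max, self.window_s, self.tolerated = t_max, window_s, tolerated
            self.last_empty = time.monotonic()
            self.violations = deque()

        def on_queue_sample(self, qlen_packets):
            now = time.monotonic()
            if qlen_packets == 0:
                self.last_empty = now
            elif now - self.last_empty > self.t_max:
                self.violations.append(now)   # this interval never cleared
                self.last_empty = now         # start timing the next interval
            while self.violations and now - self.violations[0] > self.window_s:
                self.violations.popleft()

        def healthy(self):
            # "quite rare" is taken here as a handful per window (arbitrary).
            return len(self.violations) <= self.tolerated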
* Re: [Ecn-sane] paper idea: praising smaller packets
From: Vint Cerf @ 2021-09-29 9:26 UTC
To: David P. Reed; Cc: Bob Briscoe, Mohit P. Tahiliani, ECN-Sane, Asad Sajjad Ahmed

thanks David - I really like your clear distinction between avoidance
and optimized congestion.

v

On Tue, Sep 28, 2021 at 6:15 PM David P. Reed <dpreed@deepplum.com> wrote:
> [...]

--
Please send any postal/overnight deliveries to:
Vint Cerf
1435 Woodhurst Blvd
McLean, VA 22102
703-448-0965 until further notice
* Re: [Ecn-sane] paper idea: praising smaller packets
From: Jonathan Morton @ 2021-09-29 10:36 UTC
To: David P. Reed; Cc: Bob Briscoe, Mohit P. Tahiliani, ECN-Sane, Asad Sajjad Ahmed

> On 29 Sep, 2021, at 1:15 am, David P. Reed <dpreed@deepplum.com> wrote:
>
> Now, it is important as hell to avoid bullshit research programs that
> try to "optimize" utilization of link capacity at 100%. Those research
> programs focus on the absolute wrong measure - a proxy for "network
> capital cost" that is in fact the wrong measure of any real network
> operator's cost structure. The cost of media (wires, airtime, ...) is a
> tiny fraction of most network operations' cost in any real business or
> institution. We don't optimize highways by maximizing the number of
> cars on every stretch of highway, for obvious reasons, but also for
> non-obvious reasons.

I think it is important to distinguish between core/access networks and
last-mile links. The technical distinction is in the level of
statistical multiplexing - high in the former, low in the latter. The
cost structure to the relevant user is also significantly different.

I agree with your analysis when it comes to core/access networks with a
high degree of statistical multiplexing. These networks should be built
with enough capacity to service their expected load. When the actual
load exceeds installed capacity for whatever reason, keeping latency low
maintains network stability and, with a reasonable AQM, should not
result in appreciably reduced goodput in practice.

The relevant user's costs are primarily in the hardware installed at
each end of the link (hence minimising complexity in this high-speed
hardware is often seen as an important goal), and possibly in the actual
volume of traffic transferred, not in the raw capacity of the medium.
All the same, if the medium were cheap, why not just install more of it,
rather than spending big on the hardware at each end? There's probably a
good explanation for this that I'm not quite aware of. Perhaps it has to
do with capital versus operational costs.

On a last-mile link, the relevant user is a member of the household that
the link leads to. He is rather likely to be *very* interested in
getting the most goodput out of the capacity available to him, on those
occasions when he happens to have a heavy workload for it. He's just
bought a game on Steam, for example, and wants to minimise the time
spent waiting for multiple gigabytes to download before he can enjoy his
purchase. Assuming his ISP and the Steam CDN have built their networks
wisely, his last-mile link will be the bottleneck for this task - and
optimising goodput over it becomes *more* important the lower the link
capacity is.

A lot of people, for one reason or another, still have links below
50 Mbps, and sometimes *much* less than that. It's worth reminding the
gigabit fibre crowd of that, once in a while.

But he may not be the only member of the household interested in this
particular link. My landlord, for example, may commonly have his wife,
sister, mother, and four children at home at any given time, depending
on the time of year. Some of the things they wish to do may be
latency-sensitive, and they are also likely to be annoyed if
throughput-sensitive tasks are unreasonably impaired. So the goodput of
the Steam download is not the only metric of relevance, taken
holistically. And it is certainly not correct to maximise utilisation of
the link, as you can "utilise" the link with a whole lot of useless
junk, yet make no progress whatsoever.

Maximising an overall measure of network power, however, probably *does*
make sense - in both contexts. The method of doing so is naturally
different in each context (a small sketch of these quantities follows
this message):

1: In core/access networks, ensuring that demand is always met by
capacity maximises useful throughput and minimises latency. This is the
natural optimum for network power.

2: It is reasonable to assume that installing more capacity has an
associated cost, which may exert downward pressure on capacity. In
core/access networks where demand exceeds capacity, throughput is fixed
at capacity, and network power is maximised by minimising delays. This
assumes that no individual traffic's throughput is unreasonably
impaired, compared to others, in the process; the "linear product-based
fairness index" can be used to detect this:

https://en.wikipedia.org/wiki/Fairness_measure#:~:text=Product-based%20Fairness%20Indices

3: In a last-mile link, network power is maximised by maximising the
goodput of useful applications, ensuring that all applications have a
"fair" share of available capacity (for some reasonable definition of
"fair"), and keeping latency as low as reasonably practical while doing
so. This is likely to be associated with high link utilisation when
demand is heavy.

> Operating at a fully congested state - or designing TCP to essentially
> come close to DDoS behaviour on a bottleneck to get a publishable paper
> - is missing the point.

When writing a statement like that, it's probably important to indicate
what a "fully congested state" actually means. Some might take it to
mean merely 100% link utilisation, which could actually be part of an
optimal network power solution. From context, I assume you actually mean
that the queues are driven to maximum depth and to the point of overflow
- or beyond.

 - Jonathan Morton
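A small sketch of the two quantities leaned on above, under assumed
definitions: network power is taken here as throughput divided by delay
(a common definition), and the product-based fairness index is taken as
the product of each flow's allocation normalized to its fair share,
capped at 1 so that only impairment, not over-delivery, lowers it. The
names and numbers are illustrative, not from the linked article:

    from math import prod

    def network_power(goodput_bps, delay_s):
        return goodput_bps / delay_s

    def product_fairness(allocations, fair_shares):
        # 1.0 when every flow gets at least its fair share; approaches 0
        # as any flow is starved.
        return prod(min(a / f, 1.0) for a, f in zip(allocations, fair_shares))

    # Same 50 Mbit/s of goodput, 5 ms vs 200 ms of standing delay:
    print(network_power(50e6, 0.005) / network_power(50e6, 0.200))   # 40x the power

    fair = [10e6] * 5   # five flows; 10 Mbit/s each is "fair" on a 50 Mbit/s link
    print(product_fairness([10e6] * 5, fair))                        # 1.0
    print(product_fairness([18e6, 18e6, 10e6, 3e6, 1e6], fair))      # ~0.03: starvation shows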
* Re: [Ecn-sane] paper idea: praising smaller packets
From: Vint Cerf @ 2021-09-29 10:55 UTC
To: Jonathan Morton; Cc: David P. Reed, Mohit P. Tahiliani, ECN-Sane, Asad Sajjad Ahmed

fully congested: zero throughput, maximum (infinite) delay....

v

On Wed, Sep 29, 2021 at 6:36 AM Jonathan Morton <chromatix99@gmail.com> wrote:
> [...]

--
Please send any postal/overnight deliveries to:
Vint Cerf
1435 Woodhurst Blvd
McLean, VA 22102
703-448-0965 until further notice
* Re: [Ecn-sane] paper idea: praising smaller packets
From: Jonathan Morton @ 2021-09-29 11:38 UTC
To: Vint Cerf; Cc: David P. Reed, Mohit P. Tahiliani, ECN-Sane, Asad Sajjad Ahmed

> On 29 Sep, 2021, at 1:55 pm, Vint Cerf <vint@google.com> wrote:
>
> fully congested: zero throughput, maximum (infinite) delay....

…and hence minimum network power. Clearly an undesirable condition.

 - Jonathan Morton
* Re: [Ecn-sane] paper idea: praising smaller packets
From: David P. Reed @ 2021-09-29 19:34 UTC
To: Jonathan Morton; Cc: Bob Briscoe, Mohit P. Tahiliani, ECN-Sane, Asad Sajjad Ahmed

Jonathan - I pretty much agree with most of what you say here. However,
two things:

1) a router that has only one flow at a time traversing it is not a
router. It's just a link that runs at memory speed in between two links.
A degenerate case.

2) The start of my email - about the fact that each outbound link must
be made to clear (with no queued traffic) within a copper or fiber speed
circuit of the earth (great circle route) - is my criterion for NOT
being 100% utilized. But it's a description that focuses on latency and
capacity in a single measure. It's very close to 100% utilized. (This
satisfies your concern about supporting low bit rates at the edges, but
in a very different way.)

The problem with links is that they can NEVER be utilized more than
100%. So utilization is a TERRIBLE metric for thinking about the
problem. I didn't mean this as a weird joke - I'm very serious.
Utilization is just the wrong measure.

And so is end-to-end latency average - averages are not meaningful in a
fat-tailed traffic distribution, no matter how you compute them, and
average latency is a very strange characterization, since most paths
actually have no traffic because each endpoint only uses a small subset
of paths.

Once upon a time I thought that all links should be capped at average
utilization of 10% or 50%. But in fact that is a terrible measure too -
averages are a bad metric, for the same reason.

Instead, operationally it is OK for a link to be "almost full", as long
as the control protocols create openings frequently enough to mitigate
latency issues.

(Side note: If you want to understand really deeply why "averages" are a
terrible statistic for networking, I recommend reading Nassim Taleb's
book about pre-asymptotic behavior of random systems and the problem of
applying statistical measures to systems that are not "in equilibrium" -
https://arxiv.org/abs/2001.10488 . Seriously! It's tough sledding, sound
math, and very enlightening. Much of what he says can be translated into
the world of real networking and queueing. Sadly, most queueing theory
doesn't touch on pre-asymptotic behavior, but instead assumes that the
asymptotic behavior of a queueing system characterizes the normal
behavior.)

(Some people try to say that network traffic is "fractal", which is
actually unreasonable - most protocols behave highly deterministically,
and there's no "self-similarity" inherent in end-to-end flow statistics,
no power laws, ...)

On Wednesday, September 29, 2021 6:36am, "Jonathan Morton"
<chromatix99@gmail.com> said:
> [...]
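A toy illustration of the point about averages under fat tails
(synthetic Pareto samples of my own choosing, not measurements from this
thread): the sample mean is dragged far above the median by a handful of
tail events, while order statistics remain informative.

    import random
    import statistics

    random.seed(1)
    alpha = 1.3   # tail index < 2: infinite variance, barely-finite mean
    delays_ms = [random.paretovariate(alpha) for _ in range(100_000)]
    delays_ms.sort()

    print(f"median: {statistics.median(delays_ms):8.2f} ms")
    print(f"p99   : {delays_ms[int(0.99 * len(delays_ms))]:8.2f} ms")
    print(f"mean  : {statistics.mean(delays_ms):8.2f} ms   # dragged up by the tail")
    print(f"max   : {delays_ms[-1]:8.1f} ms   # a few samples dominate the mean")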
Thread overview: 9+ messages
2021-09-26 20:08 [Ecn-sane] paper idea: praising smaller packets  Dave Taht
2021-09-27 14:50 ` Bob Briscoe
2021-09-27 15:14   ` Dave Taht
2021-09-28 22:15   ` David P. Reed
2021-09-29  9:26     ` Vint Cerf
2021-09-29 10:36     ` Jonathan Morton
2021-09-29 10:55       ` Vint Cerf
2021-09-29 11:38         ` Jonathan Morton
2021-09-29 19:34       ` David P. Reed