thanks David - I really like your clear distinction between avoidance and optimized congestion.

v

On Tue, Sep 28, 2021 at 6:15 PM David P. Reed wrote:

> Upon thinking about this, here's a radical idea:
>
> the expected time until a bottleneck link clears, that is, 0 packets are in the queue to be sent on it, must be < t, where t is an Internet-wide constant corresponding to the time it takes light to circle the earth.
>
> This is a local constraint, one that is required of a router. It can be achieved in any of a variety of ways (for example choosing to route different flows on different paths that don't include the bottleneck link).
>
> It need not be true at all times - but when I say "expected time", I mean that the queue's behavior is monitored so that this situation is quite rare over any interval of ten minutes or more.
>
> If a bottleneck link is continuously full for more than the time it takes for packets on a fiber (< light speed) to circle the earth, it is in REALLY bad shape. That must never happen.
>
> Why is this important?
>
> It's a matter of control theory - if the control loop delay gets longer than its minimum, instability tends to take over no matter what control discipline is used to manage the system.
>
> Now, it is important as hell to avoid bullshit research programs that try to "optimize" utilization of link capacity at 100%. Those research programs focus on the absolute wrong measure - a proxy for "network capital cost" that is in fact the wrong measure of any real network operator's cost structure. The cost of media (wires, airtime, ...) is a tiny fraction of most network operations' cost in any real business or institution. We don't optimize highways by maximizing the number of cars on every stretch of highway, for obvious reasons, but also for non-obvious reasons.
>
> Latency and lack of flexibility or reconfigurability impose real costs on a system that are far more significant to end-user value than the cost of the media.
>
> A sustained congestion of a bottleneck link is not a feature, but a very serious operational engineering error. People should be fired if they don't prevent that from ever happening, or allow it to persist.
>
> This is why telcos, for example, design networks to handle the expected maximum traffic with some excess capacity. This is why networks are constantly being upgraded as load increases, *before* overloads occur.
>
> It's an incredibly dangerous and arrogant assumption that operation in a congested mode is acceptable.
>
> That's the rationale for the "radical proposal".
>
> Sadly, academic thinkers (even ones who have worked in industry research labs on minor aspects) get drawn into solving the wrong problem - optimizing the case that should never happen.
>
> Sure that's helpful - but only in the same sense that, when designing systems where accidents need to have fallbacks, one needs to design the fallback system to work.
>
> Operating at a fully congested state - or designing TCP to essentially come close to DDoS behavior on a bottleneck to get a publishable paper - is missing the point.
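For a rough sense of scale (my numbers, not David's): the Earth's circumference is about 40,000 km, so light in vacuum needs roughly 40,000 / 300,000 = ~133 ms to circle it, and in fibre (propagation around 200,000 km/s) closer to 200 ms. So the constraint amounts to: the expected time for a bottleneck queue to drain to empty should stay under a couple of hundred milliseconds. Below is a purely hypothetical sketch of the kind of per-queue monitoring David describes; every name, constant and threshold in it is invented for illustration, and it is not from his mail or from any real router code.

/* Illustrative only: a per-queue check of the constraint above. All
 * names and constants are invented for this sketch. t is taken as
 * ~200 ms, roughly the time light in fibre (~200,000 km/s) needs to
 * circle the earth (~40,000 km). */
#include <stdbool.h>
#include <stdint.h>

#define T_CLEAR_NS  (200ULL * 1000 * 1000)    /* ~200 ms in nanoseconds */

struct clear_monitor {
    uint64_t busy_since_ns;   /* when the queue last became non-empty; 0 = currently empty */
    uint64_t busy_periods;    /* busy periods observed */
    uint64_t violations;      /* busy periods that lasted longer than T_CLEAR_NS */
};

/* Call on every enqueue/dequeue with the queue length and a monotonic clock. */
static void clear_monitor_update(struct clear_monitor *m,
                                 unsigned int qlen, uint64_t now_ns)
{
    if (qlen == 0 && m->busy_since_ns) {          /* queue just cleared */
        m->busy_periods++;
        if (now_ns - m->busy_since_ns > T_CLEAR_NS)
            m->violations++;
        m->busy_since_ns = 0;
    } else if (qlen > 0 && !m->busy_since_ns) {   /* queue just became non-empty */
        m->busy_since_ns = now_ns;
    }
}

/* "Quite rare over any interval of ten minutes or more": a real
 * implementation would age these counters over ~10-minute windows;
 * the 1% threshold here is arbitrary. */
static bool clear_monitor_healthy(const struct clear_monitor *m)
{
    return m->busy_periods == 0 ||
           (double)m->violations / (double)m->busy_periods < 0.01;
}

The point is only that "the queue clears within t, almost always" is cheap to measure at the queue itself, so a router can tell locally whether it is meeting the constraint.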
> On Monday, September 27, 2021 10:50am, "Bob Briscoe" <research@bobbriscoe.net> said:
>
> > Dave,
> >
> > On 26/09/2021 21:08, Dave Taht wrote:
> > > ... an exploration of smaller mss sizes in response to persistent congestion
> > >
> > > This is in response to two declarative statements in here that I've long disagreed with, involving NOT shrinking the mss, and not trying to do pacing...
> >
> > I would still avoid shrinking the MSS, 'cos you don't know if the congestion constraint is the CPU, in which case you'll make congestion worse. But we'll have to differ on that if you disagree.
> >
> > I don't think that paper said don't do pacing. In fact, it says "...pace the segments at less than one per round trip..."
> >
> > Whatever, that paper was the problem statement, with just some ideas on how we were going to solve it. After that, Asad (added to the distro) did his whole Masters thesis on this - I suggest you look at his thesis and code (pointers below).
> >
> > Also, soon after he'd finished, changes to BBRv2 were introduced to reduce queuing delay with large numbers of flows. You might want to take a look at that too:
> > https://datatracker.ietf.org/meeting/106/materials/slides-106-iccrg-update-on-bbrv2#page=10
> >
> > > https://www.bobbriscoe.net/projects/latency/sub-mss-w.pdf
> > >
> > > Otherwise, for a change, I largely agree with Bob.
> > >
> > > "No amount of AQM twiddling can fix this. The solution has to fix TCP."
> > >
> > > "nearly all TCP implementations cannot operate at less than two packets per RTT"
> >
> > Back to Asad's Master's thesis, we found that just pacing out the packets wasn't enough. There's a very brief summary of the 4 things we found we had to do, in 4 bullets, in this section of our write-up for netdev:
> > https://bobbriscoe.net/projects/latency/tcp-prague-netdev0x13.pdf#subsubsection.3.1.6
> > And I've highlighted a couple of unexpected things that cropped up below.
> >
> > Asad's full thesis:
> > Ahmed, A., "Extending TCP for Low Round Trip Delay", Masters Thesis, Uni Oslo, August 2019.
> >
> > Asad's thesis presentation:
> > https://bobbriscoe.net/presents/1909submss/present_asadsa.pdf
> >
> > Code:
> > https://bitbucket.org/asadsa/kernel420/src/submss/
> >
> > Despite significant changes to basic TCP design principles, the diffs were not that great.
> >
> > A number of tricky problems came up.
> >
> > * For instance, simple pacing when <1 ACK per RTT wasn't that simple. Whenever there were bursts from cross-traffic, the consequent burst in your own flow kept repeating in subsequent rounds. We realized this was because you never have a real ACK clock (you always set the next send time based on previous send times). So we set up the next send time but then re-adjusted it if/when the next ACK did actually arrive.
> >
> > * The additive increase of one segment was the other main problem. When you have such a small window, multiplicative decrease scales fine, but an additive increase of 1 segment is a huge jump in comparison when cwnd is a fraction of a segment. "Logarithmically scaled additive increase" was our solution to that (basically, every time you set ssthresh, alter the additive increase constant using a formula that scales logarithmically with ssthresh, so it's still roughly 1 for the current Internet scale).
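On the first of those two bullets (pacing with less than one ACK per round trip), here is a minimal sketch of the re-adjustment idea as I read it. The structure, the names and the exact adjustment rule are my assumptions, not Asad's code (his actual patches are at the bitbucket link above).

#include <stdint.h>

/* When cwnd < 1 MSS, one segment is sent roughly every
 * srtt * mss / cwnd_bytes, i.e. the send interval exceeds one RTT. */
struct sub_mss_pacer {
    uint64_t interval_ns;     /* current inter-segment send interval (> srtt) */
    uint64_t srtt_ns;         /* smoothed round-trip time estimate */
    uint64_t next_send_ns;    /* when the next segment may be sent */
};

/* Provisional clock: with no real ACK clock, schedule the next send
 * from the previous send time alone. On its own, this replays any
 * timing error caused by a cross-traffic burst in every later round. */
static void on_segment_sent(struct sub_mss_pacer *p, uint64_t now_ns)
{
    p->next_send_ns = now_ns + p->interval_ns;
}

/* If an ACK does arrive, re-anchor the schedule on real feedback
 * instead of on our own previous send, so a one-off burst is not
 * echoed round after round. Subtracting srtt keeps the long-run
 * spacing at roughly interval_ns. (The precise rule in the thesis
 * may differ; this is just the shape of the idea.) */
static void on_ack_received(struct sub_mss_pacer *p, uint64_t now_ns)
{
    p->next_send_ns = now_ns + p->interval_ns - p->srtt_ns;
}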
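And on the second bullet, a sketch of what "logarithmically scaled additive increase" could look like. The mail only gives the shape of the formula (it scales logarithmically with ssthresh and is roughly 1 segment at today's typical window sizes), so the exact expression, the reference scale and the fixed-point units below are all my assumptions, not the formula from the thesis.

#include <math.h>

#define UNIT 1024   /* fixed point: 1024 units = one MSS-sized segment */

/* Recomputed whenever ssthresh is set (i.e. on each congestion event).
 * Returns the additive-increase step, in UNITs of a segment, to add to
 * cwnd per round trip during congestion avoidance. */
static double ai_step_units(double ssthresh_units)
{
    const double REF_SEGS = 64.0;          /* "current Internet scale" (assumed) */
    double ssthresh_segs = ssthresh_units / UNIT;

    /* Equals 1 segment when ssthresh == REF_SEGS, and shrinks
     * logarithmically as ssthresh falls toward (or below) one segment. */
    double step_segs = log2(1.0 + ssthresh_segs) / log2(1.0 + REF_SEGS);
    return step_segs * UNIT;
}

/* Congestion avoidance then does, per RTT:
 *     cwnd_units += ai_step_units(ssthresh_units);
 * instead of the classic cwnd += 1 segment. */

The point is only that the per-RTT increase shrinks in proportion to how small ssthresh is on a log scale, rather than jumping by a whole segment while cwnd is a fraction of a segment.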
> > What became of Asad's work?
> >
> > Altho the code finally worked pretty well {1}, we decided not to pursue it further 'cos a minimum cwnd actually gives a trickle of throughput protection against unresponsive flows (with the downside that it increases queuing delay). That's not to say this isn't worth working on further, but there was more to do to make it bullet-proof, and we were in two minds how important it was, so it worked its way down our priority list.
> >
> > {Note 1: From memory, there was an outstanding problem with one flow remaining dominant if you had step-ECN marking, which we worked out was due to the logarithmically scaled additive increase, but we didn't work on it further to fix it.}
> >
> > Bob
> >
> > --
> > ________________________________________________________________
> > Bob Briscoe                               http://bobbriscoe.net/
>
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane

--
Please send any postal/overnight deliveries to:
Vint Cerf
1435 Woodhurst Blvd
McLean, VA 22102
703-448-0965 until further notice