On Wed, Jul 21, 2021 at 4:20 PM Leonard Kleinrock <lk@cs.ucla.edu> wrote:

Just a few comments following David Reed's insightful comments re the history of the ARPANET and its approach to flow control. I have attached some pages from my Volume II which provide an understanding of how we addressed flow control and its implementation in the ARPANET.

The early days of the ARPANET design and evaluation involved detailed design of what we did call “Flow Control”. In my “Queueing Systems, Volume II: Computer Applications”, John Wiley, 1976, I documented much of what we designed and evaluated for the ARPANET, and focused on performance, deadlocks, lockups and degradations due to flow control design. Aspects of congestion control were considered, but this 2-volume book was mostly about understanding congestion. Of interest are the many deadlocks that we discovered in those early days as we evaluated and measured the network behavior. Flow control was designed into that early network, but it had a certain ad-hoc flavor and I point out the danger of requiring flows to depend upon the acquisition of multiple tokens that were allocated from different portions of the network at the same time in a distributed fashion. The attached relevant sections of the book address these issues; I thought it would be of value to see what we were looking at back then.

On a related topic regarding flow and congestion control (as triggered by David’s comment “at most one packet waiting for each egress link in the bottleneck path.”), in 1978, I published a paper in which I extended the notion of Power (the ratio of throughput to response time) that had been introduced by Giessler et al., and I pointed out the amazing properties that emerged when Power is optimized, e.g., that one should keep each hop in the pipe “just full”, i.e., one message per hop. As it turns out, and as has been discussed in this email chain, Jaffe showed in 1981 that this optimization was not decentralizable and so no one pursued this optimal operating point (notwithstanding the fact that I published other papers on this issue, for example in 1979 and in 1981). So this issue of Power lay dormant for decades until Van Jacobson et al. resurrected the idea with their BBR flow control design in 2016 when they showed that indeed one could decentralize power. Considerable research has since followed their paper including another by me in 2018. (This was not the first time that a publication challenging the merits of a new idea negatively impacted that idea for decades - for example, the 1969 book “Perceptrons” by Minsky and Papert discouraged research into neural networks for many years until that idea was proven to have merit.) But the story is not over as much work has yet to be done to develop the algorithms that can properly deal with congestion in the sense that this email chain continues to discuss it.
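
To make the "just full" operating point concrete, here is a minimal sketch, assuming a single hop modeled as an M/M/1 queue. That model, the service rate, and all function names are assumptions chosen for illustration and are not taken from the papers mentioned above. Under that model, power is throughput divided by mean response time, and the sketch scans arrival rates to see where power peaks.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue: T = 1 / (mu - lambda)."""
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("need 0 <= arrival_rate < service_rate")
    return 1.0 / (service_rate - arrival_rate)


def mm1_power(arrival_rate, service_rate):
    """Power as described above: throughput divided by response time."""
    return arrival_rate / mm1_response_time(arrival_rate, service_rate)


def mm1_mean_in_system(arrival_rate, service_rate):
    """Mean number of messages at the hop: N = rho / (1 - rho)."""
    rho = arrival_rate / service_rate
    return rho / (1.0 - rho)


def power_maximizing_rate(service_rate, steps=1000):
    """Grid-search the arrival rate that maximizes power (illustrative only)."""
    candidates = [service_rate * i / steps for i in range(steps)]
    return max(candidates, key=lambda lam: mm1_power(lam, service_rate))


if __name__ == "__main__":
    mu = 10.0  # hypothetical service rate, messages per second
    best = power_maximizing_rate(mu)
    print(best, mm1_mean_in_system(best, mu))
```

Under these assumptions the peak sits at half the service rate, where the mean number in the system is one message. That matches the "one message per hop" intuition for this simple model only; it says nothing about whether the optimum can be found in a decentralized way.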

Best,
Len

On Jul 13, 2021, at 10:49 AM, David P. Reed <dpreed@deepplum.com> wrote:

Bob -

On Tuesday, July 13, 2021 1:07pm, "Bob McMahon" <bob.mcmahon@broadcom.com> said:

"Control at endpoints benefits greatly from even small amounts of
information supplied by the network about the degree of congestion present
on the path."

Agreed. The ECN mechanism seems like a shared thermostat in a building.
It's basically an on/off where everyone is trying to set the temperature.
It does have an effect, albeit in a non-linear manner. Better than a
thermostat set at infinity or 0 Kelvin for sure.

I find that the assumption that congestion occurs "in network" is not always
true. Taking OWD measurements with read side rate limiting suggests that
equally important to mitigating bufferbloat driven latency using congestion
signals is to make sure apps read "fast enough" whatever that means. I
rarely hear about how important it is for apps to prioritize reads over
open sockets. Not sure why that's overlooked and bufferbloat gets all the
attention. I'm probably missing something.

In the early days of the Internet protocol and also even ARPANET Host-Host protocol there were those who conflated host-level "flow control" (matching production rate of data into the network to the destination *process* consumption rate of data on a virtual circuit with a source capable of variable and unbounded bit rate) with "congestion control" in the network. The term "congestion control" wasn't even used in the Internetworking project when it was discussing design in the late 1970's. I tried to use it in our working group meetings, and every time I said "congestion" the response would be phrased as "flow".

The classic example was printing a file's contents from disk to an ASR33 terminal on a TIP (Terminal IMP). There was flow control in the end-to-end protocol to avoid overflowing the TTY's limited buffer. But those who grew up with ARPANET knew that there was no way to accumulate queueing in the IMP network, because of RFNM's that required permission for each new packet to be sent. RFNM's implicitly prevented congestion from being caused by a virtual circuit. But a flow control problem remained, because at the higher level protocol, buffering would overflow at the TIP.

TCP adopted a different end-to-end *flow* control, so it solved the flow control problem by creating a Windowing mechanism. But it did not by itself solve the *congestion* control problem, even congestion built up inside the network by a wide-open window and a lazy operating system at the receiving end that just said, I've got a lot of virtual memory so I'll open the window to maximum size.

There was a lot of confusion, because the guys who came from the ARPANET environment, with all links being the same speed and RFNM limits on rate, couldn't see why the Internet stack was so collapse-prone. I think Multics, for example, as a giant virtual memory system caused congestion by opening up its window too much.

This is where Van Jacobson discovered that dropped packets were a "good enough" congestion signal because of "fate sharing" among the packets that flowed on a bottleneck path, and that windowing (invented for flow control by the receiver to protect itself from overflow if the receiver couldn't receive fast enough) could be used to slow down the sender to match the rate of senders to the capacity of the internal bottleneck link. An elegant "hack" that actually worked really well in practice.
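
A minimal sketch of the idea in the paragraph above, assuming the additive-increase/multiplicative-decrease rule commonly associated with TCP congestion avoidance. The rule, the constants, and the function names are illustrative assumptions, not taken from this thread. The sender keeps its own congestion window, shrinks it when a drop is detected, and never sends more than the receiver's advertised window allows.

```python
def next_congestion_window(cwnd, loss_detected, increase=1.0, min_cwnd=1.0):
    """Return the congestion window (in packets) for the next round trip.

    A detected drop is treated as the congestion signal and halves the
    window; otherwise the window grows by a fixed increment.
    """
    if loss_detected:
        return max(min_cwnd, cwnd / 2.0)
    return cwnd + increase


def allowed_in_flight(cwnd, receiver_window):
    """The sender is bound by both limits: flow control and congestion control."""
    return min(cwnd, receiver_window)


def simulate(loss_pattern, receiver_window=64.0, start=1.0):
    """Replay a sequence of per-round-trip loss indications (hypothetical input)."""
    cwnd = start
    history = []
    for loss in loss_pattern:
        history.append(allowed_in_flight(cwnd, receiver_window))
        cwnd = next_congestion_window(cwnd, loss)
    return history


if __name__ == "__main__":
    print(simulate([False, False, True, False]))
```

The receiver window still protects the receiving process, as in the original flow control design, while the congestion window is the sender's estimate of what the bottleneck path can carry.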

Now we view it as a bug if the receiver opens its window too much, or otherwise doesn't translate dropped packets (or other incipient-congestion signals) to shut down the source transmission rate as quickly as possible. Fortunately, the proper state of the internet - the one it should seek as its ideal state - is that there is at most one packet waiting for each egress link in the bottleneck path. This stable state ensures that the window-reduction or slow-down signal encounters no congestion, with high probability. [Excursions from one-packet queue occur, but since only one-packet waiting is sufficient to fill the bottleneck link to capacity, they can't achieve higher throughput in steady state. In practice, noisy arrival distributions can reduce throughput, so allowing a small number of packets to be waiting on a bottleneck link's queue can slightly increase throughput. That's not asymptotically relevant, but as mentioned, the Internet is never near asymptotic behavior.]

Bob

On Tue, Jul 13, 2021 at 12:15 AM Amr Rizk <amr@rizk.com.de> wrote:

Ben,

it depends on what one tries to measure. Doing a rate scan using UDP (to
measure latency distributions under load) is the best thing that we have
but without actually knowing how resources are shared (fair share as in
WiFi, FIFO as nearly everywhere else) it becomes very difficult to
interpret the results or provide a proper argument on latency. You are
right - TCP stats are a proxy for user experience but I believe they are
difficult to reproduce (we are always talking about very short TCP flows -
the infinite TCP flow that converges to a steady behavior is purely
academic).

By the way, Little's law is a strong tool when it comes to averages. To be
able to say more (e.g. 1% of the delays are larger than x) one requires more
information (e.g. the traffic ON-OFF pattern), see [1]. I am not sure
when such information readily exists.
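
A small sketch of the distinction drawn above, using made-up delay samples (all numbers and function names are hypothetical). Two delay traces with the same mean give the same average occupancy under Little's law, yet their 99th percentiles differ widely, which is why averages alone say little about the tail.

```python
import math


def mean_occupancy(arrival_rate, mean_delay):
    """Little's law: L = lambda * W (average number in the system)."""
    return arrival_rate * mean_delay


def mean(samples):
    """Arithmetic mean of a non-empty list of samples."""
    return sum(samples) / len(samples)


def percentile(samples, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical delays in milliseconds; both traces have a mean of 10 ms.
steady = [10.0] * 100
bursty = [5.0] * 95 + [105.0] * 5

if __name__ == "__main__":
    rate_per_ms = 0.2  # hypothetical arrival rate, packets per millisecond
    for name, trace in (("steady", steady), ("bursty", bursty)):
        print(name, mean_occupancy(rate_per_ms, mean(trace)), percentile(trace, 99))
```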

Best
Amr

[1] https://dl.acm.org/doi/10.1145/3341617.3326146 or if behind a paywall
https://www.dcs.warwick.ac.uk/~florin/lib/sigmet19b.pdf

--------------------------------
Amr Rizk (amr.rizk@uni-due.de)
University of Duisburg-Essen

-----Original Message-----
From: Bloat <bloat-bounces@lists.bufferbloat.net> On Behalf Of Ben Greear
Sent: Monday, July 12, 2021 22:32
To: Bob McMahon <bob.mcmahon@broadcom.com>
Cc: starlink@lists.bufferbloat.net;
Make-Wifi-fast <make-wifi-fast@lists.bufferbloat.net>;
Leonard Kleinrock <lk@cs.ucla.edu>; David P. Reed <dpreed@deepplum.com>;
Cake List <cake@lists.bufferbloat.net>; codel@lists.bufferbloat.net;
cerowrt-devel <cerowrt-devel@lists.bufferbloat.net>;
bloat <bloat@lists.bufferbloat.net>
Subject: Re: [Bloat] Little's Law mea culpa, but not invalidating my main point

UDP is better for getting actual packet latency, for sure. TCP is
typical-user-experience-latency though, so it is also useful.

I'm interested in the test and visualization side of this. If there were
a way to give engineers a good real-time look at a complex real-world
network, then they have something to go on while trying to tune various
knobs in their network to improve it.

I'll let others try to figure out how to build and tune the knobs, but the
data acquisition and visualization is something we might try to
accomplish. I have a feeling I'm not the first person to think of this,
however... probably someone already has done such a thing.

Thanks,
Ben

On 7/12/21 1:04 PM, Bob McMahon wrote:
I believe end hosts' TCP stats are insufficient as seen per the
"failed" congestion control mechanisms over the last decades. I think
Jaffe pointed this out in 1979 though he was using what's been deemed
on this thread as "spherical cow queueing theory."

"Flow control in store-and-forward computer networks is appropriate
for decentralized execution. A formal description of a class of
"decentralized flow control algorithms" is given. The feasibility of
maximizing power with such algorithms is investigated. On the
assumption that communication links behave like M/M/1 servers it is
shown that no "decentralized flow control algorithm" can maximize network
power. Power has been suggested in the literature as a network performance
objective. It is also shown that no objective based only on the users'
throughputs and average delay is decentralizable. Finally, a restricted
class of algorithms cannot even approximate power."

https://ieeexplore.ieee.org/document/1095152

Did Jaffe make a mistake?

Also, it's been observed that latency is non-parametric in its
distributions and computing gaussians per the central limit theorem
for OWD feedback loops isn't effective. How does one design a control
loop around things that are non-parametric? It also raises the question, what
are the feed forward knobs that can actually help?

Bob

On Mon, Jul 12, 2021 at 12:07 PM Ben Greear <greearb@candelatech.com> wrote:

   Measuring one or a few links provides a bit of data, but seems like if
   someone is trying to understand a large and real network, then the OWD
   between point A and B needs to just be input into something much more
   grand. Assuming real-time OWD data exists between 100 to 1000 endpoint
   pairs, has anyone found a way to visualize this in a useful manner?

   Also, considering something better than ntp may not really scale to
   1000+ endpoints, maybe round-trip time is the only viable way to get this
   type of data. In that case, maybe clever logic could use things like
   trace-route to get some idea of how long it takes to get 'onto' the
   internet proper, and so estimate the last-mile latency. My assumption is
   that the last-mile latency is where most of the pervasive asymmetric
   network latencies would exist (or just ping 8.8.8.8 which is 20ms from
   everywhere due to $magic).

   Endpoints could also triangulate a bit if needed, using some anchor
   points in the network under test.

   Thanks,
   Ben

   On 7/12/21 11:21 AM, Bob McMahon wrote:
iperf 2 supports OWD and gives full histograms for TCP write to read, TCP
connect times, latency of packets (with UDP), latency of "frames" with
simulated video traffic (TCP and UDP), xfer times of bursts with low duty
cycle traffic, and TCP RTT (sampling based). It also has support for
sampling (per interval reports) down to 100 usecs if configured with
--enable-fastsampling, otherwise the fastest sampling is 5 ms. We've
released all this as open source.

OWD only works if the end realtime clocks are synchronized using a
"machine level" protocol such as IEEE 1588 (PTP). Sadly, *most data centers
don't provide sufficient level of clock accuracy and the GPS pulse per
second* to colo and vm customers.
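
To illustrate why clock synchronization matters here, a minimal sketch with hypothetical numbers. This is not iperf 2 code, and the function names are assumptions for illustration. A one-way delay is computed from timestamps taken on two different clocks, so any offset between them lands directly in the result, while a round-trip time measured on a single clock is unaffected.

```python
def measured_one_way_delay(true_delay, receiver_clock_offset, send_time=0.0):
    """OWD as computed from timestamps: receive timestamp minus send timestamp.

    receiver_clock_offset is how far the receiving clock runs ahead of the
    sending clock, in seconds (0.0 means perfectly synchronized).
    """
    send_ts = send_time  # read on the sender's clock
    recv_ts = send_time + true_delay + receiver_clock_offset  # receiver's clock
    return recv_ts - send_ts


def measured_round_trip(forward_delay, reverse_delay, receiver_clock_offset):
    """RTT as the sum of both directions; the clock offset cancels out."""
    forward = measured_one_way_delay(forward_delay, receiver_clock_offset)
    # On the way back the roles swap, so the offset enters with opposite sign.
    reverse = measured_one_way_delay(reverse_delay, -receiver_clock_offset)
    return forward + reverse


if __name__ == "__main__":
    # Hypothetical path: 5 ms forward, 3 ms reverse, clocks 2 ms apart.
    print(measured_one_way_delay(0.005, 0.002))
    print(measured_round_trip(0.005, 0.003, 0.002))
```

With a 2 ms offset, the 5 ms forward path is reported as roughly 7 ms, an error as large as many of the latencies being studied, while the round trip still comes out at roughly 8 ms.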

https://iperf2.sourceforge.io/iperf-manpage.html

Bob

On Mon, Jul 12, 2021 at 10:40 AM David P. Reed <dpreed@deepplum.com> wrote:

   On Monday, July 12, 2021 9:46am, "Livingood, Jason"
   <Jason_Livingood@comcast.com> said:

   I think latency/delay is becoming seen to be as important certainly, if
   not a more direct proxy for end user QoE. This is all still evolving and
   I have to say is a super interesting & fun thing to work on. :-)

   If I could manage to sell one idea to the management hierarchy of
   communications industry CEOs (operators, vendors, ...) it is this one:

   "It's the end-to-end latency, stupid!"

   And I mean, by end-to-end, latency to complete a task at a relevant
   layer of abstraction.

   At the link level, it's packet send to packet receive completion.

   But at the transport level including retransmission buffers, it's
   datagram (or message) origination until the acknowledgement arrives for
   that message being delivered after whatever number of retransmissions,
   freeing the retransmission buffer.

   At the WWW level, it's mouse click to display update corresponding to
   completion of the request.

   What should be noted is that lower level latencies don't directly
   predict the magnitude of higher-level latencies. But longer lower level
   latencies almost always amplify higher level latencies. Often
   non-linearly.

   Throughput is very, very weakly related to these latencies, in contrast.

   The amplification process has to do with the presence of queueing.
   Queueing is ALWAYS bad for latency, and throughput only helps if it is
   in exactly the right place (the so-called input queue of the bottleneck
   process, which is often a link, but not always).

   Can we get that slogan into Harvard Business Review? Can we get it
   taught in Managerial Accounting at HBS? (which does address
   logistics/supply chain queueing).

This electronic communication and the information and any files
transmitted with it, or attached to it, are confidential and are
intended solely for the use of the individual or entity to whom it is
addressed and may contain information that is confidential, legally
privileged, protected by privacy laws, or otherwise restricted from
disclosure to anyone else. If you are not the intended recipient or the
person responsible for delivering the e-mail to the intended recipient, you
are hereby notified that any use, copying, distributing, dissemination,
forwarding, printing, or copying of this e-mail is strictly prohibited. If
you received this e-mail in error, please return the e-mail to the sender,
delete it from your computer, and destroy any printed copy of it.

   --
   Ben Greear <greearb@candelatech.com>
   Candela Technologies Inc http://www.candelatech.com

_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat