[Bloat] Little's Law mea culpa, but not invalidating my main point

Bob McMahon bob.mcmahon at broadcom.com
Wed Jul 14 14:37:58 EDT 2021


Thanks for this. I find it both interesting and useful. Learning from those
who came before me reminds me of "standing on the shoulders of giants." I
try to teach my kids that it's not so much us as the giants we choose - so
choose judiciously and, more importantly, be grateful when they provide
their shoulders from which to see.

One challenge I faced with iperf 2 was flow control's effect on latency.
I find that if iperf 2 rate limits on writes, the end-to-end latencies
and RTT look good because the pipe is basically empty, while rate
limiting reads to the same value fills the window and drives the RTT up.
One might conclude, from a network perspective, that the write side is
better. But in reality, write rate limiting just pushes the delay into
the application's logic: the relevant bytes may not be in the pipe, but
they aren't at the receiver either. They're stuck somewhere in the "tx
application space."
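
A toy fluid model may make this concrete. The sketch below is only an
illustration, not how iperf 2 works: the function name, its parameters,
and the simplification that a write-limited pipe holds zero bytes are all
assumptions. It applies Little's Law in its simplest form (delay =
backlog / drain rate) to show that the limiting side changes where the
backlog sits, not how long a byte waits between production and
consumption.

```python
def backlog_split(offered_bps, limit_bps, window_bytes, seconds, limit_side):
    """Toy fluid model (hypothetical, not iperf 2 code).

    The application offers data at offered_bps while the slower side is
    rate limited to limit_bps. Returns a tuple
    (app_backlog_bytes, in_flight_bytes, end_to_end_delay_s) after
    `seconds` of sustained load.
    """
    if limit_side not in ("write", "read"):
        raise ValueError("limit_side must be 'write' or 'read'")

    # Bytes produced but not yet consumed by the receiving application.
    backlog = max(0.0, (offered_bps - limit_bps) / 8 * seconds)

    if limit_side == "write":
        # Assumption: the pipe stays essentially empty, so the whole
        # backlog waits in the sender's application space.
        in_flight = 0.0
    else:
        # Assumption: the backlog first fills the window (socket buffers
        # plus network); anything beyond that blocks in the sender.
        in_flight = min(backlog, window_bytes)

    app_backlog = backlog - in_flight
    # Little's Law: time to drain the total backlog at the limited rate.
    delay_s = backlog / (limit_bps / 8)
    return app_backlog, in_flight, delay_s


if __name__ == "__main__":
    for side in ("write", "read"):
        print(side, backlog_split(20e6, 10e6, 1e6, 2.0, side))
```

In this model the network-level view (in-flight bytes, hence RTT) differs
between the two cases, while the delay seen by the application is the
same.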

It wasn't obvious to me how to address this. We added burst measurements
(burst xfer time and bursts/sec), which, I think, help.
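
As a rough sketch of what those two numbers mean (this is not iperf 2's
implementation; the function name and the input format are assumptions
for illustration), given a sender-side start timestamp and a
receiver-side completion timestamp per burst, taken on synchronized
clocks:

```python
from statistics import mean


def burst_stats(bursts):
    """Hypothetical helper, not part of iperf 2.

    bursts: list of (tx_start_s, rx_done_s) pairs, one per burst, in send
    order, with both timestamps on synchronized clocks.
    Returns (per_burst_xfer_times, mean_xfer_time_s, bursts_per_sec).
    """
    if len(bursts) < 2:
        raise ValueError("need at least two bursts to compute a rate")

    # Burst transfer time: first byte written to last byte read.
    xfer_times = [rx_done - tx_start for tx_start, rx_done in bursts]

    # Burst rate: intervals between burst starts over the elapsed time.
    span = bursts[-1][0] - bursts[0][0]
    if span <= 0:
        raise ValueError("burst start times must increase")
    bursts_per_sec = (len(bursts) - 1) / span

    return xfer_times, mean(xfer_times), bursts_per_sec
```

Because the transfer time starts at the application write, it includes
time spent in the "tx application space" that an RTT sample would miss.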

Bob

On Tue, Jul 13, 2021 at 10:49 AM David P. Reed <dpreed at deepplum.com> wrote:

> Bob -
>
> On Tuesday, July 13, 2021 1:07pm, "Bob McMahon" <bob.mcmahon at broadcom.com>
> said:
>
> > "Control at endpoints benefits greatly from even small amounts of
> > information supplied by the network about the degree of congestion
> > present on the path."
> >
> > Agreed. The ECN mechanism seems like a shared thermostat in a building.
> > It's basically an on/off switch where everyone is trying to set the
> > temperature. It does have an effect, albeit a non-linear one. Better
> > than a thermostat set at infinity or 0 Kelvin, for sure.
> >
> > I find the assumption that congestion occurs "in network" is not always
> > true. Taking OWD measurements with read-side rate limiting suggests that
> > making sure apps read "fast enough," whatever that means, is as
> > important as using congestion signals to mitigate bufferbloat-driven
> > latency. I rarely hear about how important it is for apps to prioritize
> > reads over open sockets. Not sure why that's overlooked and bufferbloat
> > gets all the attention. I'm probably missing something.
>
> In the early days of the Internet protocol and also even ARPANET Host-Host
> protocol there were those who conflated host-level "flow control" (matching
> production rate of data into the network to the destination *process*
> consumption rate of data on a virtual circuit with a source capable of
> variable and unbounded bit rate) with "congestion control" in the network.
> The term "congestion control" wasn't even used in the Internetworking
> project when it was discussing design in the late 1970's. I tried to use it
> in our working group meetings, and every time I said "congestion" the
> response would be phrased as "flow".
>
> The classic example was printing a file's contents from disk to an ASR33
> terminal on a TIP (Terminal IMP). There was flow control in the end-to-end
> protocol to avoid overflowing the TTY's limited buffer. But those who grew
> up with ARPANET knew that there was no way to accumulate queueing in the
> IMP network, because of RFNMs that required permission for each new packet
> to be sent. RFNMs implicitly prevented congestion from being caused by a
> virtual circuit. But a flow control problem remained, because at the higher
> level protocol, buffering would overflow at the TIP.
>
> TCP adopted a different end-to-end *flow* control, so it solved the flow
> control problem by creating a Windowing mechanism. But it did not by itself
> solve the *congestion* control problem, even congestion built up inside the
> network by a wide-open window and a lazy operating system at the receiving
> end that just said, I've got a lot of virtual memory so I'll open the
> window to maximum size.
>
> There was a lot of confusion, because the guys who came from the ARPANET
> environment, with all links being the same speed and RFNM limits on rate,
> couldn't see why the Internet stack was so collapse-prone. I think Multics,
> for example, as a giant virtual memory system caused congestion by opening
> up its window too much.
>
> This is where Van Jacobson discovered that dropped packets were a "good
> enough" congestion signal because of "fate sharing" among the packets that
> flowed on a bottleneck path, and that windowing (invented for flow control
> by the receiver to protect itself from overflow if the receiver couldn't
> receive fast enough) could be used to slow down the sender to match the
> rate of senders to the capacity of the internal bottleneck link. An elegant
> "hack" that actually worked really well in practice.
>
> Now we view it as a bug if the receiver opens its window too much, or
> otherwise doesn't translate dropped packets (or other incipient-congestion
> signals) to shut down the source transmission rate as quickly as possible.
> Fortunately, the proper state of the internet - the one it should seek as
> its ideal state - is that there is at most one packet waiting for each
> egress link in the bottleneck path. This stable state ensures that the
> window-reduction or slow-down signal encounters no congestion, with high
> probability. [Excursions from one-packet queue occur, but since only
> one-packet waiting is sufficient to fill the bottleneck link to capacity,
> they can't achieve higher throughput in steady state. In practice, noisy
> arrival distributions can reduce throughput, so allowing a small number of
> packets to be waiting on a bottleneck link's queue can slightly increase
> throughput. That's not asymptotically relevant, but as mentioned, the
> Internet is never near asymptotic behavior.]
>
>
> >
> > Bob
> >
> > On Tue, Jul 13, 2021 at 12:15 AM Amr Rizk <amr at rizk.com.de> wrote:
> >
> >> Ben,
> >>
> >> it depends on what one tries to measure. Doing a rate scan using UDP (to
> >> measure latency distributions under load) is the best thing that we have
> >> but without actually knowing how resources are shared (fair share as in
> >> WiFi, FIFO as nearly everywhere else) it becomes very difficult to
> >> interpret the results or provide a proper argument on latency. You are
> >> right - TCP stats are a proxy for user experience but I believe they are
> >> difficult to reproduce (we are always talking about very short TCP
> >> flows - the infinite TCP flow that converges to a steady behavior is
> >> purely academic).
> >>
> >> By the way, Little's law is a strong tool when it comes to averages. To
> >> be able to say more (e.g. 1% of the delays are larger than x) one
> >> requires more information (e.g. the traffic's on-off pattern); see [1].
> >> I am not sure when such information is readily available.
> >>
> >> Best
> >> Amr
> >>
> >> [1] https://dl.acm.org/doi/10.1145/3341617.3326146 or, if behind a
> >> paywall, https://www.dcs.warwick.ac.uk/~florin/lib/sigmet19b.pdf
> >>
> >> --------------------------------
> >> Amr Rizk (amr.rizk at uni-due.de)
> >> University of Duisburg-Essen
> >>
> >> -----Original Message-----
> >> From: Bloat <bloat-bounces at lists.bufferbloat.net> On Behalf Of Ben
> >> Greear
> >> Sent: Monday, July 12, 2021 22:32
> >> To: Bob McMahon <bob.mcmahon at broadcom.com>
> >> Cc: starlink at lists.bufferbloat.net; Make-Wifi-fast
> >> <make-wifi-fast at lists.bufferbloat.net>; Leonard Kleinrock
> >> <lk at cs.ucla.edu>; David P. Reed <dpreed at deepplum.com>; Cake List
> >> <cake at lists.bufferbloat.net>; codel at lists.bufferbloat.net;
> >> cerowrt-devel <cerowrt-devel at lists.bufferbloat.net>; bloat
> >> <bloat at lists.bufferbloat.net>
> >> Subject: Re: [Bloat] Little's Law mea culpa, but not invalidating my
> >> main point
> >>
> >> UDP is better for getting actual packet latency, for sure.  TCP is
> >> typical-user-experience-latency though, so it is also useful.
> >>
> >> I'm interested in the test and visualization side of this.  If there
> >> were a way to give engineers a good real-time look at a complex
> >> real-world network, then they would have something to go on while trying
> >> to tune various knobs in their network to improve it.
> >>
> >> I'll let others try to figure out how to build and tune the knobs, but
> >> the data acquisition and visualization is something we might try to
> >> accomplish.  I have a feeling I'm not the first person to think of this,
> >> however... probably someone has already done such a thing.
> >>
> >> Thanks,
> >> Ben
> >>
> >> On 7/12/21 1:04 PM, Bob McMahon wrote:
> >> > I believe end hosts' TCP stats are insufficient, as seen in the
> >> > "failed" congestion control mechanisms over the last decades. I think
> >> > Jaffe pointed this out in 1979, though he was using what's been deemed
> >> > on this thread "spherical cow queueing theory."
> >> >
> >> > "Flow control in store-and-forward computer networks is appropriate
> >> > for decentralized execution. A formal description of a class of
> >> > "decentralized flow control algorithms" is given. The feasibility of
> >> > maximizing power with such algorithms is investigated. On the
> >> > assumption that communication links behave like M/M/1 servers it is
> >> > shown that no "decentralized flow control algorithm" can maximize
> >> > network power. Power has been suggested in the literature as a network
> >> > performance objective. It is also shown that no objective based only on
> >> > the users' throughputs and average delay is decentralizable. Finally, a
> >> > restricted class of algorithms cannot even approximate power."
> >> >
> >> > https://ieeexplore.ieee.org/document/1095152
> >> >
> >> > Did Jaffe make a mistake?
> >> >
> >> > Also, it's been observed that latency is non-parametric in its
> >> > distributions, and computing Gaussians per the central limit theorem
> >> > for OWD feedback loops isn't effective. How does one design a control
> >> > loop around things that are non-parametric? It also raises the
> >> > question: what are the feed-forward knobs that can actually help?
> >> >
> >> > Bob
> >> >
> >> > On Mon, Jul 12, 2021 at 12:07 PM Ben Greear <greearb at candelatech.com>
> >> > wrote:
> >> >
> >> >     Measuring one or a few links provides a bit of data, but it seems
> >> >     that if someone is trying to understand a large and real network,
> >> >     then the OWD between point A and B needs to just be input into
> >> >     something much more grand.  Assuming real-time OWD data exists
> >> >     between 100 to 1000 endpoint pairs, has anyone found a way to
> >> >     visualize this in a useful manner?
> >> >
> >> >     Also, considering that something better than ntp may not really
> >> >     scale to 1000+ endpoints, maybe round-trip time is the only viable
> >> >     way to get this type of data.  In that case, maybe clever logic
> >> >     could use things like trace-route to get some idea of how long it
> >> >     takes to get 'onto' the internet proper, and so estimate the
> >> >     last-mile latency.  My assumption is that the last-mile latency is
> >> >     where most of the pervasive asymmetric network latencies would
> >> >     exist (or just ping 8.8.8.8 which is 20ms from everywhere due to
> >> >     $magic).
> >> >
> >> >     Endpoints could also triangulate a bit if needed, using some anchor
> >> >     points in the network under test.
> >> >
> >> >     Thanks,
> >> >     Ben
> >> >
> >> >     On 7/12/21 11:21 AM, Bob McMahon wrote:
> >> >      > iperf 2 supports OWD and gives full histograms for TCP write to
> >> >      > read, TCP connect times, latency of packets (with UDP), latency
> >> >      > of "frames" with simulated video traffic (TCP and UDP), xfer
> >> >      > times of bursts with low duty cycle traffic, and TCP RTT
> >> >      > (sampling based). It also has support for sampling (per-interval
> >> >      > reports) down to 100 usecs if configured with
> >> >      > --enable-fastsampling; otherwise the fastest sampling is 5 ms.
> >> >      > We've released all this as open source.
> >> >      >
> >> >      > OWD only works if the end realtime clocks are synchronized using
> >> >      > a "machine level" protocol such as IEEE 1588 (PTP). Sadly, most
> >> >      > data centers don't provide a sufficient level of clock accuracy,
> >> >      > or the GPS pulse per second, to colo and VM customers.
> >> >      >
> >> >      > https://iperf2.sourceforge.io/iperf-manpage.html
> >> >      >
> >> >      > Bob
> >> >      >
> >> >      > On Mon, Jul 12, 2021 at 10:40 AM David P. Reed
> >> >      > <dpreed at deepplum.com> wrote:
> >> >      >
> >> >      >
> >> >      >     On Monday, July 12, 2021 9:46am, "Livingood, Jason"
> >> >      >     <Jason_Livingood at comcast.com> said:
> >> >      >
> >> >      >      > I think latency/delay is becoming seen to be as important
> >> >      >      > certainly, if not a more direct proxy for end user QoE.
> >> >      >      > This is all still evolving and I have to say is a super
> >> >      >      > interesting & fun thing to work on. :-)
> >> >      >
> >> >      >     If I could manage to sell one idea to the management
> >> >      >     hierarchy of communications industry CEOs (operators,
> >> >      >     vendors, ...) it is this one:
> >> >      >
> >> >      >     "It's the end-to-end latency, stupid!"
> >> >      >
> >> >      >     And I mean, by end-to-end, latency to complete a task at a
> >> >      >     relevant layer of abstraction.
> >> >      >
> >> >      >     At the link level, it's packet send to packet receive
> >> >      >     completion.
> >> >      >
> >> >      >     But at the transport level including retransmission buffers,
> >> >      >     it's datagram (or message) origination until the
> >> >      >     acknowledgement arrives for that message being delivered
> >> >      >     after whatever number of retransmissions, freeing the
> >> >      >     retransmission buffer.
> >> >      >
> >> >      >     At the WWW level, it's mouse click to display update
> >> >      >     corresponding to completion of the request.
> >> >      >
> >> >      >     What should be noted is that lower level latencies don't
> >> >      >     directly predict the magnitude of higher-level latencies.
> >> >      >     But longer lower level latencies almost always amplify
> >> >      >     higher level latencies. Often non-linearly.
> >> >      >
> >> >      >     Throughput is very, very weakly related to these latencies,
> >> >      >     in contrast.
> >> >      >
> >> >      >     The amplification process has to do with the presence of
> >> >      >     queueing. Queueing is ALWAYS bad for latency, and throughput
> >> >      >     only helps if it is in exactly the right place (the
> >> >      >     so-called input queue of the bottleneck process, which is
> >> >      >     often a link, but not always).
> >> >      >
> >> >      >     Can we get that slogan into Harvard Business Review? Can we
> >> >      >     get it taught in Managerial Accounting at HBS? (which does
> >> >      >     address logistics/supply chain queueing).
> >> >      >
> >> >      > This electronic communication and the information and any files
> >> >      > transmitted with it, or attached to it, are confidential and are
> >> >      > intended solely for the use of the individual or entity to whom
> >> >      > it is addressed and may contain information that is
> >> >      > confidential, legally privileged, protected by privacy laws, or
> >> >      > otherwise restricted from disclosure to anyone else. If you are
> >> >      > not the intended recipient or the person responsible for
> >> >      > delivering the e-mail to the intended recipient, you are hereby
> >> >      > notified that any use, copying, distributing, dissemination,
> >> >      > forwarding, printing, or copying of this e-mail is strictly
> >> >      > prohibited. If you received this e-mail in error, please return
> >> >      > the e-mail to the sender, delete it from your computer, and
> >> >      > destroy any printed copy of it.
> >> >
> >> >
> >> >     --
> >> >     Ben Greear <greearb at candelatech.com>
> >> >     Candela Technologies Inc http://www.candelatech.com
> >> >
> >> >
> >>
> >>
> >> --
> >> Ben Greear <greearb at candelatech.com>
> >> Candela Technologies Inc  http://www.candelatech.com
> >>
> >>
> >>
> >
>
>
>
