<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<meta name="Generator" content="Microsoft Word 15 (filtered medium)">

<style><!--

/* Font Definitions */

@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

@font-face

        {font-family:"Times New Roman \(Body CS\)";

        panose-1:2 11 6 4 2 2 2 2 2 4;}

/* Style Definitions */

span.EmailStyle21

        {mso-style-type:personal-reply;

        font-family:"Calibri",sans-serif;

        color:windowtext;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;}

@page WordSection1

        {size:8.5in 11.0in;

        margin:1.0in 1.0in 1.0in 1.0in;}

div.WordSection1

        {page:WordSection1;}

--></style>

</head>

<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">

<div class="WordSection1">

<p style="margin:0in"><span style="font-size:14.0pt">> 2) Users are pissed off, because they clicked on a web page, and got nothing back. They retry on their screen, or they try another site. Meanwhile, the underlying TCP connection remains there, pumping the

 network full of more packets on that old path, which is still backed up with packets that haven't been delivered that are sitting in queues.

<o:p></o:p></span></p>

<p style="margin:0in"><span style="font-size:14.0pt"><o:p> </o:p></span></p>

<p style="margin:0in"><span style="font-size:14.0pt">Agree. In my experience, as utilization of a network segment or of supporting network systems (e.g. DNS) increases, you may see a very small amount of delay creep in, but not much: things are stable until they are <b>quite suddenly</b> not. At that stability inflection point you immediately and dramatically fall off a cliff, which is then exacerbated by what you note here: user and machine-based retries/retransmissions that drive a huge increase in traffic. The solution has typically been to throw massive new capacity at it until the storm recedes.<o:p></o:p></span></p>
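
<p style="margin:0in"><span style="font-size:14.0pt">&nbsp;</span></p>

<p style="margin:0in"><span style="font-size:14.0pt">As a rough illustration of that cliff, here is a minimal sketch that assumes a textbook M/M/1 queue, where mean time in the system is the service time divided by (1 - utilization). The M/M/1 model and the function name are assumptions made for illustration only; real traffic is burstier and retries add load on top, so the real knee can arrive sooner and more abruptly.<o:p></o:p></span></p>

<pre>
```python
def mm1_delay_multiplier(utilization):
    """Mean time in system for an M/M/1 queue, as a multiple of the service time.

    Assumption: Poisson arrivals and exponential service times (M/M/1).
    This is a simplification for illustration, not a model of any real network.
    """
    if 0.0 > utilization or utilization >= 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


# Delay stays modest for a long time, then grows without bound near 100%.
for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"{rho:.0%} utilized: about {mm1_delay_multiplier(rho):.0f}x the service time")
```
</pre>

<p style="margin:0in"><span style="font-size:14.0pt">Under this assumption, going from 50% to 90% utilization multiplies delay by about five, and the last few percent before saturation do far more damage than that.<o:p></o:p></span></p>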

<p style="margin:0in;overflow-wrap: break-word"><span style="font-size:14.0pt"> <o:p></o:p></span></p>

<p style="margin:0in;overflow-wrap: break-word"><span style="font-size:14.0pt">> I should say that most operators, and especially ATT in this case, do not measure end-to-end latency. Instead they use Little's Lemma to query routers for their current throughput

 in bits per second, and calculate latency as if Little's Lemma applied.<o:p></o:p></span></p>

<p style="margin:0in"><span style="font-size:14.0pt"><o:p> </o:p></span></p>

<p style="margin:0in"><span style="font-size:14.0pt">IMO network operators' views/practices vary widely and have been evolving quite a bit in recent years. Yes, it used to be all about capacity utilization metrics, but I think that is changing. In my day job, we run E2E latency tests (among others) to CPE, and the distribution is a lot more instructive than the mean/median for continuously improving the network experience.<o:p></o:p></span></p>
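
<p style="margin:0in"><span style="font-size:14.0pt">&nbsp;</span></p>

<p style="margin:0in"><span style="font-size:14.0pt">To show why the distribution tells you more than the mean or median, here is a small sketch using made-up latency samples. The sample values and the function name are hypothetical, and this is not a description of any operator's actual test tooling.<o:p></o:p></span></p>

<pre>
```python
import statistics


def summarize_latency(samples_ms):
    """Return mean, median (p50) and p99 of a list of latency samples in ms."""
    # quantiles(..., n=100) returns 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": statistics.median(samples_ms),
        "p99": cuts[98],
    }


# Hypothetical data: 95 good samples at 20 ms and 5 bad ones at 400 ms.
samples = [20.0] * 95 + [400.0] * 5
print(summarize_latency(samples))
```
</pre>

<p style="margin:0in"><span style="font-size:14.0pt">With these made-up numbers the median sits at the good value and the mean is only mildly inflated, while the tail percentile exposes the 400 ms samples that users would actually notice.<o:p></o:p></span></p>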

<p style="margin:0in"><span style="font-size:14.0pt"><o:p> </o:p></span></p>

<p style="margin:0in;overflow-wrap: break-word"><span style="font-size:14.0pt">> And management responds, Hooray! Because utilization of 100% of their hardware is their investors' metric of maximizing profits. The hardware they are operating is fully utilized.

 No waste! And users are happy because no packets have been dropped!<o:p></o:p></span></p>

<p style="margin:0in;overflow-wrap: break-word"><span style="font-size:14.0pt"> <o:p></o:p></span></p>

<p style="margin:0in"><span style="font-size:14.0pt">Well, I hope it wasn’t the case that 100% utilization meant they were ‘green’ on their network KPIs. ;-) Ha. But I think you are correct that a network engineering team would have been measured by how well they kept ahead of utilization/demand, with network capacity decisions driven in large part by utilization trends. In a lot of networks I suspect an informal rule of thumb arose that things got a little funny once p98 utilization got to around 94-95% of link capacity, so you back up from there to figure out when you need to trigger automatic capacity augments to avoid that. While I do not think managing via utilization in that way is incorrect, ISTM it’s mostly used because the measure is an indirect proxy for end user QoE. I think latency/delay is coming to be seen as certainly as important, if not as a more direct proxy for end user QoE. This is all still evolving and, I have to say, is a super interesting and fun thing to work on. :-)<o:p></o:p></span></p>
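
<p style="margin:0in"><span style="font-size:14.0pt">&nbsp;</span></p>

<p style="margin:0in"><span style="font-size:14.0pt">The rule of thumb above can be sketched roughly as follows. The 94% knee comes from the text; the 10-point headroom, the function names, and the sample data are hypothetical choices for illustration, not anyone's production policy.<o:p></o:p></span></p>

<pre>
```python
import statistics


def p98(samples):
    """98th percentile of a list of samples (index 97 of the 99 cut points)."""
    return statistics.quantiles(samples, n=100, method="inclusive")[97]


def needs_augment(utilization_samples, knee=0.94, headroom=0.10):
    """Flag a link for a capacity augment before p98 utilization reaches the knee.

    knee: utilization (fraction of link capacity) where things "get funny".
    headroom: hypothetical safety margin; the trigger is knee minus headroom.
    """
    trigger = knee - headroom
    return p98(utilization_samples) >= trigger


# Hypothetical samples, as fractions of link capacity.
quiet_link = [0.50] * 100
busy_link = [0.50] * 90 + [0.90] * 10
print(needs_augment(quiet_link), needs_augment(busy_link))
```
</pre>

<p style="margin:0in"><span style="font-size:14.0pt">The check itself is trivial; choosing the headroom means estimating how long an augment takes to land and how fast demand is growing, which is the part that utilization trends are used for.<o:p></o:p></span></p>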

<p style="margin:0in"><span style="font-size:14.0pt"><o:p> </o:p></span></p>

<p style="margin:0in"><span style="font-size:14.0pt">Jason<o:p></o:p></span></p>


</div>

</body>

</html>