[Starlink] [NNagain] CFP march 1 - network measurement conference

Sauli Kiviranta smksauli at gmail.com
Thu Dec 7 15:05:57 EST 2023


Thank you Jack, Bill and Ricky for your comments! (And everyone after!)

Jack:

"> amount (bytes, datagrams) presumed lost and re-transmitted by the sender"

I would consider those lost packets that are simply recovered by paying
in time complexity, e.g. retransmission with TCP, where a
retransmission may itself require another retransmission, and so on.
The retransmitted data is then counted towards overhead.

The equivalent for UDP would be redundancy, i.e. paying for lost-data
recovery in space complexity. If the payload cannot be recovered, the
whole payload is counted as waste, on top of the packet loss itself.

"> amount (bytes, datagrams) discarded at the receiver because they
were already received"

From my perspective this is simply overhead. Any extra cost in space
complexity over the original payload is inefficiency and should be
counted as such.

"> amount (bytes, datagrams) discarded at the receiver because they
arrived too late to be useful"

This is trickier, because it takes a stance on the use case. It would
be better to characterize it as a time distribution, as percentiles at
specific useful intervals, e.g. RTT multiples or 100 ms, 500 ms,
1000 ms, to give a rough estimate of how much time must be paid to
recover when a feedback loop is involved.
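To make that concrete, here is a minimal sketch of bucketing recovery
delays at the thresholds mentioned above. The sample delays are
hypothetical, purely for illustration:

```python
# Sketch: characterize recovery delay as a distribution rather than a
# binary "too late" judgment. Per-payload delays below are hypothetical.
delays_ms = [12, 35, 80, 110, 240, 480, 950, 1300]

def fraction_within(delays, threshold_ms):
    """Fraction of payloads recovered within the given time budget."""
    return sum(d <= threshold_ms for d in delays) / len(delays)

for budget in (100, 500, 1000):  # ms thresholds from the text
    print(f"recovered within {budget} ms: {fraction_within(delays_ms, budget):.0%}")
```

Each use case can then pick the percentile/threshold that matters to it,
instead of the measurement hard-coding a "useful" deadline.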

"> With such data, it would be possible to measure things like "useful
throughput", i.e., the data successfully delivered from source to
destination which was actually useful for the associated user's
application."

Here we have a few different components at play: 1. the bandwidth the
use case requires versus the bandwidth it actually consumes; 2. link
saturation, a non-use-case-specific measure of how much of the
available path we can utilize effectively, assuming the use case
demands at least that much, e.g. a file transfer, where there is no
real-time budget from the use-case perspective.
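A tiny sketch of how the two views separate (all numbers hypothetical):

```python
# Sketch: separate the use-case view (is the demand met?) from the
# path view (how much of the capacity did we effectively use?).
link_capacity_bps     = 100e6  # available path capacity (hypothetical)
achieved_goodput_bps  = 62e6   # useful payload delivered per second
use_case_required_bps = 5e6    # what the application actually needs

link_saturation = achieved_goodput_bps / link_capacity_bps       # path view
demand_met      = achieved_goodput_bps >= use_case_required_bps  # use-case view

print(f"link saturation: {link_saturation:.0%}, demand met: {demand_met}")
```

The same transfer can score very differently on the two axes, which is
why I would keep them as separate components.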

Bill:

"> All of that can probably be derived from sufficiently
finely-grained TCP data.  i.e. if you had a PCAP of a TCP flow that
constituted the measurement, you’d be able to derive all of the
above."

What I would say cannot be derived is the behavior of the transport
across these combinations. What was very interesting, and matches what
I have experienced myself, is here:
https://www.youtube.com/watch?v=XHls8PvCVws&t=319s

Especially the second talk, about measurements at fine-grained payload
resolution. You see a very specific pattern with TCP: as the payload
size increases, you introduce extra RTTs.

Thus I think what is done in, for example, iperf is not a particularly
good method, as there is just one payload size (?), and this result
shows that the payload size itself will determine your results. We
need more dimensions in our tests.
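The payload-size effect can be sketched with a back-of-the-envelope
slow-start model (initial window of 10 segments per RFC 6928; the
numbers are illustrative, not taken from the talk):

```python
# Sketch: a payload that does not fit in the initial congestion window
# must wait additional round trips while the window grows (slow start).
MSS = 1460            # bytes per segment (typical Ethernet-path MSS)
INIT_CWND = 10        # segments (RFC 6928 default)

def rtts_to_deliver(payload_bytes, mss=MSS, cwnd=INIT_CWND):
    """Rough count of round trips to deliver a payload under slow start."""
    segments = -(-payload_bytes // mss)  # ceiling division
    rtts, sent = 0, 0
    while sent < segments:
        sent += cwnd
        cwnd *= 2     # slow-start doubling per RTT
        rtts += 1
    return rtts

print(rtts_to_deliver(14_000), rtts_to_deliver(100_000))  # 1 RTT vs 3 RTTs
```

So a single-payload-size test silently bakes one point of this curve
into its result.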

"> Bandwidth: The data transfer capacity available on the test path.
Presumably the goal of a TCP transaction measurement would be to
enable this calculation."

This, in light of the previous video, becomes: "we know X is
available; how was it utilized?"

"> Transfer Efficiency: The ratio of useful payload data to the overhead data.
This is a how-its-used rather than a property-of-the-network.  If
there are network-inherent overheads, they’re likely to be not
directly visible to endpoints, only inferable, and might require
external knowledge of the network.  So, I’d put this out-of-scope."

I see this differently: it is an extremely important factor, and it is
becoming more important still. How much waste is there relative to
useful data? Every single extra bit is waste. Whether it is redundant
data for recovery on transports without a feedback loop, or an ACK
triggering retransmission, the goal is the same: we need to
reconstruct the payload, either by paying from the time budget (wait
for retransmission) or from the space budget (add enough redundancy to
recreate the payload regardless of lost data).
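A minimal sketch of the efficiency ratio I have in mind, counting both
the space budget and the time budget as overhead (byte counts are
hypothetical):

```python
# Sketch: transfer efficiency = useful payload / everything sent.
payload_bytes       = 1_000_000  # original useful data
retransmitted_bytes =    80_000  # time budget: resent after loss
redundancy_bytes    =   150_000  # space budget: FEC / duplication
header_bytes        =    40_000  # protocol framing

total_sent = (payload_bytes + retransmitted_bytes
              + redundancy_bytes + header_bytes)
efficiency = payload_bytes / total_sent
print(f"transfer efficiency: {efficiency:.1%}")
```

Measured this way, the metric is transport-agnostic: it does not care
whether the waste came from retransmission or from redundancy.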

"> RTT is measurable.  If Latency is RTT minus processing delay on the
remote end, I’m not sure it’s really measurable, per se, without the
remote end being able to accurately clock itself, or an independent
vantage point adjacent to the remote end.  This is the old
one-way-delay measurement problem in different guise, I think.
Anyway, I think RTT is easy and necessary, and I think latency is
difficult and probably an anchor not worth attaching to anything we
want to see done in the near term.  Latency jitter likewise."

Yes, this is difficult without time synchronization. That is why it
has to be done under laboratory conditions, or measured glass-to-glass
across a relay. It is very tricky to make sure the timings are correct
so that we get no garbage measurements!

To truthfully understand performance across a gradient of use cases,
we need to be able to distinguish the baseline network fluctuation
(e.g. Starlink) from the delays added on top of it by our transport.
Thus I would really like to keep these two separate, as that lets us
distinguish whether we spent an extra 300 ms on a few retransmission
loops or picked up an extra 5 ms of latency merely as error from the
baseline gradient.
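One way to sketch that separation: measure the raw path with a thin
probe stream and subtract its median from the delays observed through
the transport (all samples hypothetical):

```python
# Sketch: attribute delay excess over the baseline path to the transport.
import statistics

baseline_rtt_ms   = [42, 45, 41, 48, 44]    # thin probe on the raw path
observed_delay_ms = [44, 46, 340, 47, 43]   # same path through our transport

baseline_median = statistics.median(baseline_rtt_ms)
transport_excess = [d - baseline_median for d in observed_delay_ms]
# A ~300 ms excess points at retransmission loops; a few ms is just
# baseline fluctuation.
print(max(transport_excess))
```

This is exactly why I want the baseline kept as its own measurement
rather than folded into the transport's numbers.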

"> This seems like it can be derived from a PCAP, but doesn’t really
constitute an independent measurement."

Agreed. If we condition the network for a lab test, it would be good
to be able to derive the conditioned loss, to confirm we know what we
are doing.

"> Energy Efficiency: The amount of energy consumed to achieve the test result.
Not measurable."

We should try! See the attached picture about QUIC from the previous
email; these things matter when we have constrained devices in IoT, in
space, on drones, etc. We can get artificially good performance if we
just burn an enormous amount of energy. We should make sure we measure
the total energy spent on the overall transmission. Even if it is
difficult, we should absolutely worry about the cost not only in
bandwidth and time but also in energy, as a consequence of the
computational complexity of some of the approaches at our disposal,
even for error correction.
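The metric itself is trivial once the energy is measured; the hard
part is the measurement. A sketch with hypothetical numbers:

```python
# Sketch: energy efficiency as joules per useful megabyte delivered,
# so a transport cannot look good just by burning power.
energy_joules = 12.5         # total energy for the transfer (radio + CPU)
useful_bytes  = 50_000_000   # payload successfully delivered

joules_per_megabyte = energy_joules / (useful_bytes / 1e6)
print(f"{joules_per_megabyte:.3f} J/MB")
```

Normalizing by useful bytes (not raw bytes) also ties this back to the
transfer-efficiency point above: redundancy and retransmission cost
energy too.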

"> Did I overlook something?
Out-of-order delivery is the fourth classical quality criterion.
There are folks who argue that it doesn’t matter anymore, and others
who (more compellingly, to my mind) argue that it’s at least as
relevant as ever."

Good point. Is the concern receive-buffer bloat, if we need a jitter
buffer and must pay in time complexity to recover misordered data? My
perspective is mostly at the application layer, so forgive me if I
missed your point.

"Thus, for an actual measurement suite:
 - A TCP transaction
…from which we can observe:
 - Loss
 - RTT (which I’ll just call “Latency” because that’s what people have
called it in the past)
 - out-of-order delivery
 - Jitter in the above three, if the transaction continues long enough
…and we can calculate:
 - Goodput"

I see, yes. My suggestion would be to do all of that across
TCP/LTP/UDP/QUIC and so on, so that we finally have a good overview of
where we stand as humanity in our ability to transfer bits for an
arbitrary use case: a quadrant with data intensity on the X axis and
event rate on the Y axis, varied over environmental factors such as
latency and packet loss.

"In addition to these, I think it’s necessary to also associate a
traceroute (and, if available and reliable, a reverse-path traceroute)
in order that it be clear what was measured, and a timestamp, and a
digital signature over the whole thing, so we can know who’s attesting
to the measurement."

Yes, especially if we do tests in the wilderness; fully agreed!

Overall, all tests should be documented so accurately that they can be
reproduced within error margins. If not, it's not really science or
even engineering, just artisans with rules of thumb!

Best regards,
Sauli

On 12/7/23, Ricky Mok via Starlink <starlink at lists.bufferbloat.net> wrote:
> How about applications? youtube and netflix?
>
> (TPC of this conference this year)
>
> Ricky
>
> On 12/6/23 18:22, Bill Woodcock via Starlink wrote:
>>
>>> On Dec 6, 2023, at 22:46, Sauli Kiviranta via Nnagain
>>> <nnagain at lists.bufferbloat.net> wrote:
>>> What would be a comprehensive measurement? Should cover all/most relevant
>>> areas?
>> It’s easy to specify a suite of measurements which is too heavy to be
>> easily implemented or supported on the network.  Also, as you point out,
>> many things can be derived from raw data, so don’t necessarily require
>> additional specific measurements.
>>
>>> Payload Size: The size of data being transmitted.
>>> Event Rate: The frequency at which payloads are transmitted.
>>> Bitrate: The combination of rate and size transferred in a given test.
>>> Throughput: The data transfer capability achieved on the test path.
>> All of that can probably be derived from sufficiently finely-grained TCP
>> data.  i.e. if you had a PCAP of a TCP flow that constituted the
>> measurement, you’d be able to derive all of the above.
>>
>>> Bandwidth: The data transfer capacity available on the test path.
>> Presumably the goal of a TCP transaction measurement would be to enable
>> this calculation.
>>
>>> Transfer Efficiency: The ratio of useful payload data to the overhead
>>> data.
>> This is a how-its-used rather than a property-of-the-network.  If there
>> are network-inherent overheads, they’re likely to be not directly visible
>> to endpoints, only inferable, and might require external knowledge of the
>> network.  So, I’d put this out-of-scope.
>>
>>> Round-Trip Time (RTT): The ping delay time to the target server and back.
>>> RTT Jitter: The variation in the delay of round-trip time.
>>> Latency: The transmission delay time to the target server and back.
>>> Latency Jitter: The variation in delay of latency.
>> RTT is measurable.  If Latency is RTT minus processing delay on the remote
>> end, I’m not sure it’s really measurable, per se, without the remote end
>> being able to accurately clock itself, or an independent vantage point
>> adjacent to the remote end.  This is the old one-way-delay measurement
>> problem in different guise, I think.  Anyway, I think RTT is easy and
>> necessary, and I think latency is difficult and probably an anchor not
>> worth attaching to anything we want to see done in the near term.  Latency
>> jitter likewise.
>>
>>> Bit Error Rate: The corrupted bits as a percentage of the total
>>> transmitted data.
>> This seems like it can be derived from a PCAP, but doesn’t really
>> constitute an independent measurement.
>>
>>> Packet Loss: The percentage of packets lost that needed to be recovered.
>> Yep.
>>
>>> Energy Efficiency: The amount of energy consumed to achieve the test
>>> result.
>> Not measurable.
>>
>>> Did I overlook something?
>> Out-of-order delivery is the fourth classical quality criterion.  There
>> are folks who argue that it doesn’t matter anymore, and others who (more
>> compellingly, to my mind) argue that it’s at least as relevant as ever.
>>
>> Thus, for an actual measurement suite:
>>
>>   - A TCP transaction
>>
>> …from which we can observe:
>>
>>   - Loss
>>   - RTT (which I’ll just call “Latency” because that’s what people have
>> called it in the past)
>>   - out-of-order delivery
>>   - Jitter in the above three, if the transaction continues long enough
>>
>> …and we can calculate:
>>
>>   - Goodput
>>
>> In addition to these, I think it’s necessary to also associate a
>> traceroute (and, if available and reliable, a reverse-path traceroute) in
>> order that it be clear what was measured, and a timestamp, and a digital
>> signature over the whole thing, so we can know who’s attesting to the
>> measurement.
>>
>>                                  -Bill
>>
>>
>> _______________________________________________
>> Starlink mailing list
>> Starlink at lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/starlink

