[Bloat] Fwd: Seeking input from engineers with expertise in video conferencing and similar delay-sensitive applications

Dave Taht dave.taht at gmail.com
Mon Oct 28 15:22:56 EDT 2024


If anyone with a videoconferencing background can comment on these
methods, please do so (on the ippm list, not here).

---------- Forwarded message ---------
From: Stuart Cheshire <cheshire=40apple.com at dmarc.ietf.org>
Date: Mon, Oct 28, 2024 at 12:21 PM
Subject: Seeking input from engineers with expertise in video
conferencing and similar delay-sensitive applications
To: <ietf at ietf.org>


Hello IETF colleagues,

The IETF has been working on L4S, to reduce packet loss and delay on
the Internet, which should be a great benefit for delay-sensitive
applications like video conferencing.

In a companion project, we are working on a network measurement tool
to report meaningful delay measurements, to validate whether L4S
deployments and other similar technologies are actually delivering
what they promise.

We are seeking expert feedback on this Internet Draft:

<https://datatracker.ietf.org/doc/html/draft-ietf-ippm-responsiveness>

It will be discussed next Monday at the IETF meeting in Dublin:

<https://datatracker.ietf.org/meeting/121/materials/agenda-121-ippm>

The purpose of this work is to create a repeatable analytical test
that can be run to assess how well a network will support
delay-sensitive applications like video conferencing. For the test to
be useful, the results it reports need to correlate with subjective
user experience. I worry that we have not validated this aspect of the
test enough.

My understanding is that video conferencing applications accumulate
received packets in a playback buffer (to smooth out delay variation),
and then determine a time when those packets are decoded to display a
frame. Setting the playback buffer too deep results in conversational
delay that impacts user experience. Setting the playback buffer too
shallow results in lower delay, but risks displaying a frame before
all the necessary packets have been received, degrading image (and
audio) quality. Thus the playback buffer needs to dynamically adjust
to network conditions, to balance between playing early enough to keep
conversational delay low, but late enough that a sufficient percentage
of packets have been received by the playback time.

How does a video conferencing application compute this ideal playback
delay? Is the delay set such that we expect 90% of the necessary
packets should have been received? 95%? 99%?

The draft has been through a series of revisions with input from
multiple people. It has currently arrived at an algorithm that samples
the application-layer round-trip delay over a period of about ten
seconds, discards the worst 5% of those measurements, and reports the
arithmetic mean of the best 95%.
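
To make the calculation described above concrete, here is a minimal
Python sketch of a trimmed mean. It is only an illustration of the
"discard the worst 5%, average the rest" step, not the draft's
reference implementation; the function name, the millisecond units,
and the integer rounding of the discard count are assumptions.

```python
def trimmed_mean_rtt(samples_ms, discard_percent=5):
    """Mean of the lowest samples after dropping the worst discard_percent.

    samples_ms: round-trip delay samples in milliseconds (assumed unit).
    discard_percent: integer percentage of the slowest samples to drop.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Integer arithmetic avoids floating-point surprises in the count.
    discard = len(ordered) * discard_percent // 100
    kept = ordered[:len(ordered) - discard]
    return sum(kept) / len(kept)


# Hypothetical data: nineteen fast samples and one very slow one.
samples = [10] * 19 + [1000]
print(trimmed_mean_rtt(samples))     # the slow sample is discarded
print(sum(samples) / len(samples))   # plain mean, for comparison
```

With this made-up data the single 1000ms sample vanishes from the
reported figure entirely, which is the behaviour questioned below.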

Is this a good predictor of video conferencing performance? I fear
that our current test may be measuring the exact opposite of what
video conferencing cares about. Mean and median mean nothing to video
conferencing. If the median round-trip delay is just 1ms then that’s
awesome, but it does a video conferencing application no good to
decode a frame when it’s got only half the packets (that’s what median
means). If the 90th percentile round-trip delay is 500ms, and the
application needs to have 90% of the packets before it can usefully
decode a frame, then the application needs to wait that long before
decoding a frame. It doesn’t matter if half the packets arrive really
early, if the remaining necessary packets arrive late. It is the
latecomers that determine the playback delay, not the early packets.
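
The argument above can be sketched in a few lines of Python. This is
an illustration under stated assumptions, not how any particular video
conferencing application works: it treats each delay sample as the
arrival delay of one packet, uses the nearest-rank percentile
definition, and the function names and sample values are hypothetical.

```python
def percentile(delays_ms, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not delays_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(delays_ms)
    rank = -(-p * len(ordered) // 100)   # ceiling of p * n / 100
    return ordered[max(rank, 1) - 1]


def on_time_fraction(delays_ms, playback_delay_ms):
    """Fraction of packets that have arrived by the playback delay."""
    arrived = sum(1 for d in delays_ms if d <= playback_delay_ms)
    return arrived / len(delays_ms)


# Hypothetical delays: half the packets arrive in 1ms, the rest late.
delays = [1] * 50 + [100] * 30 + [500] * 20

median = percentile(delays, 50)
p90 = percentile(delays, 90)
print(median, on_time_fraction(delays, median))
print(p90, on_time_fraction(delays, p90))
```

With these made-up numbers, a playback delay set at the median has
only half the packets in hand, while a receiver that needs 90% of the
packets has to wait for the 90th percentile delay regardless of how
early the fast packets arrived.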

Does my reasoning make sense here? What metric would video
conferencing applications like to see reported? 90th percentile? 95th
percentile? 99th percentile? Something else?

I want to make sure that when we publish this Internet Draft as an
IETF RFC it serves its purpose of motivating vendors and operators to
tune their networks so that delay-sensitive applications work well. If
the test measures the wrong thing, then it motivates vendors and
operators to optimize the wrong thing, and that doesn’t help
delay-sensitive applications like video conferencing work better.

Please send comments to ippm at ietf.org, or
attend IPPM in Dublin to share your thoughts in person.

Stuart Cheshire



-- 
Dave Täht CSO, LibreQos

