[Ecn-sane] [tsvwg] per-flow scheduling

Kyle Rose krose at krose.org
Sat Jul 27 11:35:28 EDT 2019


Right, I understand that under RFC 3168 behavior the sender would react
differently to ECE markings than L4S flows would, but I guess I don’t
understand why a sender willing to misclassify its traffic with ECT(1)
wouldn’t also choose to react non-normatively to ECE markings. On the
rest, I think we agree.
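
To make the asymmetry concrete, here is a toy sketch (mine, not from any
draft or real stack) contrasting the normative RFC 3168 response to ECE
with a DCTCP/L4S-style scaled response; a sender already willing to
mis-mark with ECT(1) could just as easily pick the gentler reaction:

    # Toy model only; the window values and alpha below are assumptions.
    def classic_on_ece(cwnd):
        # RFC 3168 sender: multiplicative decrease, once per RTT with ECE
        return max(cwnd / 2.0, 1.0)

    def l4s_on_ce(cwnd, alpha):
        # DCTCP/L4S-style sender: back off in proportion to the EWMA
        # marking fraction alpha rather than halving outright
        return max(cwnd * (1.0 - alpha / 2.0), 1.0)

    print(classic_on_ece(100.0))        # 50.0 -> classic halves
    print(l4s_on_ce(100.0, alpha=0.1))  # 95.0 -> sheds only ~5%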

Kyle

On Thu, Jul 25, 2019 at 3:26 PM Holland, Jake <jholland at akamai.com> wrote:

> Hi Kyle,
>
>
>
> I almost agree, except that the concern is not about classic flows.
>
>
>
> I agree (with caveats) with what Bob and Greg have said before: ordinary
> classic flows don’t have an incentive to mis-mark if they’ll be responding
> normally to CE, because a classic flow will back off too aggressively and
> starve itself if it’s getting CE marks from the LL queue.
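>
> As a rough back-of-envelope (standard steady-state models; the MSS,
> RTT, and constants are my assumptions, not from any draft): a classic
> flow's rate scales roughly as 1/sqrt(p) in the marking probability p,
> while a scalable flow's scales as 1/p, so at the marking levels an L4S
> queue runs at, the classic flow gets almost nothing:
>
>     from math import sqrt
>
>     MSS_BITS, RTT = 1500 * 8, 0.02   # 1500-byte packets, 20 ms RTT
>
>     def reno_bps(p):   # classic Reno-style steady state
>         return (MSS_BITS / RTT) * sqrt(1.5 / p)
>
>     def dctcp_bps(p):  # DCTCP/L4S-style steady state
>         return (MSS_BITS / RTT) * 2.0 / p
>
>     for p in (0.001, 0.05, 0.2):     # marking probability
>         print(f"p={p}: reno={reno_bps(p)/1e6:.1f} Mbps, "
>               f"dctcp={dctcp_bps(p)/1e6:.1f} Mbps")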
>
>
>
> That said, I had a message where I tried to express something similar to
> the concerns I think you just raised, with regard to a different category
> of flow:
>
> https://mailarchive.ietf.org/arch/msg/tsvwg/bUu7pLmQo6BhR1mE2suJPPluW3Q
>
>
>
> So I agree with the concerns you’ve raised here, and I want to +1 that
> aspect while also clarifying that I don’t think these concerns apply to
> ordinary classic flows, but rather to flows that use application-level
> quality metrics to change bit-rates instead of responding at the
> transport level.
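>
> As a purely hypothetical sketch of that category (the function, metrics,
> and thresholds are invented for illustration): an adapter that picks its
> bit-rate from application-observed quality and never consults
> transport-level CE marks, so ECT(1) mis-marking costs it nothing:
>
>     def pick_bitrate_kbps(frame_loss_pct, jitter_ms, current_kbps):
>         # React only to application-layer quality, not to CE marks.
>         if frame_loss_pct > 2.0 or jitter_ms > 30.0:
>             return current_kbps * 0.8          # quality dropped: back off
>         return min(current_kbps * 1.05, 4000)  # otherwise probe upward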
>
>
>
> For those flows (which seem to include some of today’s video conferencing
> traffic), I expect they really would see an advantage from mis-marking
> themselves, and they will require policing that imposes a policy decision.
> Given that, I agree that I don’t see a simple alternative to FQ for flows
> originating outside the policer’s trust domain when the network is fully
> utilized.
>
>
>
> I hope that makes at least a little sense.
>
>
>
> Best regards,
>
> Jake
>
>
>
> *From: *Kyle Rose <krose at krose.org>
> *Date: *2019-07-23 at 11:13
> *To: *Bob Briscoe <ietf at bobbriscoe.net>
> *Cc: *"ecn-sane at lists.bufferbloat.net" <ecn-sane at lists.bufferbloat.net>,
> tsvwg IETF list <tsvwg at ietf.org>, "David P. Reed" <dpreed at deepplum.com>
> *Subject: *Re: [tsvwg] [Ecn-sane] per-flow scheduling
>
>
>
> On Mon, Jul 22, 2019 at 9:44 AM Bob Briscoe <ietf at bobbriscoe.net> wrote:
>
> Folks,
>
> As promised, I've pulled together and uploaded the main architectural
> arguments about per-flow scheduling that cause concern:
>
> Per-Flow Scheduling and the End-to-End Argument
> <http://bobbriscoe.net/projects/latency/per-flow_tr.pdf>
>
>
> It runs to 6 pages of reading, but I tried to make the time readers will
> have to spend worth it.
>
>
>
> Before reading the other responses (to avoid poisoning my own thinking),
> I wanted to offer my own reaction. In the discussion of figure 1, you
> seem to imply
> that there's some obvious choice of bin packing for the flows involved, but
> that can't be right. What if the dark green flow has deadlines? Why should
> that be the one that gets only leftover bandwidth? I'll return to this
> point in a bit.
>
>
>
> The tl;dr summary of the paper seems to be that the L4S approach leaves
> the allocation of limited bandwidth up to the endpoints, while FQ
> arbitrarily enforces equality when bandwidth is limited; but in reality
> the bottleneck device needs to make *some* choice when there’s a shortage
> and flows don’t respond. That requires some choice of policy.
>
>
>
> In FQ, the chosen policy is to make sure every flow has the ability to
> get low latency for itself but, in the absence of some other kind of
> trusted signaling, to allocate an equal proportion of the available
> bandwidth to each flow. ISTM this is the best you can do in an
> adversarial environment,
> because anything else can be gamed to get a more than equal share (and
> depending on how "flow" is defined, even this can be gamed by opening up
> more flows; but this is not a problem unique to FQ).
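>
> For reference, a minimal sketch of that policy, assuming 5-tuple hashing
> and plain per-flow round-robin (a simplification of schemes like
> fq_codel, not an exact description of any of them):
>
>     from collections import defaultdict
>
>     queues = defaultdict(list)   # one FIFO per hashed flow
>
>     def enqueue(pkt):
>         flow = hash((pkt["src"], pkt["dst"], pkt["sport"],
>                      pkt["dport"], pkt["proto"]))
>         queues[flow].append(pkt)
>
>     def dequeue_round():
>         # One packet per backlogged flow per round: an equal share, no
>         # matter how aggressively any single flow sends. Opening N flows
>         # still buys N shares, which is the gaming caveat above.
>         return [q.pop(0) for q in queues.values() if q]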
>
>
>
> In L4S, the policy is to assume one queue is well-behaved and one not, and
> to use the ECT(1) codepoint as a classifier to get into one or the other.
> But policy choice doesn't end there: in an uncooperative or adversarial
> environment, you can easily get into a situation in which the bottleneck
> has to apply policy to several unresponsive flows in the supposedly
> well-behaved queue. Note that this doesn't even have to involve bad actors
> misclassifying on purpose: it could be two uncooperative 200 Mbps VR
> flows competing for 300 Mbps of bandwidth. In this case, L4S falls back
> to classic, which with DualQ means every flow, not just the uncooperative
> ones, suffers. As a user, I don’t want my small, responsive flows to
> suffer when uncooperative actors decide to exceed the bottleneck
> bandwidth (BBW).
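>
> To pin down how little the classifier has to work with, here is a
> simplified sketch of the DualQ classification step as I read it (the
> queue names are mine and the overload/fallback logic is elided):
>
>     # ECN field codepoints per RFC 3168
>     NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
>
>     def classify(ecn_bits):
>         # ECT(1) or CE selects the low-latency queue; everything else
>         # goes classic. This single codepoint is the entire classifier,
>         # so an unresponsive flow that sets ECT(1) sits in the LL queue
>         # until some overload policy kicks in.
>         return "low-latency" if ecn_bits in (ECT1, CE) else "classic"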
>
>
>
> Getting back to figure 1, how do you choose the right allocation? With the
> proposed use of ECT(1) as a classifier, you have exactly one bit available to
> decide which queue, and therefore which policy, applies to a flow. Should
> all the classic flows get assigned whatever is left after the L4S flows are
> allocated bandwidth? That hardly seems fair to classic flows. But let's say
> this policy is implemented. It then escapes me how this is any different
> from the trust problems facing end-to-end DSCP/QoS: why wouldn't everyone
> just classify their classic flows as L4S, forcing everything to be treated
> as classic and getting access to a (greater) share of the overall BBW? Then
> we're left both with a spent ECT(1) codepoint and a need for FQ or some
> other queuing policy to arbitrate between flows, without any bits with
> which to implement the high-fidelity congestion signal required to achieve
> low latency without getting squeezed out.
>
>
>
> The bottom line is that I see no way to escape the necessity of something
> FQ-like at bottlenecks outside of the sender's trust domain. If FQ can't be
> done in backbone-grade hardware, then the only real answer is pipes in the
> core big enough to force the bottleneck to live somewhere closer to the
> edge, where FQ does scale.
>
>
>
> Note that, in a perfect world, FQ wouldn't trigger at all because there
> would always be enough bandwidth for everything users wanted to do, but in
> the real world it seems like the best you can possibly do in the absence of
> trusted information about how to prioritize traffic. IMO, it’s best to
> think of FQ more as a last-ditch measure indicating to the operator that
> they’re gonna need a bigger pipe than as a steady-state bandwidth
> allocator.
>
>
>
> Kyle
>
>
>