[Rpm] Alternate definitions of "working condition" - unnecessary?

Christoph Paasch cpaasch at apple.com
Mon Oct 11 13:34:23 EDT 2021


On 10/11/21 - 09:31, Sebastian Moeller wrote:
> > On Oct 9, 2021, at 01:32, Christoph Paasch <cpaasch at apple.com> wrote:
> > 
> > On 10/07/21 - 12:30, Sebastian Moeller wrote:
> >> Hi Christoph,
> >> 
> >>> On Oct 7, 2021, at 02:11, Christoph Paasch via Rpm
> >>> <rpm at lists.bufferbloat.net> wrote:
> >>> 
> >>> On 10/07/21 - 02:18, Jonathan Morton via Rpm wrote:
> >>>>> On 7 Oct, 2021, at 12:22 am, Dave Taht via Rpm
> >>>>> <rpm at lists.bufferbloat.net> wrote:
> >>>>> 
> >>>>> There are additional cases where, perhaps, the fq component works,
> >>>>> and the aqm doesn't.
> >>>> 
> >>>> Such as Apple's version of FQ-Codel?  The source code is public, so we
> >>>> might as well talk about it.
> >>> 
> >>> Let's not just talk about it, but actually read it ;-)
> >>> 
> >>>> There are two deviations I know about in the AQM portion of that.
> >>>> First is that they do the marking and/or dropping at the tail of the
> >>>> queue, not the head.  Second is that the marking/dropping frequency is
> >>>> fixed, instead of increasing during a continuous period of congestion
> >>>> as real Codel does.
> >>> 
> >>> We don't drop/mark locally generated traffic (which is the use-case we
> >>> care about).
> >> 
> >> 	In this discussion probably true, but I recall that one reason why
> >> 	sch_fq_codel is a more versatile qdisc than sch_fq under Linux is
> >> 	that fq excels for locally generated traffic, while fq_codel also
> >> 	works well for forwarded traffic. And I use "forwarding" here to
> >> 	encompass things like VMs running on a host, where direct
> >> 	"back-pressure" will not work... 
> > 
> > Our main use-case is iOS. This is by far the most common case and thus there
> > are no VMs or the like. All traffic is generated locally by our TCP
> > implementation.
> 
> 	Ah, explains your priorities.

Yes - we are aware of these issues for forwarding or VM-generated traffic.

But the amount of such traffic is so much lower than in the other use-cases
that it is not even a drop in the bucket.

> My only iOS device is 11 years old and, as far as I understand, does not support fq_codel at all, so my testing is restricted to my MacBooks and was done under Mojave and Catalina:
> 
> macbook:~ user$ sudo netstat -I en4 -qq
> en4:
>      [ sched:  FQ_CODEL  qlength:    0/128 ]
>      [ pkts:     126518  bytes:   60151318  dropped pkts:      0 bytes:      0 ]
> =====================================================
>      [ pri: CTL (0)	srv_cl: 0x480190	quantum: 600	drr_max: 8 ]
>      [ queued pkts: 0	bytes: 0 ]
>      [ dequeued pkts: 16969	bytes: 1144841 ]
>      [ budget: 0	target qdelay:  5.00 msec	update interval:100.00 msec ]
>      [ flow control: 0	feedback: 0	stalls: 0	failed: 0 ]
>      [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
>      [ flows total: 0	new: 0	old: 0 ]
>      [ throttle on: 0	off: 0	drop: 0 ]
> =====================================================
>      [ pri: VO (1)	srv_cl: 0x400180	quantum: 600	drr_max: 8 ]
>      [ queued pkts: 0	bytes: 0 ]
>      [ dequeued pkts: 0	bytes: 0 ]
>      [ budget: 0	target qdelay:  5.00 msec	update interval:100.00 msec ]
>      [ flow control: 0	feedback: 0	stalls: 0	failed: 0 ]
>      [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
>      [ flows total: 0	new: 0	old: 0 ]
>      [ throttle on: 0	off: 0	drop: 0 ]
> =====================================================
>      [ pri: VI (2)	srv_cl: 0x380100	quantum: 3000	drr_max: 6 ]
>      [ queued pkts: 0	bytes: 0 ]
>      [ dequeued pkts: 0	bytes: 0 ]
>      [ budget: 0	target qdelay:  5.00 msec	update interval:100.00 msec ]
>      [ flow control: 0	feedback: 0	stalls: 0	failed: 0 ]
>      [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
>      [ flows total: 0	new: 0	old: 0 ]
>      [ throttle on: 0	off: 0	drop: 0 ]
> =====================================================
>      [ pri: RV (3)	srv_cl: 0x300110	quantum: 3000	drr_max: 6 ]
>      [ queued pkts: 0	bytes: 0 ]
>      [ dequeued pkts: 0	bytes: 0 ]
>      [ budget: 0	target qdelay:  5.00 msec	update interval:100.00 msec ]
>      [ flow control: 0	feedback: 0	stalls: 0	failed: 0 ]
>      [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
>      [ flows total: 0	new: 0	old: 0 ]
>      [ throttle on: 0	off: 0	drop: 0 ]
> =====================================================
>      [ pri: AV (4)	srv_cl: 0x280120	quantum: 3000	drr_max: 6 ]
>      [ queued pkts: 0	bytes: 0 ]
>      [ dequeued pkts: 0	bytes: 0 ]
>      [ budget: 0	target qdelay:  5.00 msec	update interval:100.00 msec ]
>      [ flow control: 0	feedback: 0	stalls: 0	failed: 0 ]
>      [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
>      [ flows total: 0	new: 0	old: 0 ]
>      [ throttle on: 0	off: 0	drop: 0 ]
> =====================================================
>      [ pri: OAM (5)	srv_cl: 0x200020	quantum: 1500	drr_max: 4 ]
>      [ queued pkts: 0	bytes: 0 ]
>      [ dequeued pkts: 0	bytes: 0 ]
>      [ budget: 0	target qdelay:  5.00 msec	update interval:100.00 msec ]
>      [ flow control: 0	feedback: 0	stalls: 0	failed: 0 ]
>      [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
>      [ flows total: 0	new: 0	old: 0 ]
>      [ throttle on: 0	off: 0	drop: 0 ]
> =====================================================
>      [ pri: RD (6)	srv_cl: 0x180010	quantum: 1500	drr_max: 4 ]
>      [ queued pkts: 0	bytes: 0 ]
>      [ dequeued pkts: 78	bytes: 13943 ]
>      [ budget: 0	target qdelay:  5.00 msec	update interval:100.00 msec ]
>      [ flow control: 0	feedback: 0	stalls: 0	failed: 0 ]
>      [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
>      [ flows total: 0	new: 0	old: 0 ]
>      [ throttle on: 0	off: 0	drop: 0 ]
> =====================================================
>      [ pri: BE (7)	srv_cl: 0x0	quantum: 1500	drr_max: 4 ]
>      [ queued pkts: 0	bytes: 0 ]
>      [ dequeued pkts: 98857	bytes: 56860512 ]
>      [ budget: 0	target qdelay:  5.00 msec	update interval:100.00 msec ]
>      [ flow control: 0	feedback: 0	stalls: 0	failed: 0 ]
>      [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
>      [ flows total: 0	new: 0	old: 0 ]
>      [ throttle on: 0	off: 0	drop: 0 ]
> =====================================================
>      [ pri: BK (8)	srv_cl: 0x100080	quantum: 1500	drr_max: 2 ]
>      [ queued pkts: 0	bytes: 0 ]
>      [ dequeued pkts: 10565	bytes: 2126520 ]
>      [ budget: 0	target qdelay:  5.00 msec	update interval:100.00 msec ]
>      [ flow control: 0	feedback: 0	stalls: 0	failed: 0 ]
>      [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
>      [ flows total: 0	new: 0	old: 0 ]
>      [ throttle on: 0	off: 0	drop: 0 ]
> =====================================================
>      [ pri: BK_SYS (9)	srv_cl: 0x80090	quantum: 1500	drr_max: 2 ]
>      [ queued pkts: 0	bytes: 0 ]
>      [ dequeued pkts: 49	bytes: 5502 ]
>      [ budget: 0	target qdelay:  5.00 msec	update interval:100.00 msec ]
>      [ flow control: 0	feedback: 0	stalls: 0	failed: 0 ]
>      [ drop overflow: 0	early: 0	memfail: 0	duprexmt:0 ]
>      [ flows total: 0	new: 0	old: 0 ]
>      [ throttle on: 0	off: 0	drop: 0 ]
> macbook:~ user$ 
> 
> > 
> >>> We signal flow-control straight back to the TCP-stack at which point the
> >>> queue is entirely drained before TCP starts transmitting again.
> >>> 
> >>> So, drop-frequency really doesn't matter because there is no drop.
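
In rough pseudo-C, the idea looks something like the sketch below. This is
illustrative only, not the actual code; the flow_queue type and the
socket_suspend_tx()/socket_resume_tx() helpers are made-up stand-ins for the
real socket wakeup path:

    #include <stddef.h>
    #include <stdio.h>

    struct flow_queue {
        int    suspended;     /* flow control asserted toward the socket */
        size_t bytes_queued;
        size_t limit;         /* depth at which we push back */
    };

    /* Hypothetical helpers standing in for the real wakeup machinery. */
    static void socket_suspend_tx(struct flow_queue *fq)
    { (void)fq; printf("TCP told to stop sending\n"); }
    static void socket_resume_tx(struct flow_queue *fq)
    { (void)fq; printf("queue drained, TCP may send again\n"); }

    /* Enqueue of a locally generated packet: push back, never drop. */
    static void fq_enqueue_local(struct flow_queue *fq, size_t pkt_len)
    {
        fq->bytes_queued += pkt_len;
        if (!fq->suspended && fq->bytes_queued >= fq->limit) {
            fq->suspended = 1;
            socket_suspend_tx(fq);
        }
    }

    /* Driver drained a packet: resume only once the queue is empty. */
    static void fq_dequeue_done(struct flow_queue *fq, size_t pkt_len)
    {
        fq->bytes_queued -= pkt_len;
        if (fq->suspended && fq->bytes_queued == 0) {
            fq->suspended = 0;
            socket_resume_tx(fq);
        }
    }

    int main(void)
    {
        struct flow_queue fq = { 0, 0, 3000 };
        fq_enqueue_local(&fq, 1500);
        fq_enqueue_local(&fq, 1500);  /* hits the limit: flow control on */
        fq_dequeue_done(&fq, 1500);
        fq_dequeue_done(&fq, 1500);   /* empty again: transmission resumes */
        return 0;
    }

Because the sender is on the same host, the signal is lossless and
immediate; that is what makes drops unnecessary for this traffic.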
> >> 
> >> 	But is it still codel/fq_codel if it does not implement head drop
> >> 	(as described in
> >> 	https://datatracker.ietf.org/doc/html/rfc8290#section-4.2) and if
> >> 	the control loop
> >> 	(https://datatracker.ietf.org/doc/html/rfc8289#section-3.3) is
> >> 	changed? (I am also wondering how reducing the default number of
> >> 	sub-queues from 1024 to 128 interacts with the birthday paradox.)
> > 
> > Not sure where the 128 comes from?
> 
> See above:
>      [ sched:  FQ_CODEL  qlength:    0/128 ]
> but I might simply be misinterpreting the number here; reading this again, instead of relying on memory, suggests that 128 is the length of each individual queue and not the number of queues? Or is it the length of the hardware queue sitting below fq_codel? 

This 128 can be safely ignored. It has no meaning :)

> Anyway, any hints on how to query/configure the fq_codel instance under macOS (I am not fluent in the BSDs)?

For querying: netstat -qq, as you already saw.

There is not much to configure... Just two sysctls:

net.classq.target_qdelay: 0
net.classq.update_interval: 0

0 means that the system's default is in use.
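
If you would rather read them from code than via the sysctl command-line
tool, something like this should work (a quick sketch; I am assuming here
that both sysctls are plain integer values):

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int main(void)
    {
        /* Both report 0 while the system default is in effect. */
        const char *names[] = { "net.classq.target_qdelay",
                                "net.classq.update_interval" };

        for (int i = 0; i < 2; i++) {
            long long val = 0;       /* assuming an integer-typed sysctl */
            size_t len = sizeof(val);

            if (sysctlbyname(names[i], &val, &len, NULL, 0) == 0)
                printf("%s: %lld\n", names[i], val);
            else
                perror(names[i]);
        }
        return 0;
    }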


Christoph

> > And the birthday paradox does not apply. The magic happens in inp_calc_flowhash() ;-)
> 
> 	Thanks, I will need to spend time reading and understanding the code, obviously.
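
The short version, in case it saves you some digging: if the hash generator
simply rejects candidates that are already in use, uniqueness holds by
construction rather than by probability, so there is nothing for the
birthday paradox to act on. A sketch of that idea (the names are made up and
the linear scan is just for illustration; the real logic lives in
inp_calc_flowhash()):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_FLOWS 1024
    static uint32_t active_flows[MAX_FLOWS];
    static int nflows;

    static bool flow_hash_in_use(uint32_t hash)
    {
        for (int i = 0; i < nflows; i++)
            if (active_flows[i] == hash)
                return true;
        return false;
    }

    /* Regenerate until the candidate collides with nothing already live. */
    static uint32_t calc_flowhash(void)
    {
        uint32_t hash;
        do {
            hash = arc4random();
        } while (hash == 0 || flow_hash_in_use(hash));
        if (nflows < MAX_FLOWS)
            active_flows[nflows++] = hash;
        return hash;
    }

    int main(void)
    {
        printf("flow 1: %#x\nflow 2: %#x\n", calc_flowhash(), calc_flowhash());
        return 0;
    }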
> 
> Regards
> 	Sebastian
> 
> > 
> > 
> > Cheers,
> > Christoph
> > 
> > 
> >> Best Regards, Sebastian
> >> 
> >> P.S.: My definition of working conditions entails bidirectionally
> >> saturating traffic with responsive and (transiently) under-responsive
> >> flows: something like a few long-running TCP transfers to generate
> >> "base-load" and a higher number of TCP flows in IW or slow start to add
> >> some spice to the whole. In the future, once QUIC actually takes off*,
> >> adding more well-defined/behaved UDP flows to the mix seems reasonable. My
> >> off-the-cuff test for the effect of IW used to be to start a browser and
> >> open a collection of (30-50) tabs, getting a nice "thundering herd" of TCP
> >> flows starting around the same time. But it seems browser makers got too
> >> smart for me and will not do what I want any more; they temporally space
> >> the different sites in the tabs so that my nice thundering herd is less
> >> obnoxious (which IMHO is actually the right thing to do for actual usage,
> >> but for testing it sucks).
> >> 
> >> *) Occasionally browsing the NANOG archives makes me wonder how the move
> >> from HTTP/TCP to QUIC/UDP is going to play with operators' propensity to
> >> rate-limit UDP, but that is a different kettle of fish...
> >> 
> >> 
> >>> 
> >>> 
> >>> Christoph
> >>> 
> >>>> 
> >>>> I predict the consequences of these mistakes will differ according to
> >>>> the type of traffic applied:
> >>>> 
> >>>> With TCP traffic over an Internet-scale path, the consequences are not
> >>>> serious.  The tail-drop means that the response at the end of
> >>>> slow-start will be slower, with a higher peak of intra-flow induced
> >>>> delay, and there is also a small but measurable risk of tail-loss
> >>>> causing a more serious application-level delay.  These alone *should*
> >>>> be enough to prompt a fix, if Apple are actually serious about
> >>>> improving application responsiveness.  The fixed marking frequency,
> >>>> however, is probably invisible for this traffic.
> >>>> 
> >>>> With TCP traffic over a short-RTT path, the effects are more
> >>>> pronounced.  The delay excursion at the end of slow-start will be
> >>>> larger in comparison to the baseline RTT, and when the latter is short
> >>>> enough, the fixed congestion signalling frequency means there will be
> >>>> some standing queue that real Codel would get rid of.  This standing
> >>>> queue will influence the TCP stack's RTT estimator and thus RTO value,
> >>>> increasing the delay consequent to tail loss.
> >>>> 
> >>>> Similar effects to the above can be expected with other reliable stream
> >>>> transports (SCTP, QUIC), though the details may differ.
> >>>> 
> >>>> The consequences with non-congestion-controlled traffic could be much
> >>>> more serious.  Real Codel will increase its drop frequency continuously
> >>>> when faced with overload, eventually gaining control of the queue depth
> >>>> as long as the load remains finite and reasonably constant.  Because
> >>>> Apple's AQM doesn't increase its drop frequency, the queue depth for
> >>>> such a flow will increase continuously until either a delay-sensitive
> >>>> rate selection mechanism is triggered at the sender, or the queue
> >>>> overflows and triggers burst losses.
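
For reference, real Codel's control law (RFC 8289) is what gives it that
property: while the queue stays above target, the spacing between successive
drops shrinks as interval / sqrt(count), so persistent overload is met with
an ever-increasing signalling frequency. Roughly (a sketch of the law
itself, not of any particular implementation):

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    #define CODEL_INTERVAL_US 100000ULL   /* the usual 100 ms interval */

    /* Next drop time while sojourn time stays above target (RFC 8289).
     * A fixed marking frequency is what you get if count never grows. */
    static uint64_t codel_next_drop(uint64_t now_us, uint32_t count)
    {
        if (count == 0)
            count = 1;
        return now_us +
               (uint64_t)((double)CODEL_INTERVAL_US / sqrt((double)count));
    }

    int main(void)
    {
        /* Spacing shrinks: 100 ms, 70.7 ms, 57.7 ms, 50 ms, ... */
        for (uint32_t count = 1; count <= 4; count++)
            printf("count=%u: next drop in %llu us\n", count,
                   (unsigned long long)codel_next_drop(0, count));
        return 0;
    }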
> >>>> 
> >>>> So in the context of this discussion, is it worth generating a type of
> >>>> load that specifically exercises this failure mode?  If so, what does
> >>>> it look like?
> >>>> 
> >>>> - Jonathan Morton
> 

