[Cake] Fwd: [aqm] draft-ietf-aqm-pie-01: review

Sat May 9 13:52:03 EDT 2015

quite a few interesting points here, a review of bob's old comments on
codel seems like a good idea.

---------- Forwarded message ----------
From: Bob Briscoe <bob.briscoe at bt.com>
Date: Sat, May 9, 2015 at 10:13 AM
Subject: Re: [aqm] draft-ietf-aqm-pie-01: review
To: Dave Taht <dave.taht at gmail.com>
Cc: Richard Scheffenegger <rs at netapp.com>, "Eddy Wesley M. [VZ]"
<Wesley.M.Eddy at nasa.gov>, draft-ietf-aqm-pie at tools.ietf.org, AQM IETF
list <aqm at ietf.org>

Dave,

At 00:51 09/05/2015, Dave Taht wrote:
>
> Dear Bob:
>
>  I now understand the linux codebase for pie a lot better, as well as
> some of the experimental data I have. It  looks like I could make
> several of the changes you describe and put them in my next series of
> tests, and based on your parameters I should be able to exercise some
> edge cases across those changes. Wow, thx!

I'd like to allow the authors a chance to think and respond before
anyone does any changes to implementations - they may have their
reasons that I didn't understand.

> I have not actually read the latest pie draft, but would like to make
> a few comments on your comments quickly:
>
> re: 3.1: in linux, params->ecn is presently a boolean, and could
> easily be modified to mean anything and compare anything you want.
> What would be a good default?
>
> The ECN support in the linux code on enqueue looks like:
>
>         if (!drop_early(sch, skb->len)) {
>                 enqueue = true;
>         } else if (q->params.ecn && (q->vars.prob <= MAX_PROB / 10) &&
>
>                    INET_ECN_set_ce(skb)) {
>
>                 /* If packet is ecn capable, mark it if drop probability
>
>                  * is lower than 10%, else drop it.
>
>                  */
>                 ....

Next mail I'll send an email I sent to Fred offlist on this subject.

> Re: 5.0: will look over that code

My review didn't have a section 5.0
:?

> re: 5.1,  linux code:
>
>         /* Non-linear drop in probability: Reduce drop probability quickly if
>          * delay is 0 for 2 consecutive Tupdate periods.
>          */
>
>         if ((qdelay == 0) && (qdelay_old == 0) && update_prob)
>                 q->vars.prob = (q->vars.prob * 98) / 100;

Ooh joy. The great thing about open source is that it allows everyone
to assert their human right to add their own made-up numbers.
<http://dilbert.com/strip/2008-05-08>

Looking at the Linux code, I just found another magic number has been added:
<https://github.com/torvalds/linux/blob/master/net/sched/sch_pie.c#L368>
/* to ensure we increase probability in steps of no more than 2% */
:-!

There really is no point in employing control theory experts to work
out the stability limits for how fast a control algorithm can respond,
if implementers then add in extra arbitrary limits on the
responsiveness like this.

> re: 5.2: strongly agree that the lookup table doesn't scale properly.
> The linux code appears to differ from the draft also, here, with a
> smaller lookup table, and some other smoothing functions. I am going
> to stop pasting now and just point at:
> https://github.com/torvalds/linux/blob/master/net/sched/sch_pie.c#L334
>
> Will await more feedback from y'all on that.

The factors I used in my review were added to the PIE draft when the
pseudocode was added in Oct'14.
The numbers in the Linux code you point to above match the original
HPSR'13 paper on PIE.
We need to wait to hear the answer from the authors about why the
numbers were changed.

> Codel also has a similar undershoot problem for which I proposed we
> try a fixed point fractional count variable in a recent post to the
> cake mailing list.

Back in 2013 I pointed out that the CoDel square-root control law ends
up increasing drop probability linearly with time. See my points 3 & 4
here:
<http://www.ietf.org/mail-archive/web/aqm/current/msg00376.html>

> re: 5.3.1: In the environments i work in it is extremely hard to get
> timers to reliably fire in under 2ms intervals, particularly on vm'd
> systems. Also as you fire the timer more rapidly the current
> calculations in pie now done out of band of the packet processing have
> a couple divides in them which tend to be processor intensive... both
> things said, i figure this and other implementations could fire faster
> than the default 16ms...

My point was that an RFC should say what the algorithm is trying to
achieve, not solely what a particular implementation for a particular
environment might have to do as a compromise.

> re: 5.3.2: I like what you are saying but I gotta go work it out for
> myself. which will take a while. patches wanted.

I'm not sure whether the inaccuracy is significant. I tried to find
cases where it might be, but gave up wasting time using trial and
error - I need to think about it analytically.

> re: 5.4: linux and all my tests have always been against:
>            /* default of 100 ms in pschedtime */
>            vars->burst_time = PSCHED_NS2TICKS(100 * NSEC_PER_MSEC);

I think it could be zero, given the alpha/beta formula already
significantly filters out bursts. But again, wait for the authors to
answer my question about whether they chose this value based on
experiments.

> 5.5: explains a lot. Probably. Will think on it.
>
> 5.6: :chortle: heats_this_room() indeed! Derandomization always looked
> to me like an overly complex solution to a non-problem.
>
> 5.7: don't think this problem exists in the linux code but will step through.

It does:
<https://github.com/torvalds/linux/blob/master/net/sched/sch_pie.c#L383>

> But: in one of my recent (450 rrul_50_up tcp flows) tests neither
> codel or pie got to a sane drop rate in under 60 seconds, and pie
> stayed stuck at 10000 packets outstanding, i did not try more - on
> gigE local links. I think a lot of pie tests were run by others with a
> very low outside packet limit (200 packets) and thus the tail drop
> kicked in before pie itself could react.

That's the opposite to my concern in 5.7 - I'm sure tail drop will
often kick in before PIE can react (see my points 5.3.1 , 5.4 & 5.5).

I'm talking about contriving an attack that drives PIE to 100% drop
/before/ it reaches tail drop (because 100% drop is pathological,
whereas tail drop is only mildy irritating by comparison).

> 5.8:  I think this is a quibblish change not relevant for any
> reasonable length of queue measured in packets.
>
> but I do note that we switched from packet limits to byte limits in
> cake but that was for other reasons - primarily due to the extreme
> (1000x1) dynamic range of a modern "packet".

I've think you've misunderstood (i.e. I've explained it badly).

I merely meant that, if PIE has to fall back to tail drop, at least it
should always stop enqueuing packets when there's <1MTU free at the
tail, which ensures there will be no bias against large packets.

> 6: I do wish the draft and the code I have still lined up, and the
> constants clearly defined.
>
> 7: exp(p) is not cheap, and there aint no floating point in the kernel either.

Sorry, again I've explained badly. I had defined p as a double, and I
meant "extract the integer held in the exponent of the floating point
representation".

I should have described this for integer arithmentic, as "lg(p) in
integer arithmetic".
On Intel Architecture it's the bsr instruction in assembler, which
returns the position of the highest set bit in an unsigned integer.

I've clarified this point and 5.8 above in my review. I've updated it
to issue 01A and uploaded it at the same URL:
<http://www.bobbriscoe.net/projects/latency/piervw_tr.pdf>

> 8. Haven't read the draft, can't comment on the nits.
>
> One quick note on:
>
> "4.1 Random Dropping s/Like any state-of-the-art AQM scheme, PIE would
> drop packets randomly/ /PIE drops packets randomly/ Rationale: The
> other scheme with a claim to be state-of-the-art doesnâ€™t (CoDel). I
> would agree if the draft had said â€œA state of the art scheme should
> introduce randomness into packet dropping in order to desynchronize
> flows,â€  but maybe it was decided not to introduce such underhand
> criticism of CoDel. Whatever, the draft needs to be careful about
> evangelising random drop, given it attempts to derandomize later."
>
> I don't buy the gospel that randomness is needed to avoid tcp global
> synchronization. I would prefer that whatever evangelicalization of
> randomness exists in any draft here or elsewhere be dropped in favor
> always of discussing the real problem of tcp global sync instead...
>
> ... which as near as I can tell both codel and fq_codel avoid just
> fine without inducing a random number generator. I welcome evidence to
> the contrary.

Dropping single packets at a time (like CoDel and PDPC) should prevent
global synch.
Global synch is caused if a spike in one flow can induce a sequence of
drops that hit other flows.

Theoretically, there could be a problem with dropping in a
deterministic pattern as CoDel does (a flow could game it by avoiding
sending around the times it knows drops will occur). But it's a lot
easier to just send faster by ignoring drop signals anyway.

CoDel's determinism might become problematic if unresponsiveness to
congestion became a serious problem, so that it became necessary to
police congestion responsiveness.

IOW, when you're putting algorithms in a network that could be there
for decades, don't not use randomness lightly. Randomness has nice
security properties.

Bob

________________________________________________________________
Bob Briscoe,                                                  BT

-- 
Dave Täht
Open Networking needs **Open Source Hardware**

https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67