[PATCH RFC 1/3] Broadcast qdisc statistics via netlink on packet dequeue.

Toke Høiland-Jørgensen toke at toke.dk
Tue Jun 11 00:15:29 PDT 2013


Dave Taht <dave.taht at gmail.com> writes:

> It is highly unlikely these patches would be accepted by upstream as
> allocating a skb in the fast path is probably a no-no. However at
> least part of them seem generically useful - instrumenting pfifo_fast
> in particular - and certainly helpful for debugging in general.

Right, well, I never envisioned it was something people would turn on if
they wanted to maximise performance.

> I had hoped it would be possible to do this without mucking with
> locking, tho. :(

Well I'm not really doing any locking as it is; the whole thing is
called from a point where the qdisc lock is already held. The only
"mucking" I'm doing is making the gnet_stats stuff optionally *not* lock
anything. Not so sure that is a good idea, but it seemed better than
unlocking everything before calling into gnet_stats*. Of course an
alternative would be to just not use that, but thought reusing the
existing code would be worthwhile.

> In my own case I'd like to have a qdisc_drop_with_stats
> (skb,sch,"fq_codel",reason) that could be compiled in or out easily,
> sending the packet headers of the dropped packet to a listener much as
> you do here. This would be useful for monitoring, have potential as a
> routing metric, and make it easier to analyze the behavior of various
> qdisc types, without having to poll and lock via tc.

Yeah, that should be quite doable starting from what I have, I think.
What would "reason" be?

> Email syntax for a RFC PATCH is RFC PATCH, not PATCH RFC

Noted.

> unnecessary change. try to touch as little as possible... There is an
> emacs mode for linux indentation, btw which helps avoid annoyingly
> messing up existing code.

Yeah, punted on cleaning up the whitespace changes; will go over those
again.

> It's not clear to me where this stuff enters userspace or not, and in
> general it's safer to append new enums/defines rather than insert
> them.

It's a number used for assigning the netlink multicast group; userspace
includes the header file and subscribes to the same group number. Put it
there to group related stuff together...

> So this acted up on a SMP kernel?

Not really; as noted above, gen_stats by default locks the qdisc before
accessing it, but since I'm now calling it from somewhere where it's
already locked, I had to get it to not do that.

I'm not entirely sure it's exactly the same lock, but think so...

> QDISC_BROADCAST_DEQUEUE_STATS maybe? 

Oh yeah, completely forgot to fix up the compile option after adding the
sysctl...

> Elsewhere you've defined the sysctl. Would argue for sysfs...

Eh?

> Is there a way to also merely detect if there is a listener present?

Dunno. I assume that since the netlink stuff has multicast semantics it
would just do something reasonable (such as discard the packet) if
there's no listeners. I can see how detecting it would cut down on some
overhead though; I'll see if I can find something. My alternative was to
abort if the sysctl parameter is set to some magic value (0? negative
number?).

> Useful for debugging, but invasive for general use, far more
> heavyweight than tossing it into netlink.

Yeah, that wasn't supposed to stay in there.

> NLMSG_SPACE seems arbitrary, and perhaps this needs to be GFP_ATOMIC?
> (But I am totally not an expert down here)

Right, will investigate.

> Am thinking this could be a callback in the Qdisc_ops struct...

Well right now it's the other way around; this is called for every
qdisc, and dumps the general qdisc stats along with anything that the
qdisc itself defines (through the dump_stats callback in qdisc_ops).
It's the same way the stuff that tc talks to works, and makes explicit
support from each qdisc optional.


Also, another unresolved issue with this patch: if there's more than one
qdisc on an interface (e.g. a tree of HTBs), it will probably be called
for each one dequeueing a packet; not sure what to do about that; maybe
just limit it to the root qdisc?

-Toke

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 489 bytes
Desc: not available
URL: <https://lists.bufferbloat.net/pipermail/bloat-devel/attachments/20130611/cbb42af3/attachment.pgp>


More information about the Bloat-devel mailing list