From: "Toke Høiland-Jørgensen" <toke@toke.dk>
To: Dave Taht <dave.taht@gmail.com>
Cc: bloat-devel@lists.bufferbloat.net
Subject: Re: [PATCH RFC 1/3] Broadcast qdisc statistics via netlink on packet dequeue.
Date: Tue, 11 Jun 2013 09:15:29 +0200 [thread overview]
Message-ID: <87r4g9dsoe.fsf@toke.dk> (raw)
In-Reply-To: <CAA93jw6mhOmRS6CrjM7L3B1WpN0Rk1rW_cYXFQTpaeAbTF2zcA@mail.gmail.com> (Dave Taht's message of "Mon, 10 Jun 2013 10:20:57 -0700")
[-- Attachment #1: Type: text/plain, Size: 4077 bytes --]
Dave Taht <dave.taht@gmail.com> writes:
> It is highly unlikely these patches would be accepted by upstream as
> allocating a skb in the fast path is probably a no-no. However at
> least part of them seem generically useful - instrumenting pfifo_fast
> in particular - and certainly helpful for debugging in general.
Right, well, I never envisioned it was something people would turn on if
they wanted to maximise performance.
> I had hoped it would be possible to do this without mucking with
> locking, tho. :(
Well I'm not really doing any locking as it is; the whole thing is
called from a point where the qdisc lock is already held. The only
"mucking" I'm doing is making the gnet_stats stuff optionally *not* lock
anything. Not so sure that is a good idea, but it seemed better than
unlocking everything before calling into gnet_stats*. Of course an
alternative would be to just not use that, but thought reusing the
existing code would be worthwhile.
> In my own case I'd like to have a qdisc_drop_with_stats
> (skb,sch,"fq_codel",reason) that could be compiled in or out easily,
> sending the packet headers of the dropped packet to a listener much as
> you do here. This would be useful for monitoring, have potential as a
> routing metric, and make it easier to analyze the behavior of various
> qdisc types, without having to poll and lock via tc.
Yeah, that should be quite doable starting from what I have, I think.
What would "reason" be?
> Email syntax for a RFC PATCH is RFC PATCH, not PATCH RFC
Noted.
> unnecessary change. try to touch as little as possible... There is an
> emacs mode for linux indentation, btw which helps avoid annoyingly
> messing up existing code.
Yeah, punted on cleaning up the whitespace changes; will go over those
again.
> It's not clear to me where this stuff enters userspace or not, and in
> general it's safer to append new enums/defines rather than insert
> them.
It's a number used for assigning the netlink multicast group; userspace
includes the header file and subscribes to the same group number. Put it
there to group related stuff together...
> So this acted up on a SMP kernel?
Not really; as noted above, gen_stats by default locks the qdisc before
accessing it, but since I'm now calling it from somewhere where it's
already locked, I had to get it to not do that.
I'm not entirely sure it's exactly the same lock, but think so...
> QDISC_BROADCAST_DEQUEUE_STATS maybe?
Oh yeah, completely forgot to fix up the compile option after adding the
sysctl...
> Elsewhere you've defined the sysctl. Would argue for sysfs...
Eh?
> Is there a way to also merely detect if there is a listener present?
Dunno. I assume that since the netlink stuff has multicast semantics it
would just do something reasonable (such as discard the packet) if
there's no listeners. I can see how detecting it would cut down on some
overhead though; I'll see if I can find something. My alternative was to
abort if the sysctl parameter is set to some magic value (0? negative
number?).
> Useful for debugging, but invasive for general use, far more
> heavyweight than tossing it into netlink.
Yeah, that wasn't supposed to stay in there.
> NLMSG_SPACE seems arbitrary, and perhaps this needs to be GFP_ATOMIC?
> (But I am totally not an expert down here)
Right, will investigate.
> Am thinking this could be a callback in the Qdisc_ops struct...
Well right now it's the other way around; this is called for every
qdisc, and dumps the general qdisc stats along with anything that the
qdisc itself defines (through the dump_stats callback in qdisc_ops).
It's the same way the stuff that tc talks to works, and makes explicit
support from each qdisc optional.
Also, another unresolved issue with this patch: if there's more than one
qdisc on an interface (e.g. a tree of HTBs), it will probably be called
for each one dequeueing a packet; not sure what to do about that; maybe
just limit it to the root qdisc?
-Toke
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 489 bytes --]
next prev parent reply other threads:[~2013-06-11 12:11 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-10 12:39 [PATCH RFC 0/3] Broadcasting qdisc statistics via netlink Toke Høiland-Jørgensen
2013-06-10 12:39 ` [PATCH RFC 1/3] Broadcast qdisc statistics via netlink on packet dequeue Toke Høiland-Jørgensen
2013-06-10 17:20 ` Dave Taht
2013-06-11 7:15 ` Toke Høiland-Jørgensen [this message]
2013-06-10 12:39 ` [PATCH RFC 2/3] Add qdisc_stats_broadcast_interval sysctl parameter, and use it to limit stats broadcast interval Toke Høiland-Jørgensen
2013-06-10 12:39 ` [PATCH RFC 3/3] Make pfifo_fast track qlen stats Toke Høiland-Jørgensen
2013-06-17 19:07 ` [RFC PATCH v2] Broadcast qdisc statistics via netlink on packet dequeue Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87r4g9dsoe.fsf@toke.dk \
--to=toke@toke.dk \
--cc=bloat-devel@lists.bufferbloat.net \
--cc=dave.taht@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox