From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.tohojo.dk (mail.tohojo.dk [188.40.53.186]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by huchra.bufferbloat.net (Postfix) with ESMTPS id 68B8121F129 for ; Tue, 11 Jun 2013 05:11:33 -0700 (PDT) Received: from alrua-laptop.borgediget.toke.dk (wifi-eduroam-161133.inria.fr [128.93.161.133]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.tohojo.dk (Postfix) with ESMTPSA id 2496F1EC1EEA; Tue, 11 Jun 2013 14:11:30 +0200 (CEST) Received: by alrua-laptop.borgediget.toke.dk (Postfix, from userid 1000) id E720FC6A; Tue, 11 Jun 2013 09:15:31 +0200 (CEST) From: =?utf-8?Q?Toke_H=C3=B8iland-J=C3=B8rgensen?= To: Dave Taht Subject: Re: [PATCH RFC 1/3] Broadcast qdisc statistics via netlink on packet dequeue. References: <1370867989-7318-1-git-send-email-toke@toke.dk> <1370867989-7318-2-git-send-email-toke@toke.dk> Date: Tue, 11 Jun 2013 09:15:29 +0200 In-Reply-To: (Dave Taht's message of "Mon, 10 Jun 2013 10:20:57 -0700") Message-ID: <87r4g9dsoe.fsf@toke.dk> User-Agent: Gnus/5.130008 (Ma Gnus v0.8) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" Cc: bloat-devel@lists.bufferbloat.net X-BeenThere: bloat-devel@lists.bufferbloat.net X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Developers working on AQM, device drivers, and networking stacks" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Jun 2013 12:11:33 -0000 --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable Dave Taht writes: > It is highly unlikely these patches would be accepted by upstream as > allocating a skb in the fast path is probably a no-no. However at > least part of them seem generically useful - instrumenting pfifo_fast > in particular - and certainly helpful for debugging in general. Right, well, I never envisioned it was something people would turn on if they wanted to maximise performance. > I had hoped it would be possible to do this without mucking with > locking, tho. :( Well I'm not really doing any locking as it is; the whole thing is called from a point where the qdisc lock is already held. The only "mucking" I'm doing is making the gnet_stats stuff optionally *not* lock anything. Not so sure that is a good idea, but it seemed better than unlocking everything before calling into gnet_stats*. Of course an alternative would be to just not use that, but thought reusing the existing code would be worthwhile. > In my own case I'd like to have a qdisc_drop_with_stats > (skb,sch,"fq_codel",reason) that could be compiled in or out easily, > sending the packet headers of the dropped packet to a listener much as > you do here. This would be useful for monitoring, have potential as a > routing metric, and make it easier to analyze the behavior of various > qdisc types, without having to poll and lock via tc. Yeah, that should be quite doable starting from what I have, I think. What would "reason" be? > Email syntax for a RFC PATCH is RFC PATCH, not PATCH RFC Noted. > unnecessary change. try to touch as little as possible... There is an > emacs mode for linux indentation, btw which helps avoid annoyingly > messing up existing code. Yeah, punted on cleaning up the whitespace changes; will go over those again. > It's not clear to me where this stuff enters userspace or not, and in > general it's safer to append new enums/defines rather than insert > them. It's a number used for assigning the netlink multicast group; userspace includes the header file and subscribes to the same group number. Put it there to group related stuff together... > So this acted up on a SMP kernel? Not really; as noted above, gen_stats by default locks the qdisc before accessing it, but since I'm now calling it from somewhere where it's already locked, I had to get it to not do that. I'm not entirely sure it's exactly the same lock, but think so... > QDISC_BROADCAST_DEQUEUE_STATS maybe?=20 Oh yeah, completely forgot to fix up the compile option after adding the sysctl... > Elsewhere you've defined the sysctl. Would argue for sysfs... Eh? > Is there a way to also merely detect if there is a listener present? Dunno. I assume that since the netlink stuff has multicast semantics it would just do something reasonable (such as discard the packet) if there's no listeners. I can see how detecting it would cut down on some overhead though; I'll see if I can find something. My alternative was to abort if the sysctl parameter is set to some magic value (0? negative number?). > Useful for debugging, but invasive for general use, far more > heavyweight than tossing it into netlink. Yeah, that wasn't supposed to stay in there. > NLMSG_SPACE seems arbitrary, and perhaps this needs to be GFP_ATOMIC? > (But I am totally not an expert down here) Right, will investigate. > Am thinking this could be a callback in the Qdisc_ops struct... Well right now it's the other way around; this is called for every qdisc, and dumps the general qdisc stats along with anything that the qdisc itself defines (through the dump_stats callback in qdisc_ops). It's the same way the stuff that tc talks to works, and makes explicit support from each qdisc optional. Also, another unresolved issue with this patch: if there's more than one qdisc on an interface (e.g. a tree of HTBs), it will probably be called for each one dequeueing a packet; not sure what to do about that; maybe just limit it to the root qdisc? =2DToke --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (GNU/Linux) iQEcBAEBCAAGBQJRts6RAAoJEENeEGz1+utP0BwH/3CeA+Uy9DgWu09vIWR6PcLe gYu9PSQXR9L+oEDR7F2yRmsDoxf5SPvmjfQ0qqq8f3HOW3ENAabMw63c42NHHYpq LE4Qf6GOxxwJmXQk4+6Fk4KVYw2EMNrnwscQFXUn7GIz55ENtSnejyCgzv1wI1CP Xb0fawB9UZ0dtUVIyDGZcn09tcpUsWf3+2XBW937BEwokXOZQH6xxNkSeVUpaJtr vJ6uNp1KDacqL3x4gq/4yVsj5vunjfT8RhGzIVHIhbSJp0EEb+iSs3KrIJ8+7tEY 2PgaQ6zjMOL4jVjiIX+KYGoRu1DFIVYXEqNJ3psP8cRtZtHOw3YHRdSKBLv7/pw= =Lseo -----END PGP SIGNATURE----- --=-=-=--