From: Toke Høiland-Jørgensen
To: Herbert Wolverson, libreqos@lists.bufferbloat.net
Cc: Simon Sundberg
Subject: Re: [LibreQoS] In BPF pping - so far
Date: Mon, 17 Oct 2022 16:13:26 +0200
Message-ID: <87bkqatu61.fsf@toke.dk>

[ Adding Simon to Cc ]

Herbert Wolverson via LibreQoS writes:

> Hey,
>
> I've had some pretty good success with merging xdp-pping (
> https://github.com/xdp-project/bpf-examples/blob/master/pping/pping.h )
> into xdp-cpumap-tc ( https://github.com/xdp-project/xdp-cpumap-tc ).
>
> I ported over most of the xdp-pping code, and then changed the entry
> point and packet parsing code to make use of the work already done in
> xdp-cpumap-tc (it's already parsed a big chunk of the packet, no need
> to do it twice). Then I switched the maps to per-cpu maps, and had to
> pin them - otherwise the two tc instances don't properly share data.
> Right now, output is just stubbed - I've still got to port the perfmap
> output code. Instead, I'm dumping a bunch of extra data to the kernel
> debug pipe, so I can see roughly what the output would look like.
>
> With debug enabled and just logging I'm now getting about 4.9 Gbits/sec
> on single-stream iperf between two VMs (with a shaper VM in the
> middle). :-)

Just FYI, that "just logging" is probably the biggest source of overhead,
then. What Simon found was that sending the data from kernel to userspace
is one of the most expensive bits of epping, at least when the number of
data points goes up (which it does as additional flows are added).

> So my question: how would you prefer to receive this data? I'll have to
> write a daemon that provides userspace control (periodic cleanup as
> well as reading the performance stream), so the world's kinda our
> oyster. I can stick to Kathie's original format (and dump it to a named
> pipe, perhaps?), a condensed format that only shows what you want to
> use, an efficient binary format if you feel like parsing that...

It would be great if we could combine efforts a bit here so we don't fork
the codebase more than we have to. I.e., if "upstream" epping and whatever
daemon you end up writing can agree on the data format etc., that would be
fantastic!
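As an aside, here is a minimal sketch of the pinned per-CPU map arrangement
Herbert describes, written against libbpf's BTF-defined maps. All of the
names here are made up for illustration, and this is not the actual
xdp-cpumap-tc/epping code; the point is that pinning the map by name under
/sys/fs/bpf lets the two separately attached tc instances open the same map
instead of each getting a private copy:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* Hypothetical flow key/state layout, just to keep the map definition
   * self-contained. */
  struct flow_key {
          __u32 saddr;
          __u32 daddr;
          __u16 sport;
          __u16 dport;
  };

  struct flow_state {
          __u64 last_seen_ns;
          __u64 rtt_sum_ns;
          __u64 rtt_samples;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
          __uint(max_entries, 65536);
          __type(key, struct flow_key);
          __type(value, struct flow_state);
          /* Pinned as /sys/fs/bpf/flow_state; a second program declaring a
           * map with the same name and layout reuses the pinned one. */
          __uint(pinning, LIBBPF_PIN_BY_NAME);
  } flow_state SEC(".maps");

The per-CPU variant keeps the fast path free of cross-CPU contention, at
the cost of userspace having to sum the per-CPU copies of each value when
it reads the map back out.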
Added Simon to Cc to facilitate this :)

Briefly, what I've discussed before with Simon was to have the ability to
aggregate the metrics in the kernel (WiP PR [0]) and have a userspace
utility periodically pull them out. What we discussed was doing this using
an LPM map (which is not in that PR yet). The idea would be that userspace
populates the LPM map with the keys (prefixes) it wants statistics for (in
the LibreQoS context that could be one key per customer, for instance).
Epping would then do a map lookup into the LPM map, and if it gets a match
it would update the statistics in that map entry (keeping a histogram of
latency values seen, basically).

Simon's PR below uses this technique, where userspace will "reset" the
histogram every time it reads it by swapping between two different map
entries on each read; this allows you to control the sampling rate from
userspace, and you'll just get the data since the last time you polled.

I was thinking that if we can all agree on the map format, then your
polling daemon could be one userspace "client" for that, and the epping
binary itself could be another; that way we keep compatibility between the
two and don't duplicate effort. Similarly, refactoring the epping code
itself so it can be plugged into the cpumap-tc code would be a good goal...

-Toke

[0] https://github.com/xdp-project/bpf-examples/pull/59
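To make the LPM idea above a bit more concrete, here is a rough sketch of
what the BPF side could look like. The map and struct names are invented
for illustration and this is not the code from Simon's PR; the point is
just that userspace loads the prefixes it cares about into the trie, and
the datapath only increments counters in the matching entry instead of
emitting every RTT sample to userspace:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  #define RTT_BUCKETS 32

  struct lpm_v4_key {
          __u32 prefixlen;   /* LPM trie keys must start with the prefix length */
          __u32 addr;        /* IPv4 address, network byte order */
  };

  struct agg_stats {
          __u64 rtt_hist[RTT_BUCKETS];   /* log2-scaled RTT buckets (usecs) */
          __u64 total_samples;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_LPM_TRIE);
          __uint(max_entries, 16384);
          __type(key, struct lpm_v4_key);
          __type(value, struct agg_stats);
          __uint(map_flags, BPF_F_NO_PREALLOC);   /* required for LPM tries */
  } rtt_by_prefix SEC(".maps");

  /* Called from wherever the RTT for a packet has just been computed. */
  static __always_inline void record_rtt(__u32 addr, __u64 rtt_us)
  {
          struct lpm_v4_key key = { .prefixlen = 32, .addr = addr };
          struct agg_stats *stats;
          __u32 bucket = 0;

          stats = bpf_map_lookup_elem(&rtt_by_prefix, &key);
          if (!stats)
                  return;   /* nobody asked for stats on this address */

          /* Index of the highest set bit, i.e. a log2 histogram bucket. */
          while (rtt_us > 1 && bucket < RTT_BUCKETS - 1) {
                  rtt_us >>= 1;
                  bucket++;
          }

          __sync_fetch_and_add(&stats->rtt_hist[bucket], 1);
          __sync_fetch_and_add(&stats->total_samples, 1);
  }

A fixed number of log2 buckets keeps each entry small and constant-size,
which is what makes periodically pulling the whole thing out of the kernel
cheap compared to streaming every sample through a perf buffer.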
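And a sketch of the "reset by swapping entries" polling scheme from the
userspace side, again with a purely illustrative layout (two histogram
slots per prefix, selected by a global epoch flag the BPF side reads)
rather than what Simon's PR actually does: the poller flips the flag, then
reads and clears the slot that just became inactive, so each poll only
returns data accumulated since the previous one.

  #include <bpf/bpf.h>
  #include <stdint.h>
  #include <string.h>

  #define RTT_BUCKETS 32

  /* Must match whatever layout the BPF side writes (hypothetical here). */
  struct agg_stats {
          uint64_t rtt_hist[RTT_BUCKETS];
          uint64_t total_samples;
  };

  struct hist_key {
          uint32_t prefix_id;   /* id userspace assigned to each configured prefix */
          uint32_t epoch;       /* 0 or 1: which of the two slots */
  };

  /* epoch_fd: BPF_MAP_TYPE_ARRAY with a single __u32 entry (active epoch).
   * hist_fd:  hash map keyed by struct hist_key, value struct agg_stats. */
  static int poll_prefix(int epoch_fd, int hist_fd, uint32_t prefix_id,
                         struct agg_stats *out)
  {
          uint32_t zero = 0, active, next;

          /* Flip the active epoch so the BPF side starts writing to the
           * other slot. */
          if (bpf_map_lookup_elem(epoch_fd, &zero, &active))
                  return -1;
          next = active ^ 1;
          if (bpf_map_update_elem(epoch_fd, &zero, &next, BPF_ANY))
                  return -1;

          /* Read the slot that was active until now, then clear it so the
           * next poll starts from zero. */
          struct hist_key key = { .prefix_id = prefix_id, .epoch = active };
          if (bpf_map_lookup_elem(hist_fd, &key, out))
                  return -1;

          struct agg_stats empty;
          memset(&empty, 0, sizeof(empty));
          return bpf_map_update_elem(hist_fd, &key, &empty, BPF_ANY);
  }

One caveat: programs that read the epoch just before the flip can still
write into the old slot for a moment, so in practice the poller probably
wants a short grace period (or to tolerate a few straggling samples) before
clearing it. The polling interval itself is what sets the effective
sampling rate, as described above.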