From: Toke Høiland-Jørgensen
To: Jonathan Morton, dag dg
Cc: Cake@lists.bufferbloat.net
Subject: Re: [Cake] Multiple Hardware Queues
Date: Sun, 15 Jul 2018 12:09:41 +0200
Message-ID: <87o9f895ei.fsf@toke.dk>
In-Reply-To: <3E2FE0BD-3A6C-4399-90FB-1334A7A0D962@gmail.com>

Yeah, I agree that at 1 Gbit you don't need multiple receive queues to
get to line rate. In my 100Gbit tests, I got to 50 Gbps with CAKE (I
should really post some graphs of that), so at really high speeds we
would benefit from being able to run simultaneously on multiple CPUs.
But let's just say that turning CAKE into something that can run on
multiple CPUs simultaneously is non-trivial... :)

> In any case, the MQ qdisc simply sorts packets into hardware queues
> according to the CPU they were submitted from. [...] But it's
> basically useless on [...] a machine acting primarily as a router,
> since the traffic is submitted from just one or two CPUs at a time,
> and usually most of the CPUs are idle anyway.

Not quite. On a router, the distribution of packets over CPUs will
depend on what happens on the receive side. Usually, the hardware will
have the same number of receive queues as transmit queues, and it will
use Receive Side Scaling (RSS), which hashes packets into the queues
based on the packet header. Often, the hardware queues are not assigned
properly to different CPUs, which is why the first thing 10Gbit+
performance tuning guides tell you to do is to adjust the CPU mapping
of the hardware queue IRQs... Rough sketches of both the RSS hashing
and the IRQ tuning are appended below.

> I have no idea what the hardware does to coalesce those packets into a
> single stream to be sent over the wire.

That's hardware-specific, but I think most devices do something that
more or less corresponds to round-robin scheduling of the hardware
queues.

-Toke
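
P.S. For the curious, here's a rough sketch (in Python, for brevity) of
what RSS-style queue selection looks like. This is only an
illustration: real NICs use a Toeplitz hash with a configurable key,
and the queue count and indirection table size here are made up.

import hashlib

N_QUEUES = 8

# 128-entry indirection table, hash buckets spread round-robin over the
# queues. On real hardware this is the table you can inspect and
# rewrite with `ethtool -x` / `ethtool -X`.
indirection_table = [i % N_QUEUES for i in range(128)]

def rx_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Return the receive queue a packet with this 4-tuple would land in."""
    # Stand-in for the Toeplitz hash: any stable hash over the flow
    # 4-tuple shows the same behaviour for illustration purposes.
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return indirection_table[h % len(indirection_table)]

# All packets of one flow land in the same queue (and hence on the same
# CPU); different flows spread out over the queues.
print(rx_queue("10.0.0.1", "10.0.0.2", 40000, 443))
print(rx_queue("10.0.0.1", "10.0.0.2", 40001, 443))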
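
And a minimal sketch of the IRQ affinity adjustment those tuning
guides describe, assuming the NIC's per-queue interrupts show up in
/proc/interrupts with names like "eth0-TxRx-0" (the exact naming
varies by driver) and that you want queue N pinned to CPU N. Needs
root, and the single-chunk hex mask only covers the first 32 CPUs:

import re
import sys

iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"

# /proc/interrupts lines look roughly like:
#   24:  1234  0 ...  IR-PCI-MSI 1048576-edge  eth0-TxRx-0
# Assumes the "<iface>-<something>-<N>" naming; adjust for your driver.
pattern = re.compile(r"^\s*(\d+):.*\b" + re.escape(iface) + r"-\S*?(\d+)\s*$")

with open("/proc/interrupts") as f:
    for line in f:
        m = pattern.match(line)
        if not m:
            continue
        irq, queue = int(m.group(1)), int(m.group(2))
        mask = 1 << queue  # single-bit CPU mask: queue N -> CPU N
        with open(f"/proc/irq/{irq}/smp_affinity", "w") as aff:
            aff.write(f"{mask:x}\n")  # smp_affinity takes a hex CPU bitmask
        print(f"IRQ {irq} ({iface} queue {queue}) -> CPU {queue}")

In practice you'd probably just run the set_irq_affinity script that
some NIC drivers (Intel's, for instance) ship; the point is only that
the queue->CPU mapping is something you set explicitly, it doesn't
just happen.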