From: Toke Høiland-Jørgensen
To: Mikael Abrahamsson, Dave Taht
Cc: bloat
Subject: Re: [Bloat] [Cerowrt-devel] beating the drum for BQL
Date: Fri, 24 Aug 2018 13:24:42 +0200
Message-ID: <87mutc6m85.fsf@toke.dk>

Mikael Abrahamsson writes:

> On Thu, 23 Aug 2018, Dave Taht wrote:
>
>> I should also point out that the kinds of routing latency numbers in
>> those blog entries were on very high-end Intel hardware. It would be
>> good to re-run those sorts of tests on the Armada and others for 1,
>> 10, 100 and 1000 routes.
>> Clever, complicated algorithms have a tendency to bloat the icache
>> and, fairly often, cost more than they are worth on hardware that
>> typically has 32k i/d caches and a small L2.
>
> My testing has been on OpenWrt with 4.14 on Intel x86-64. Looking at
> how the box behaves, I'd say it's limited by context switching /
> interrupt load, and not actually by the CPU being busy doing "hard
> work".
>
> All of the fast routing implementations (Snabb, FD.io/VPP, etc.) take
> the CPU and devices away from Linux and run a busy-loop, polling a
> lot of the time and never context switching, which means the L1 cache
> is never churned. This is how they become fast. I see potential to do
> "XDP offload" of forwarding here, basically doing a similar job to
> what a hardware packet accelerator does.

Yup, that would help; we see roughly a 2-3x improvement in routing
performance with XDP over the regular stack. I don't think there's XDP
support in any of the low-end Ethernet drivers yet, though...

-Toke