From: Maxime Bizon
To: Toke Høiland-Jørgensen
Cc: Dave Taht, Cake List
Date: Thu, 23 Apr 2020 14:33:29 +0200
Subject: Re: [Cake] Advantages to tightly tuning latency

On Thursday, 23 Apr 2020 at 13:57:25 (+0200), Toke Høiland-Jørgensen wrote:

Hello Toke,

> That is awesome! Please make sure you include the AQL patch for ath10k,
> it really works wonders, as Dave demonstrated:
>
> https://lists.bufferbloat.net/pipermail/make-wifi-fast/2020-March/002721.html

Was it in 5.4? We try to stick to LTS kernels.

> We're working on that in kernel land - ever heard of XDP? On big-iron
> servers we have no issues pushing 10s and 100s of Gbps in software
> (well, the latter only given enough cores to throw at the problem :)).
> There's not a lot of embedded platforms support as of yet, but we do
> have some people in the ARM world working on that.
>
> Personally, I do see embedded platforms as an important (future) use
> case for XDP, though, in particular for CPEs. So I would be very
> interested in hearing details about your particular platform, and your
> DPDK solution, so we can think about what it will take to achieve the
> same with XDP. If you're interested in this, please feel free to reach
> out :)

Last time I looked at XDP, its primary use cases were "early drop" / "anti-DDoS".

In our case, each packet has to be routed and NATed, we have VLAN tags, and we also have MAP-E for IPv4 traffic. So the vanilla forwarding path does multiple rounds of RX/TX because of the tunneling.

TBH, the hard work in our optimized forwarding code is figuring out what modifications to apply to each packet. Whether the modifications and TX are then done by XDP or by hand-written C code in the kernel is more of a detail, even though using XDP is much cleaner of course.

What the kernel has always lacked is what DaveM once called the "grand unified flow cache": the ability to do a single lookup and decide what to do with the packet.
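A minimal sketch of that single-lookup idea, expressed as an XDP program. This is illustrative only: the flow_table map and the flow_key/flow_action structures are hypothetical, and VLAN, MAP-E and the actual NAT header rewriting are omitted; it is not the fast path described above.

/* Sketch: one hash-map lookup decides the packet's fate. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Hypothetical flow key/action; a real CPE fast path would key on the
 * full 5-tuple and carry NAT / MAP-E rewrite state as well. */
struct flow_key {
	__u32 saddr;
	__u32 daddr;
	__u8  protocol;
};

struct flow_action {
	__u32 out_ifindex;	/* egress interface chosen at lookup time */
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 65536);
	__type(key, struct flow_key);
	__type(value, struct flow_action);
} flow_table SEC(".maps");

SEC("xdp")
int xdp_flow_fwd(struct xdp_md *ctx)
{
	void *data     = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;

	struct ethhdr *eth = data;
	if ((void *)(eth + 1) > data_end)
		return XDP_PASS;
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_PASS;	/* VLAN and MAP-E handling omitted */

	struct iphdr *iph = (void *)(eth + 1);
	if ((void *)(iph + 1) > data_end)
		return XDP_PASS;

	struct flow_key key = {
		.saddr    = iph->saddr,
		.daddr    = iph->daddr,
		.protocol = iph->protocol,
	};

	/* The single lookup that decides what to do with the packet. */
	struct flow_action *act = bpf_map_lookup_elem(&flow_table, &key);
	if (!act)
		return XDP_PASS;	/* unknown flow: normal kernel path */

	/* Header rewrites (NAT, encapsulation) would happen here. */
	return bpf_redirect(act->out_ifindex, 0);
}

char _license[] SEC("license") = "GPL";

Entries would be populated from a slower control path (for example, from conntrack once a flow is established), which is essentially what a flow-offload infrastructure does.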
Instead, what we have today is the bridge forwarding table, the IP routing table (it used to be a cache), the netfilter conntrack lookup, and multiple rounds of those if you do tunneling.

Once you have this "flow table" infrastructure, it becomes easy to offload forwarding, either to real hardware or to software (for example, a dedicated CPU core running in polling mode).

The good news is that it seems nftables is building this:

https://wiki.nftables.org/wiki-nftables/index.php/Flowtable

I'm still using iptables, but it seems the features I was missing, like TCPMSS, are now in nft as well, so I will have a look.

> Setting aside the fact that those single-stream tests ought to die a
> horrible death, I do wonder if it would be feasible to do a bit of
> 'optimising for the test'? With XDP we do have the ability to steer
> packets between CPUs based on arbitrary criteria, and while it is not as
> efficient as hardware-based RSS it may be enough to achieve line rate
> for a single TCP flow?

You cannot do that kind of steering for a single TCP flow at those rates, because you will get out-of-order packets and kill TCP performance.

I do not consider those single-stream tests unrealistic; this is exactly what happens if, say, you buy a game on Steam and download it.

--
Maxime