From: Toke Høiland-Jørgensen <toke@toke.dk>
To: David Lang, Sebastian Moeller
Cc: cake@lists.bufferbloat.net
Subject: [Cake] Re: help request for cake on a large network
Date: Tue, 30 Sep 2025 11:04:43 +0200
Message-ID: <87zfacmhuc.fsf@toke.dk>

David Lang writes:

> Sebastian Moeller wrote:
>
>> Hi David,
>>
>> while I have no real answer for your questions (due to never having had that
>> kind of load in my home network ;) ) I would like to ask you to take scripted
>> captures of tc -s qdisc for the wan interface at reasonably short intervals
>> (say every 10 minutes?), as that might be just what we need to actually
>> answer your question.
>
> I will do that, however the network is only up under load for 4 days a year,
> so it's a slow feedback loop :-)
>
> I would welcome any other suggestions for data to gather.

Having queue statistics at as granular a timescale as you can manage would be
cool. It's around 400 bytes of raw data per sample; capturing that every 100ms
for four days is only around 1.4 GB of data, so it should theoretically be
manageable? :)

Note that the 400 bytes is the in-kernel binary representation; the output of
`tc -s` is somewhat larger. Using JSON output (`tc -j -s`) and compressing it
should keep the total within something that server-grade hardware can handle
just fine.
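Something along these lines should do for the capture itself; a minimal
sketch, where the interface name (wan0), output file, JSON wrapper format and
fractional sleep are placeholders/assumptions to adapt, not a finished tool:

#!/bin/sh
# Sample the qdisc statistics as JSON roughly every 100ms and compress the
# stream on the fly; one JSON object per line, each with a capture timestamp.
IFACE=wan0                 # placeholder: the WAN interface
OUT=cake-stats.ndjson.gz   # placeholder: output file

while true; do
    printf '{"ts": %s, "qdisc": ' "$(date +%s.%N)"
    tc -s -j qdisc show dev "$IFACE" | tr -d '\n'
    printf '}\n'
    sleep 0.1
done | gzip > "$OUT"

Newline-delimited JSON keeps post-processing simple (one sample per line), and
the compressed stream should stay in the same ballpark as the estimate above.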
>>> On 28. Sep 2025, at 13:06, David Lang wrote:
>>>
>>> I'm starting to prepare for the next Scale conference and we are switching
>>> from Juniper routers to Linux routers. This gives me the ability to
>>> implement cake.
>>>
>>> One problem we have is classes that tell everyone 'go download this' that
>>> trigger hundreds of people to hammer the network at the same time (this is
>>> both a wifi and a network bandwidth issue, wifi is being worked on)
>>
>> So one issue might be that with several hundred users the default
>> compile-time number of queues (1024, IIRC) that cake will entertain might be
>> too little, even in light of the 8-way set-associative hashing design. I
>> believe this can be changed (within limits) only by modifying the source and
>> recompiling the kernel, if that should be needed at all.
>
> custom compiling a kernel is very much an option (and this sort of tweaking
> is the sort of thing I'm expecting to need to do)
>
> The conference is in March, so we have some time to think about this and
> customize things, just no chance to test before the show.
>
>> I wonder whether multi-queue cake would not solve this to some degree, as I
>> assume each queue's instance would bring its own independent set of 1024
>> bins?
>
> good thought

While I certainly wouldn't mind having a large-scale test of the multi-queue
variant of cake, I don't really think it's necessary at 1G. Assuming you're
using server/desktop-grade hardware for the gateways, CAKE should scale just
fine to 1 Gbit.

Sebastian is right that the MQ variant will install independent CAKE instances
on each hardware queue, which will give you more flow queues. However, the
round-robin dequeueing among those queues will also be completely independent,
so you won't get fairness among them either (only between the flows that share
a HWQ).

As for collision probability, we actually have a calculation of this in the
original CAKE paper[0], in figure 1. With set-associative hashing, the
collision probability only starts to rise around 500 simultaneous flows. And
bear in mind that these need to be active flows *from the PoV of the router*,
i.e., they all need to be actively transmitting data at the same time; even
with lots of users with active connections as seen from the endpoints, the
number of active flows in the router should be way smaller (there's a paper
discussing this that I can't find right now).

Having some data about this would be interesting, of course (and it should be
part of the tc statistics).

-Toke

[0] https://arxiv.org/pdf/1804.07617
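For reference, a single-instance setup of the sort discussed above could look
something like the following; the interface name, shaped rate and the
besteffort/triple-isolate/nat/ethernet option choices are illustrative
assumptions, not a recommendation for the actual link:

# One CAKE instance shaping the WAN egress; flow fairness then applies
# across all flows on the link rather than per hardware queue.
tc qdisc replace dev eth0 root cake bandwidth 950mbit besteffort \
    triple-isolate nat ethernet

# The per-instance statistics (where the collision-related counters mentioned
# above should show up) are then visible with:
tc -s qdisc show dev eth0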