From: Dave Taht
Date: Mon, 25 Jul 2022 06:12:17 -0700
To: bloat
Subject: [Bloat] Fwd: 400G forwarding - how does it work?

I'd like to understand more deeply, too.

---------- Forwarded message ---------
From: James Bensley
Date: Mon, Jul 25, 2022 at 5:55 AM
Subject: 400G forwarding - how does it work?
To: NANOG

Hi All,

I've been trying to understand how forwarding at 400G is possible, specifically in this example, in relation to the Broadcom J2 chips, but I don't think the mystery is anything specific to them...

According to the Broadcom Jericho2 BCM88690 data sheet it provides 4.8Tbps of traffic processing and supports packet forwarding at 2Bpps. According to my maths that means it requires packet sizes of 300B to reach line rate across all ports. The data sheet says packet sizes above 284B, so I guess this is excluding some headers like the inter-frame gap and CRC (nothing after the PHY/MAC needs to know about them if the CRC is valid)? As I interpret the data sheet, J2 should support a chassis with 12x 400Gbps ports at line rate with 284B packets then.

Jericho2 can be linked to a BCM16K for expanded packet forwarding tables and lookup processing (i.e. to hold the full global routing table; in such a case, forwarding lookups are offloaded to the BCM16K).
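
As a rough check of the line-rate arithmetic earlier in this message, here is a minimal Python sketch. The throughput, packet rate, and port speed are the figures quoted from the data sheet above; the 12-byte inter-frame gap and 4-byte CRC are standard Ethernet values assumed here for illustration, not data-sheet values, and the function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the Jericho2 figures quoted in the message.
# The overhead constants are assumptions (standard Ethernet inter-frame gap
# and CRC sizes); the data sheet may define its 284B figure differently.

THROUGHPUT_BPS = 4_800_000_000_000   # 4.8 Tbps, as quoted
PACKET_RATE_PPS = 2_000_000_000      # 2 Bpps, as quoted
PORT_SPEED_BPS = 400_000_000_000     # one 400G port
IFG_BYTES = 12                       # assumed inter-frame gap
CRC_BYTES = 4                        # assumed frame check sequence


def min_line_rate_packet_bytes(throughput_bps, packet_rate_pps):
    """Smallest average packet size (bytes) at which the packet-rate
    limit no longer caps the bit-rate limit."""
    return throughput_bps / packet_rate_pps / 8


def ports_at_line_rate(throughput_bps, port_bps):
    """Number of ports of a given speed that fit in the total throughput."""
    return throughput_bps // port_bps


if __name__ == "__main__":
    size = min_line_rate_packet_bytes(THROUGHPUT_BPS, PACKET_RATE_PPS)
    ports = ports_at_line_rate(THROUGHPUT_BPS, PORT_SPEED_BPS)
    print(f"minimum packet size for line rate: {size:.0f} bytes")
    print(f"size minus assumed IFG and CRC:    {size - IFG_BYTES - CRC_BYTES:.0f} bytes")
    print(f"400G ports at line rate:           {ports}")
```

Under these assumptions the result is 300 bytes per packet and 12 ports, and subtracting the assumed 16 bytes of inter-frame gap and CRC gives 284 bytes, which would be consistent with the guess above, though the data sheet may well arrive at its figure differently.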

The BCM16K documentation suggests that it uses TCAM for exact matching (e.g., for ACLs) in something called the "Database Array" (with 2M 40b entries?), and SRAM for LPM (e.g., IP lookups) in something called the "User Data Array" (with 16M 32b entries?).

A BCM16K supports 16 parallel searches, which means that each of the 12x 400G ports on a Jericho2 could perform a forwarding lookup at the same time. This means that the BCM16K "only" needs to perform forwarding look-ups at a line rate of 1x 400Gbps, not 4.8Tbps, and "only" for packets larger than 284 bytes, because that is the Jericho2 line-rate pps rate. This means that each of the 16 parallel searches in the BCM16K needs to support a rate of 164Mpps (164,473,684) to reach 400Gbps. This is much more in the realm of feasible, but still pretty extreme...

1 second / 164473684 packets = 1 packet every 6.08 nanoseconds, which is within the access time of TCAM and SRAM, but this needs to include some computing time too, e.g. generating a key for a lookup and passing the results along the pipeline etc. The BCM16K has a clock speed of 1GHz (1,000,000,000 cycles per second, or one cycle every nanosecond) and supports an SRAM memory access in a single clock cycle (according to the data sheet). If one cycle is required for an SRAM lookup, the BCM16K only has 5 cycles to perform other computation tasks, and the J2 chip needs to do the various header re-writes and various counter updates etc., so how is this magic happening?!?

The obvious answer is that it's not magic and my understanding is fundamentally flawed, so please enlighten me.

Cheers,
James.

-- 
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht CEO, TekLibre, LLC