[Cerowrt-devel] Baby jumbo frames support?

dpreed at reed.com dpreed at reed.com
Thu Jun 21 10:25:26 EDT 2012

I understand Dave Taht's long lecture - actually understood it years ago.  But frame aggregation is not the same thing as jumbo frames in a multi-technology Ethernet LAN.   Jumbo frames provide a way to exploit *end-to-end* frame sizes greater than 1500 bytes.  That means the source and destination TCPs get frames that are "whole" (and not random subassemblies of frames that may arrive close together in time).
9000 byte frames were invented for 1 GigE transports.   Today's 802.11n and its successors approach 1 GigE speeds, and 1 GigE is the standard wiring for most homes, etc.   It does not matter how the underlying radio links chop up the Ethernet frame, retransmit the pieces, ack them etc.   The value I am discussing is at the *endpoints*.
It's tempting for transport link providers to *ignore* TCP and so forth when they design their transports, and focus only on transport-level efficiencies and reliabilities.  This temptation created bufferbloat and also the excessive retry problem.   (and in the past it created the historical predecessor of "bufferbloat" - Frame Relay's "Reliable delivery mode" which would go to extraordinary lengths to never drop a packet, including storing the packets *on disk* in some cases - talk about bloated buffers!)
The conversation here (including, but not limited to Taht's comments) shows exactly that *temptation*.
Aggregation is NOT the same as large frames.  Not at all.  It achieves internal efficiencies, but not the endpoint efficiencies of receiving a coherent frame, that can be processed immediately and by a single code path.  At 1 Gigabit/sec this was important enough to introduce such frame sizes.
The alternative ways to achieve the endpoint goals would be to allow reordering of data delivery to the endpoint app, perhaps by making SCTP work instead of TCP, using a flow/congestion/rate control mechanism other than a window on sequence numbers, etc.  But that would mean changing the entire stack to a new end-to-end theory of operation.
There is a real tradeoff space, but unilaterally declaring that packet aggregation is the same as jumbo Ethernet frames is choosing a poor point in the tradeoff space.
Regarding "header overhead" - that is minor in the scheme of things.  Obsessing about that indicates a lack of perspective on the systems level issues.
-----Original Message-----
From: "Robert Bradley" <robert.bradley1 at gmail.com>
Sent: Thursday, June 21, 2012 9:33am
To: cerowrt-devel at lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] Baby jumbo frames support?

On 21 June 2012 01:58, Dave Taht <dave.taht at gmail.com> wrote:
> As for PPPoE with a size 1508... um... one or the other device is going
> to get in your way here. I presume that 1500 works? You would do
> better to contact the author of the driver (juhosg) to get your
> question answered as I'm under the impression he is under the right
> NDAs.

I think the point here is that MTU=1500 works, but once you add in the
PPPoE header, you end up with an effective MTU of 1492 for outbound


The short answer is that without baby-jumbo support, you either end up
fragmenting packets or you need to somehow restrict the MTU manually.
You can do that either through MSS clamping or simply configuring each
internal machine to use MTU=1492.  To get around this, the BT ADSL
modems started to support MTU=1508.  This means that the MTU within
the PPPoE tunnel remains at Ethernet-standard 1500, and avoids the
fragmentation or reconfiguration issues.

As for supporting it in CeroWRT ... the ag71xx driver defines
AG71XX_TX_MTU_LEN=1540, so it looks safe enough to use MTU 1508,
especially if you know that no vlans or other additions to the
standard header will be used.  To enable that, you need to reimplement
the eth_change_mtu function for the driver.  The current code uses the
kernel's implementation, which restricts the MTU to 1500.  An initial,
naive patch would look something like:

--- C:/Users/robert/AppData/Local/Temp/ag71x-revBASE.svn000.tmp.c	Mon May 28 03:55:59 2012
+++ C:/Users/robert/Desktop/ag71xx/ag71xx_main.c	Thu Jun 21 13:58:44 2012
@@ -1042,13 +1042,25 @@

+/*
+ * Copied from eth_change_mtu and modified so that baby jumbo packets
+ * may be used.  This has not been tested!
+ */
+int ag71xx_change_mtu(struct net_device *dev, int new_mtu)
+{
+        if (new_mtu < 68 || new_mtu > (ETH_DATA_LEN + 8))
+                return -EINVAL;
+        dev->mtu = new_mtu;
+        return 0;
+}
+
 static const struct net_device_ops ag71xx_netdev_ops = {
 	.ndo_open		= ag71xx_open,
 	.ndo_stop		= ag71xx_stop,
 	.ndo_start_xmit		= ag71xx_hard_start_xmit,
 	.ndo_do_ioctl		= ag71xx_do_ioctl,
 	.ndo_tx_timeout		= ag71xx_tx_timeout,
-	.ndo_change_mtu		= eth_change_mtu,
+	.ndo_change_mtu		= ag71xx_change_mtu,
 	.ndo_set_mac_address	= eth_mac_addr,
 	.ndo_validate_addr	= eth_validate_addr,


where I've copied the original function and changed the upper limit to
ETH_DATA_LEN+8, then set up the netdev_ops structure to call the new
version.  In reality, you probably want to add some better checks
(testing for MTU+all possible headers<1540?) and remove the magic
constant - in the worst case, something closer to the e1000 driver's
implementation.  I wouldn't recommend using the present version on
anything other than an experimental build, but the default MTU would
be 1500 anyway so should avoid causing too much damage.  Those on BT
ADSL lines can change the MTU on ge00 themselves and see what breaks.
Robert Bradley