[Bloat] GSO (was: Please enter issues into the issue tracker - Issue system organisation needed)

Jesper Dangaard Brouer hawk at comx.dk
Fri Feb 25 10:48:57 EST 2011


On Fri, 2011-02-25 at 12:54 +0100, Eric Dumazet wrote:
> On Friday, 25 February 2011 at 12:21 +0100, Jesper Dangaard Brouer
> wrote:
> > On Thu, 2011-02-24 at 20:29 +0100, Eric Dumazet wrote:
> > > - It's important to set TSO off (ethtool -K eth0 tso off), or else we
> > > send big packets (up to 64 Kbytes), and this used to break SFQ
> > > fairness.  This can really hurt the latencies of interactive flows.
> > 
> > Don't you mean GSO, Generic Segmentation Offload (ethtool -K eth0 gso
> > off), since that happens in the stack?  TSO, TCP Segmentation Offload,
> > happens in hardware, so you will not see it in the SFQ qdisc.
> > 
> 
> I definitely see big packets if TSO is enabled, for locally generated
> traffic.  (You are probably concerned with routers, where all traffic is
> forwarded, so TSO is not used even if enabled.)

Yes, as you know I'm very concerned about the router case.  Guess that
explains my experience with TSO.
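
For reference, the knobs I have in mind here are the ethtool offload
settings.  Something along these lines should do it (eth0 is just an
example interface name, and which feature flags exist depends on the
driver and kernel version):

  # turn off TCP Segmentation Offload (segmentation done by the NIC)
  ethtool -K eth0 tso off
  # turn off Generic Segmentation Offload (segmentation done in the stack)
  ethtool -K eth0 gso off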

> > I recommend that both are turned off on low-bandwidth links where
> > latency matters.
> > 
> 
> Sure.
> 
> > I'm wondering if LRO (Large Receive Offload) affects you when you are
> > using SFQ on ingress?
> > 
> > 
> 
> GRO/LRO can have an impact, for sure.  But most 'current' kernels don't
> have GRO/LRO enabled by default.  I mean the kernels in use by 2-3 year
> old distros.

Hmm, are you sure?
The speed test server runs Debian Lenny with kernel 2.6.26-2-686, and it
had GSO enabled...
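
To check what a given box actually has enabled, the current offload
state can be queried with ethtool (the exact list of features shown
depends on the ethtool and kernel version):

  # show the current offload settings (TSO/GSO/GRO/LRO as supported)
  ethtool -k eth0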


> > Recently had some "funny" issues with GSO, where a 100 Mbit/s customer
> > could "only" get approx 90 Mbit/s throughput to our speed test server
> > (other customers, in another apartment building, could get approx 96
> > Mbit/s).  The issue was resolved by disabling GSO on the speed test
> > server.  The theory is that some switch on the path cannot handle the
> > bursts generated by GSO, which can be up to 64K (I think, correct me if
> > I'm wrong).

Just looked at the case: the throughput only averaged 83 Mbit/s, with
spikes.  See:
http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/speed-to-grantoften-1.png
 
> 
> That's right.  One 64K packet with a standard MTU means some spikes on
> the wire, but if your switches can't cope with this...  Is TCP SACK
> active on the customer side (and the speed test server)?

Yes, on both ends (/proc/sys/net/ipv4/tcp_sack = 1).
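
Checked the obvious way on each box, e.g.:

  # 1 means SACK is enabled
  cat /proc/sys/net/ipv4/tcp_sack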

I think the bufferbloat theory is that SACKs will not work well, due to
the long delays introduced by the (bloated) buffers.  In this case you
can see on the graph a max RTT around 150 ms and an average of 20 ms.

On another, better-behaved path through the network to the speed test
server, I only see a max RTT around 25 ms and an average of 15 ms, see:
http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/speed-to-pc314a-1.png

You can also see that this path averaged 90 Mbit/s, but with significant
throughput drops (the 92 Mbit/s line is an artificial reference line on
the graph).  This behavior is probably caused by the GSO effect.

Disabling GSO on the speed test server fixed the problem, as can be seen
on this graph:
http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/speed-to-grantoften-solved.png

The really strange part when troubleshooting this issue was that the
throughput was fine between the two customer end-boxes ("grantoften" and
"pc314a"), as can be seen here:
http://people.netfilter.org/hawk/dropbox/bloat_vs_GSO/pc314a-to-grantoften-1.png


> > When adjusting buffer sizes, it's important to take this bursty TCP
> > behavior into account, which is created by both GSO and TSO.  I'm not
> > saying that the queue size needs to be above 64K.  For smaller links,
> > it might make sense to set it significantly below 64K, to prevent a
> > GSO-enabled Linux machine from ramping up its window size, which makes
> > it capable of bursting.
> > 
> 
> TSO basically hurts SFQ or other AQMs, unless you use big/fast pipes.
>
> For a router workload anyway, I would say it's better not to try to
> coalesce frames at the software level, but just handle them one by one.

Yes, but we still want (at least RX) NAPI/polling-mode, where we process
all the packets in the NIC hardware queue at once.
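
To make the queue-size point above a bit more concrete: on a small link,
something along these lines is what I have in mind.  This is only an
untested sketch; eth1, the 2 Mbit/s rate and the 32 Kbyte limit are
placeholder values:

  # shape to the (example) link rate and cap the queue at 32 Kbytes,
  # well below a full 64K GSO/TSO burst
  tc qdisc add dev eth1 root tbf rate 2mbit burst 10kb limit 32kb

An SFQ qdisc could, if I recall correctly, then be attached below the
TBF to keep per-flow fairness in that small queue.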

See you around,
-- 
Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network Kernel Developer
  Cand. Scient Datalog / MSc.CS
  Author of http://adsl-optimizer.dk
  LinkedIn: http://www.linkedin.com/in/brouer




