[Bloat] BBR implementations, knobs to turn?

erik.taraldsen at telenor.com
Tue Nov 17 05:05:24 EST 2020


Thank you for the response, Neal.


old_hw # uname -r
5.3.0-64-generic
(Ubuntu 19.10 on a Xeon workstation, integrated network card, 1 Gbit GPON access.  Used as a proof of concept from the lab at work.)



new_hw # uname -r
4.18.0-193.19.1.el8_2.x86_64
(CentOS 8.2 on a Xeon rack server, discrete 10 Gbit network card, 40 Gbit server farm link with low utilization.  Intended as a fully supported and operated service.  It is not possible to run a newer kernel and still get a service agreement in my organization.)


Both servers have very similar network latency/distance without load, probably a hop or two in the new server's favor if we are picky.  That difference is extremely small compared to the radio link latency.




Retransmits

Hopefully the attached graphs get through the mailing list.  I'm comparing the output from iperf's statistics.  I have aggregated the throughput graphs somewhat to make it clearer that the old server (bridgemaster) gets higher throughput than the new one (speedmonster).  Throughput, retransmits, round-trip time and congestion window size are all as reported by iperf.


The test client is currently a Raspberry Pi 4 connected directly to the 5G outdoor fixed-mounted antenna.  The data shared here is from a 4G access.  We hope/plan to migrate the tests onto the main CPU of the 5G antenna.




ss output

See attachments.  I had them running in while loops, so you can follow the status over time if that is of interest.  Note that there are two TCP sessions on the same port: one for the control plane and one for performance data.
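
For anyone who wants to collect similar samples, here is a minimal Python sketch of polling "ss -tin" at a fixed interval.  The function name poll_ss, the port number and the sport filter are assumptions for illustration, not the exact loop used for the attachments.

```python
import subprocess
import time


def poll_ss(port, count, interval_s=1.0, run=subprocess.run):
    """Return a list of (timestamp, ss output) samples for one local port.

    `port` is a hypothetical server port; `run` can be swapped out
    for a stub so the loop can be exercised without the ss binary.
    """
    samples = []
    for i in range(count):
        # "ss -tin" prints per-connection TCP info; the filter limits
        # the output to sockets whose source port matches.
        result = run(
            ["ss", "-tin", f"sport = :{port}"],
            capture_output=True,
            text=True,
        )
        samples.append((time.time(), result.stdout))
        if i + 1 < count:
            time.sleep(interval_s)
    return samples
```

Calling poll_ss(5201, count=140) would take roughly one sample per second over a 140 s run; 5201 is only an example port.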


Old server bridgemaster

     bbr wscale:7,8 rto:240 rtt:37.681/1.474 ato:40 mss:1448 pmtu:1500 rcvmss:536 advmss:1448 cwnd:162 ssthresh:130 bytes_sent:485440552 bytes_retrans:35392016 bytes_acked:449935592 bytes_received:37 segs_out:335250 segs_in:75226 data_segs_out:335249 data_segs_in:1 bbr:(bw:25394432bps,mrtt:23.917,pacing_gain:1,cwnd_gain:2) send 49802500bps lastrcv:139736 pacing_rate 25140488bps delivery_rate 25140096bps delivered:310746 busy:139732ms unacked:78 retrans:0/24442 dsack_dups:16 reordering:185 reord_seen:4 rcv_space:14600 rcv_ssthresh:64076 notsent:2873304 minrtt:23.905


New server speedmonster

     bbr wscale:7,7 rto:233 rtt:32.506/1.951 ato:40 mss:1448 pmtu:1500 rcvmss:536 advmss:1448 cwnd:118 ssthresh:142 bytes_sent:466022872 bytes_retrans:40005344 bytes_acked:425898792 bytes_received:37 segs_out:321840 segs_in:72884 data_segs_out:321839 data_segs_in:1 bbr:(bw:29397720bps,mrtt:22,pacing_gain:1,cwnd_gain:2) send 42051067bps lastsnd:1 lastrcv:139363 lastack:8 pacing_rate 30453440bps delivery_rate 23350056bps delivered:294130 busy:139361ms unacked:82 retrans:0/27628 reordering:7 reord_seen:4 rcv_space:28960 rcv_ssthresh:28960 notsent:3021752 minrtt:22
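
As a quick sanity check on the two samples above, the share of sent bytes that were retransmissions can be computed from bytes_retrans and bytes_sent, and the "send" rate can be recomputed as cwnd * mss * 8 / rtt.  This is a minimal Python sketch; the helper names are illustrative, and the input strings are excerpts of the lines above containing only the fields used.

```python
import re

# Excerpts of the two ss samples quoted above (only the fields used here).
SAMPLES = {
    "bridgemaster": "rtt:37.681/1.474 mss:1448 cwnd:162 "
                    "bytes_sent:485440552 bytes_retrans:35392016 "
                    "send 49802500bps",
    "speedmonster": "rtt:32.506/1.951 mss:1448 cwnd:118 "
                    "bytes_sent:466022872 bytes_retrans:40005344 "
                    "send 42051067bps",
}


def parse_ss(line):
    """Collect key:value tokens, plus the space-separated 'send' rate."""
    fields = dict(re.findall(r"(\w+):([\d.]+)", line))
    match = re.search(r"\bsend (\d+)bps", line)
    if match:
        fields["send"] = match.group(1)
    return fields


def retrans_share(fields):
    """Fraction of all bytes sent that were retransmissions."""
    return int(fields["bytes_retrans"]) / int(fields["bytes_sent"])


def send_rate_bps(fields):
    """Recompute the 'send' figure: cwnd * mss * 8 / rtt (rtt is in ms)."""
    rtt_s = float(fields["rtt"]) / 1000.0
    return int(fields["cwnd"]) * int(fields["mss"]) * 8 / rtt_s


if __name__ == "__main__":
    for name, line in SAMPLES.items():
        f = parse_ss(line)
        print(name,
              f"retrans share {retrans_share(f):.2%}",
              f"send {send_rate_bps(f):.0f} bps (ss reports {f['send']})")
```

With the numbers quoted above this gives roughly 7.3% retransmitted bytes for bridgemaster and 8.6% for speedmonster, both below the roughly 15% mentioned in the quoted reply, and the recomputed send rate agrees with the value ss prints.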




PCAP

The pcap files were approx. 500 MB each from the 140 s run.  I can redo with shorter tests if you have a suggestion for what would be sufficient time to see/study the behavior.  Both captures are from the server side, as I messed up the capture on the client side.  I have linked to the xplot files for each run.


https://drive.google.com/file/d/1ZsHXkfSNcdILdy_Q3RuYbQvCKLyO6wAm/view?usp=sharing





Regards Erik


________________________________
From: Neal Cardwell <ncardwell at google.com>
Sent: 16 November 2020 22:14
To: Taraldsen Erik
Cc: bloat
Subject: Re: [Bloat] BBR implementations, knobs to turn?

A couple questions:

- I guess this is Linux TCP BBRv1 ("bbr" module)? What's the OS distribution and exact kernel version ("uname -r")?

- What do you mean when you say "The old server allows for more re-transmits"?

- If BBRv1 is suffering throughput problems due to high retransmit rates, then usually the retransmit rate is around 15% or higher. If the retransmit rate is that high on a radio link that is being tested, then that radio link may be having issues that should be investigated separately?

- Would you be able to take a tcpdump trace of the well-behaved and problematic traffic and share the pcap or a plot?
 https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-visualize-the-behavior-of-linux-tcp-bbr-connections

- Would you be able to share the output of "ss -tin" from a recently built "ss" binary, near the end of a long-lived test flow, for the well-behaved and problematic cases?
  https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections

best,
neal



On Mon, Nov 16, 2020 at 10:25 AM <erik.taraldsen at telenor.com> wrote:

I'm in the process of replacing a throughput test server.  The old server is running a 1 Gbit Ethernet card on a 1 Gbit link and Ubuntu.  The new one has a 10 Gbit card on a 40 Gbit link and runs CentOS.  Both have low load and Xeon processors.


The purpose is for field installers to verify the bandwidth sold to the customers using known clients against known servers.  (4G and 5G fixed installations mainly).


What I'm finding is that the new server is consistently delivering slightly lower throughput than the old server.  The old server allows for more re-transmits and has a slightly higher congestion window than the new server.


Is there any way to tune bbr to allow for more re-transmits (which seems to be the limiting factor)?  Or other suggestions?



(Frankly, I think the old server is too aggressive for general-purpose use.  It seems to starve out other TCP sessions more than the new server.  So for delivering regular content to users the new implementation seems more balanced, but that is not the target here.  We want to stress test the radio link.)


Regards Erik

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ubuntu vs centos bbr implementations.png
Type: image/png
Size: 136268 bytes
Desc: ubuntu vs centos bbr implementations.png
URL: <https://lists.bufferbloat.net/pipermail/bloat/attachments/20201117/54198337/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ubuntu vs centos bbr throughput.png
Type: image/png
Size: 38707 bytes
Desc: ubuntu vs centos bbr throughput.png
URL: <https://lists.bufferbloat.net/pipermail/bloat/attachments/20201117/54198337/attachment-0004.png>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ss_speedmonster_output.txt
URL: <https://lists.bufferbloat.net/pipermail/bloat/attachments/20201117/54198337/attachment-0002.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ss_ouput_bridgemaster.txt
URL: <https://lists.bufferbloat.net/pipermail/bloat/attachments/20201117/54198337/attachment-0003.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ubuntu vs centos xplot.png
Type: image/png
Size: 126863 bytes
Desc: ubuntu vs centos xplot.png
URL: <https://lists.bufferbloat.net/pipermail/bloat/attachments/20201117/54198337/attachment-0005.png>

