On the tools side, iperf 2.0.14 is going through a lot of development.  My hope is to have the code done soon so it can be tested internally at Broadcom.  We're testing across everything from WiFi to 100G NICs, with thousands of parallel threads.  I've been able to find time for this refactoring thanks to COVID-19 stay-at-home work.

What I think the industry should move to is measuring both throughput and latency in a direct manner.  2.0.14 also supports full duplex traffic (as well as --reverse).  TCP server output looks like the following (these are 10G NICs):

[rjmcmahon@localhost iperf2-code]$ src/iperf -s -i 1
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.10%enp2s0 port 5001 connected with 192.168.1.80 port 47420 (trip-times) (MSS=1448) (peer 2.0.14-alpha)
[ ID] Interval        Transfer    Bandwidth       Reads   Dist(bin=16.0K)     Burst Latency avg/min/max/stdev (cnt/size) inP NetPwr
[  4] 0.00-1.00 sec  1.09 GBytes  9.34 Gbits/sec  18733    2469:2552:2753:2456:2230:2272:1859:2142     2.988/ 0.971/ 3.668/ 0.370 ms (8908/131072) 3.34 MByte 390759.84
[  4] 1.00-2.00 sec  1.10 GBytes  9.42 Gbits/sec  19844    2690:2984:3211:2858:2255:2039:1893:1914     3.000/ 2.320/ 3.704/ 0.346 ms (8979/131073) 3.37 MByte 392263.52
[  4] 2.00-3.00 sec  1.10 GBytes  9.41 Gbits/sec  18897    2458:2668:2764:2412:2216:2300:2019:2060     3.003/ 2.310/ 3.665/ 0.347 ms (8978/131070) 3.37 MByte 391878.92
[  4] 3.00-4.00 sec  1.10 GBytes  9.42 Gbits/sec  18389    2339:2542:2443:2268:2211:2232:2144:2210     3.009/ 2.315/ 3.659/ 0.347 ms (8979/131073) 3.38 MByte 391101.00
[  4] 4.00-5.00 sec  1.10 GBytes  9.41 Gbits/sec  19468    2588:2889:3017:2623:2250:2221:1947:1933     2.971/ 2.259/ 3.671/ 0.364 ms (8979/131069) 3.33 MByte 396075.85
[  4] 5.00-6.00 sec  1.10 GBytes  9.41 Gbits/sec  18547    2357:2596:2582:2344:2170:2192:2104:2202     2.971/ 2.276/ 3.699/ 0.365 ms (8978/131072) 3.34 MByte 396149.20
[  4] 6.00-7.00 sec  1.10 GBytes  9.42 Gbits/sec  18479    2363:2598:2430:2332:2234:2184:2155:2183     2.976/ 2.279/ 3.667/ 0.363 ms (8978/131084) 3.34 MByte 395486.89
[  4] 7.00-8.00 sec  1.10 GBytes  9.42 Gbits/sec  18506    2387:2549:2519:2339:2229:2183:2060:2240     2.971/ 2.266/ 3.667/ 0.365 ms (8979/131071) 3.33 MByte 396155.84
[  4] 8.00-9.00 sec  1.10 GBytes  9.41 Gbits/sec  18732    2398:2640:2750:2352:2113:2286:2030:2163     2.973/ 2.271/ 3.691/ 0.364 ms (8979/131059) 3.34 MByte 395780.90
[  4] 9.00-10.00 sec  1.10 GBytes  9.41 Gbits/sec  19585    2659:2901:3073:2619:2285:2221:1854:1973     2.976/ 2.264/ 3.666/ 0.361 ms (8978/131081) 3.34 MByte 395467.57
[  4] 10.00-10.00 sec  3.17 MBytes  9.51 Gbits/sec  51    0:6:20:0:0:19:6:0     3.112/ 2.410/ 3.609/ 0.406 ms (26/127692) 2.92 MByte 381912.79
[  4] 0.00-10.00 sec  11.0 GBytes  9.41 Gbits/sec  189231    24708:26925:27562:24603:22193:22149:20071:21020     2.983/ 0.971/ 3.704/ 0.360 ms (89741/131072) 3.35 MByte 394144.05

Some --bidir output (from the client side) looks like:

[rjmcmahon@localhost iperf2-code]$ src/iperf -c 192.168.1.10 --trip-times --bidir
------------------------------------------------------------
Client connecting to 192.168.1.10, TCP port 5001 with pid 4322 (1 flows)
Write buffer size:  128 KByte
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.80%enp2s0 port 47928 connected with 192.168.1.10 port 5001 (bidir) (trip-times) (MSS=1448) (ct=0.37 ms)
[ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     Cwnd/RTT        NetPwr
[  3] 0.00-10.00 sec  10.9 GBytes  9.35 Gbits/sec  89183/0          0     3021K/2079 us  562251.48
[ ID] Interval        Transfer    Bandwidth       Reads   Dist(bin=16.0K)     Burst Latency avg/min/max/stdev (cnt/size) inP NetPwr
[  3] 0.00-10.00 sec  10.9 GBytes  9.39 Gbits/sec  174319    21097:23110:24661:21619:18723:17600:13153:34356     2.664/ 1.045/ 6.521/ 0.235 ms (89550/131072) 2.98 MByte 440455.93
[ ID] Interval       Transfer     Bandwidth
[FD3] 0.00-10.00 sec  21.8 GBytes  18.7 Gbits/sec

Man page notes:

NOTES
       Numeric options: Some numeric options support format characters per '<value>c' (e.g. 10M) where the c format characters are k,m,g,K,M,G. Lowercase format characters are 10^3 based and uppercase are 2^n based, e.g. 1k = 1000, 1K = 1024, 1m = 1,000,000 and 1M = 1,048,576.
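
       A quick sketch of the suffixes, reusing the server address from the runs above:

           iperf -c 192.168.1.10 -u -b 10m    # offered load of 10,000,000 bits/sec
           iperf -c 192.168.1.10 -n 1M        # send 1,048,576 bytes and stop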

       Rate limiting: The -b option supports read and write rate limiting at the application level. The -b option on the client also supports variable offered loads through the <mean>,<standard deviation> format, e.g. -b 100m,10m. The distribution used is log normal. The same applies to the isochronous option. The -b option on the server rate limits the reads. Socket based pacing is also supported using the --fq-rate long option. This works with the --reverse and --bidir options as well.
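
       A sketch of those forms (the target is the server from the runs above; the rates are arbitrary):

           iperf -c 192.168.1.10 -b 100m,10m     # variable offered load, log normal, mean 100 Mbit/s, stdev 10 Mbit/s
           iperf -c 192.168.1.10 --fq-rate 5g    # socket based pacing instead of application level write limiting
           iperf -s -b 1g                        # server side: rate limit the reads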

       Synchronized clocks: The --trip-times option indicates that the client's and server's clocks are synchronized to a common reference. Network Time Protocol (NTP) or Precision Time Protocol (PTP) are commonly used for this. The error of the reference clock(s) and of the synchronization protocol will affect the accuracy of any end-to-end latency measurements.
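
       A minimal sketch of a one-way latency run, assuming the two hosts are already NTP or PTP synchronized (this mirrors the 10G run above):

           iperf -s -i 1                              # server side; burst latency appears in the interval reports
           iperf -c 192.168.1.10 --trip-times -i 1    # client stamps its writes so the server can compute write-to-read latency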

       Binding is done at the logical level (IP address or layer 3) using the -B option and at the device (or layer 2) level using the percent (%) separator, for both the client and the server. On the client, the -B option affects the bind(2) system call and will set the source IP address and the source port, e.g. iperf -c <host> -B 192.168.100.2:6002. This controls the packet's source values but not routing. This can be confusing in that the route or device lookup may not be that of the device with the configured source IP. So, for example, if the IP address of eth0 is used for -B and the routing table resolves the output interface for the destination IP address to be eth1, then the host will send the packet out device eth1 while using the source IP address of eth0 in the packet. To affect the physical output interface (e.g. on dual homed systems) either use -c <host>%<dev> (requires root), which bypasses this host route table lookup, or configure policy routing per each -B source address and set the output interface appropriately in the policy routes. On the server or receive side, only packets destined to the -B IP address will be received. It's also useful for multicast. For example, iperf -s -B 224.0.0.1%eth0 will only accept IP multicast packets with destination IP 224.0.0.1 that are received on the eth0 interface, while iperf -s -B 224.0.0.1 will receive those packets on any interface. Finally, the device specifier is required for v6 link-local, e.g. -c [v6addr]%<dev> -V, to select the output interface.
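
       Pulling those cases together into one sketch (the interface names and the fe80::1 address are placeholders):

           iperf -c 192.168.1.10 -B 192.168.100.2:6002    # set source IP and port; the output device still follows the route table
           iperf -c 192.168.1.10%eth1                     # force the output device (requires root)
           iperf -s -B 224.0.0.1%eth0                     # accept multicast to 224.0.0.1 only on eth0
           iperf -c [fe80::1]%eth0 -V                     # v6 link-local needs the device specifier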

       Reverse and bidirectional traffic: The --reverse (-R) and --bidir options can be confusing when compared to the legacy -r and -d options. It's suggested to use --reverse (or -R on non-Windows systems) if you want to test through a NAT firewall; it applies role reversal of the test after opening the full duplex socket. The legacy -d and -r options remain supported for compatibility reasons. They open new sockets in the opposite direction rather than treating the originating socket as full duplex, so firewall piercing is typically required to use -d and -r if a NAT gateway is in the path. That's part of the reason it's highly encouraged to use the newer --reverse and --bidir and to deprecate the use of the -r and -d options.
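
       For example, against the same server:

           iperf -c 192.168.1.10 --reverse    # server sends to the client over the client-initiated socket (NAT friendly)
           iperf -c 192.168.1.10 --bidir      # traffic in both directions over the same full duplex socket
           iperf -c 192.168.1.10 -d           # legacy dual test: the server opens a second socket back toward the client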

       Also, the --reverse -b <rate> setting behaves differently for TCP and UDP. For TCP it will rate limit the read side, i.e. the iperf client (role reversed to act as a server) reading from the full duplex socket. This in turn flow controls the reverse traffic per standard TCP congestion control. For UDP, --reverse -b <rate> is applied on transmit (i.e. by the server, role reversed to act as a client) since there is no flow control with UDP. There is no option to directly rate limit the writes with TCP testing when using --reverse.
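
       A sketch of the difference (the 200m rate is arbitrary; the UDP case assumes the server was started with -u):

           iperf -c 192.168.1.10 --reverse -b 200m       # TCP: limits the client's reads; TCP flow control paces the server
           iperf -c 192.168.1.10 -u --reverse -b 200m    # UDP: the server transmits at a 200 Mbit/s offered load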

       TCP connect times: The TCP connect time (or three way handshake) can be seen on the iperf client when the -e (--enhancedreports) option is set. Look for ct=<value> in the connected message, e.g. in '[ 3] local 192.168.1.4 port 48736 connected with 192.168.1.1 port 5001 (ct=1.84 ms)' the 3WHS took 1.84 milliseconds.
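
       A minimal invocation that prints it:

           iperf -c 192.168.1.10 -e    # enhanced reports; look for ct=<value> on the connected line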

       Little's Law in queueing theory is a theorem that determines the average number of items (L) in a stationary queueing system from the average waiting time (W) of an item within the system and the average number of items arriving at the system per unit of time (lambda). Mathematically, it's L = lambda * W. As used here, the units are bytes and the arrival rate is taken from the writes.
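
       As a rough cross-check using the 10G summary line above (taking the 9.41 Gbit/sec average as the arrival rate):

           lambda = 9.41e9 bits/sec / 8  ~= 1.18e9 bytes/sec
           W      = 2.983 ms
           L      = 1.18e9 * 0.002983   ~= 3.5e6 bytes ~= 3.35 MBytes   (matches the reported inP)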

       Network power: The network power (NetPwr) metric is experimental. It's a convenience function defined as throughput/delay. For TCP transmits, the delay is the sampled RTT. For TCP receives, the delay is the write-to-read latency. For UDP the delay is the end/end latency. Don't confuse this with the physics definition of power (delta energy/delta time); it's more a measure of a desirable property divided by an undesirable one. Also note that one must use -i interval with TCP to get this, as that's what sets the RTT sampling rate. The metric is scaled to assist with human readability.
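
       A rough check against the same 10G summary line (the 10^6 scaling here is inferred from the reported numbers, not stated in the man page):

           NetPwr ~= (9.41e9/8 bytes/sec) / 0.002983 sec / 1e6 ~= 394,000   (reported: 394144.05)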

       Fast Sampling: Use ./configure --enable-fastsampling and then compile from source to enable four digit (e.g. 1.0000) precision in reports' timestamps. Useful for sub-millisecond sampling.
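
       A build sketch (the standard autotools flow from a source checkout):

           ./configure --enable-fastsampling
           make
           src/iperf -v    # confirm the build and version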

Bob


On Fri, Sep 18, 2020 at 9:05 AM Dave Taht <dave.taht@gmail.com> wrote:
I recently had cause to go review the original make-wifi-fast project
plan ( https://docs.google.com/document/d/1Se36svYE1Uzpppe1HWnEyat_sAGghB3kE285LElJBW4/edit)

(and related presentation:
https://www.youtube.com/watch?v=Rb-UnHDw02o&t=25m30s had the fun bit)

I'm glad that since that time ATF and mesh networking became
realities, fq_codel and per station queuing gained support in various
products, and AQL started to work on ath10k, but I'm pretty sure
things in that document like rate and power aware scheduling
(minstrel-bluse), excessive counter based hw retries, and other
problems we identified back then are still problems, not to mention
the recent ofdma work....

I have been observing pretty bad behavior with a lot of 802.11ac
access points around (recently one that did 4 Mbit/s over 40 feet
through glass outdoors, but 600 Mbit/s indoors at 10 feet), and have
nothing but guesses as to the causes. Infinite retries? Everything on
160 MHz wide channels?

Has there been any good news or good tools lately?

I pulled my ax200s out of the box and was going to see if there was
any progress there.

--
"For a successful technology, reality must take precedence over public
relations, for Mother Nature cannot be fooled" - Richard Feynman

dave@taht.net <Dave Täht> CTO, TekLibre, LLC Tel: 1-831-435-0729
_______________________________________________
Make-wifi-fast mailing list
Make-wifi-fast@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/make-wifi-fast