I can't speak to these chips or drivers, but we can fault-isolate latency using much more advanced network telemetry.
This type of testing seems likely to conflate bufferbloat with latency. They're two different phenomena.
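To make the distinction concrete, here's a rough back-of-the-envelope sketch. The numbers (2 ms idle RTT, 1 MB of queued data, 100 Mbit/s bottleneck) are made up for illustration and aren't from any measurement in this thread. The point is only that the delay added by a full queue depends on buffer size and link rate, not on the path's idle latency.

```python
def queue_delay_ms(buffered_bytes: int, link_bps: float) -> float:
    """Time to drain buffered_bytes through a link of link_bps, in milliseconds."""
    return buffered_bytes * 8 / link_bps * 1000.0


# All three values below are hypothetical, chosen only for illustration.
base_rtt_ms = 2.0          # idle round-trip time of the path
buffer_bytes = 1_000_000   # data sitting in the bottleneck queue
link_bps = 100e6           # bottleneck rate in bits per second

bloat_ms = queue_delay_ms(buffer_bytes, link_bps)
print(f"idle RTT:             {base_rtt_ms:.1f} ms")
print(f"added queueing delay: {bloat_ms:.1f} ms")
print(f"RTT under load:       {base_rtt_ms + bloat_ms:.1f} ms")
```

A test that saturates the link mostly measures the second number, which says little about the first.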
iperf 2.1.0 has a --near-congestion option to help test latency rather than bloat. It's not perfect, but results are reproducible on controlled networks.
Also, it's probably a good idea to synchronize clocks. An NTP stratum-1 server can be built with a Raspberry Pi and a GPS HAT. One can use PTP to distribute the time.
The average crystal ends up being about as accurate as a mechanical watch.
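For a sense of scale, here's a small arithmetic sketch. The 20 ppm figure is an assumption (a commonly quoted ballpark for an uncompensated crystal, not a spec for any particular part), and real tolerance varies with the part and temperature. It converts a frequency error into drift per day and into how long a free-running clock takes to accumulate a given error.

```python
SECONDS_PER_DAY = 86_400


def drift_seconds_per_day(ppm: float) -> float:
    """Clock drift accumulated over one day for a frequency error in ppm."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def seconds_until_error(ppm: float, max_error_s: float) -> float:
    """How long a free-running clock takes to drift by max_error_s."""
    return max_error_s / (ppm * 1e-6)


ASSUMED_PPM = 20.0  # hypothetical tolerance, for illustration only

print(f"drift per day at {ASSUMED_PPM:g} ppm: "
      f"{drift_seconds_per_day(ASSUMED_PPM):.3f} s")
print(f"time to drift 1 ms: "
      f"{seconds_until_error(ASSUMED_PPM, 1e-3):.0f} s")
```

Under that assumption the clock drifts by a bit under two seconds a day, and takes under a minute to drift a full millisecond, which is why unsynchronized one-way delay numbers are hard to trust.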