On Jul 6, 2018, at 11:29 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Pete Heist <pete@heistp.net> writes:

>> - is tin_deficit overflowing at these rates? At 50 Gbit/s, 2^31-1 bytes
>> go by in 344 ms (involuntary chuckle)
>> - what’s the value of tin_quantum_band here? I suspect it’s OK, though.
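The 344 ms figure is simply 2^31-1 bytes divided by the byte rate (50 Gbit/s is 6.25 GB/s). A quick check in Python, assuming a signed 32-bit counter that advances by every byte sent; whether tin_deficit actually accumulates like that, rather than being topped up by a quantum and drained per packet, is exactly the open question, so treat this as an upper-bound sketch:

```python
# Back-of-the-envelope check: how long until a signed 32-bit byte counter
# reaches its maximum, assuming it grows by every byte sent at line rate.
S32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold


def ms_to_fill(counter_max: int, rate_bit_per_s: float) -> float:
    """Milliseconds for counter_max bytes to pass at rate_bit_per_s."""
    bytes_per_s = rate_bit_per_s / 8
    return counter_max / bytes_per_s * 1000


print(f"{ms_to_fill(S32_MAX, 50e9):.0f} ms")  # 50 Gbit/s -> 344 ms
```

At 10 Gbit/s the same counter would last roughly 1.7 s, so the concern grows linearly with link speed.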

> I thought about overflows, but I don't get any "weird" values, and
> everything ends up back at zero when the flows stop. And it's not
> actually tin_backlog that's causing the looping…

Ok, I think tin_deficit is meant here, esp. in light of what follows regarding *_flow_count.

Once we do get past this infinite loop, which it sounds like is not caused by overflow here, I guess it’s still worth reviewing whether tin_backlog or other values _could_ overflow under certain conditions. In your case the RTT is probably low, but what if it weren’t? Adding delay with netem might coax something out. In fact, I’ll see if I can add some delay to the 30-40 Gbit/s local testing that I _can_ do and see whether anything turns up...
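One way to read the RTT concern: a standing queue of about one bandwidth-delay product (BDP) grows linearly with RTT, so a byte counter that is comfortably in range at LAN latencies might not be at WAN-like ones. The sketch below assumes the backlog is held in a 32-bit byte counter and that a queue of one BDP is plausible at 50 Gbit/s; both are illustrative assumptions, not a reading of the cake source, and in practice a queue memory limit may cap the backlog well before this. `as_u32` mimics how an unsigned 32-bit counter would silently wrap.

```python
U32_MAX = 2**32 - 1  # largest unsigned 32-bit value
S32_MAX = 2**31 - 1  # largest signed 32-bit value


def bdp_bytes(rate_bit_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes, using integer math to avoid rounding."""
    return rate_bit_per_s * rtt_ms // (8 * 1000)


def as_u32(value: int) -> int:
    """What an unsigned 32-bit counter would hold after silently wrapping."""
    return value & U32_MAX


RATE = 50_000_000_000  # 50 Gbit/s, the rate discussed in the thread
for rtt_ms in (1, 100, 500, 1000):
    bdp = bdp_bytes(RATE, rtt_ms)
    print(f"RTT {rtt_ms:4d} ms: BDP {bdp:>13,d} B, "
          f"exceeds s32: {bdp > S32_MAX}, exceeds u32: {bdp > U32_MAX}, "
          f"u32 would read {as_u32(bdp):,d}")
```

At 100 ms the BDP is 625,000,000 bytes, well within range; by 500 ms it passes the signed 32-bit limit, and by 1 s it wraps an unsigned one.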

>> - I’m assuming sparse_flow_count + bulk_flow_count wouldn’t be 0…

> Yeah, they are; that's why it keeps looping. I've been looking at both
> tin_backlog and the *_flow_count vars as different ways of checking
> whether the tins are actually empty... they are all 0 when this happens.
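The symptom described here, a loop that keeps going because every tin reports zero flows, is what a search produces when its only exit is finding a non-empty tin. A minimal Python model, with hypothetical `Tin` and `next_tin` names (not cake's actual code): bounding the search to one pass over the tins turns "nothing to dequeue" into an explicit result instead of a spin. Whether such a guard belongs in the real dequeue path, or whether the counters themselves are wrong, is a separate question.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tin:
    # Hypothetical per-tin counters, named after the ones in the thread.
    sparse_flow_count: int = 0
    bulk_flow_count: int = 0

    def has_flows(self) -> bool:
        return self.sparse_flow_count + self.bulk_flow_count > 0


def next_tin(tins: List[Tin], cur: int) -> Optional[int]:
    """Return the index of the next tin with active flows, starting after cur.

    Each tin is visited at most once, so this returns None instead of
    looping forever when every tin's flow counts are zero.
    """
    n = len(tins)
    for step in range(1, n + 1):
        idx = (cur + step) % n
        if tins[idx].has_flows():
            return idx
    return None
```

With all counts at zero, `next_tin` returns `None` after one pass, which the caller can treat as an empty qdisc.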

Aha, OK. It does look possible for both of these to be 0, since there appear to be cases where one is decremented without the other being incremented. Still, that _all_ the *_flow_count vars are 0 seems logically strange. I’ll leave this alone for now, though, as I don’t yet understand well enough what the values represent… :)
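The observation above suggests an invariant worth checking, assuming the counters are meant to track which set each active flow is in: if every active flow sits in exactly one set, the per-set counters should always sum to the number of active flows, and a transition that decrements one counter without incrementing another breaks that. A minimal sketch with hypothetical names (`FlowCounts`, `move`, and the set labels are illustrative, not cake's): routing every transition through one helper keeps the decrement and increment paired, and `check()` flags drift as soon as it happens.

```python
from typing import Dict, Hashable, Optional

# Hypothetical set names; the real qdisc may track flows differently.
SETS = ("sparse", "bulk", "decaying")


class FlowCounts:
    """Model in which each active flow belongs to exactly one set."""

    def __init__(self) -> None:
        self.count: Dict[str, int] = {s: 0 for s in SETS}
        self.state: Dict[Hashable, str] = {}  # flow id -> current set

    def move(self, flow: Hashable, dst: Optional[str]) -> None:
        """Move flow into dst, or drop it when dst is None.

        The decrement of the old set and the increment of the new one
        happen in the same place, so they cannot get out of step.
        """
        src = self.state.pop(flow, None)
        if src is not None:
            self.count[src] -= 1
        if dst is not None:
            self.count[dst] += 1
            self.state[flow] = dst
        self.check()

    def check(self) -> None:
        """Raise if the per-set counters no longer match the tracked flows."""
        if any(v < 0 for v in self.count.values()):
            raise RuntimeError(f"negative flow count: {self.count}")
        if sum(self.count.values()) != len(self.state):
            raise RuntimeError(
                f"counter drift: {self.count} vs {len(self.state)} flows")
```

Under this invariant, all counters reading 0 would mean no flows are tracked at all, which is consistent with an empty backlog; all-zero counters alongside a loop that still expects work would point at a transition that bypassed the paired update.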