<p dir="ltr">4GiB back log is several orders of magnitude more than what you want for a 100Gb link. Unless you think this algorithm will be used for 1Zb/s links. 64bit is a bit overkill</p>

<div class="gmail_quote">On Jun 12, 2016 2:04 PM, "Eric Dumazet" <<a href="mailto:eric.dumazet@gmail.com">eric.dumazet@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Sun, 2016-06-12 at 20:59 +0300, Jonathan Morton wrote:<br>

<br>

> Not so - unless you are very sure that q->backlog is the same size as<br>

> quantity.<br>

<br>

Obviously this is the case in all linux kernels so far.<br>

<br>

If someone plans to change this at one point, then yes, he/she has to<br>

fix this.<br>

<br>

>   In an increasingly 64-bit world, that is by no means assured in the<br>

> future.  I don’t like relying on wraparound behaviour without making<br>

> that assumption explicit, which is precisely what the signed types in<br>

> C are for.<br>

<br>

So why don't you send an official patch, since apparently you plan to<br>

extend a qdisc backlog beyond 4,294,967,296 bytes ?<br>

<br>

I am puzzled you have this idea, but why not, some people thought "640KB<br>

out to be enough for anybody" ...<br>

><br>

> >> IMHO the NET_XMIT_CN semantics are broken.  It might be better to<br>

> drop<br>

> >> support for it, since it should rarely be triggered.<br>

> ><br>

> > What exact part is broken ?<br>

> > Semantic looks good to me.<br>

><br>

> It’s broken in that a negative correction may be required in the first<br>

> place.<br>

<br>

But you do realize NET_XMIT_CN was there _before_ Cong Wang added<br>

backlog tracking in 2ccccf5fb43f ("net_sched: update hierarchical<br>

backlog too")<br>

<br>

The fact that Cong used a 'unsigned' instead of a 'signed' here is a<br>

very minor 'bug', since generated code is exactly the _same_.<br>

<br>

Claiming the 'semantics' are broken, while the reality is that the<br>

'implementation has a potential bug _if_ we extend to 64bit' is quite<br>

different.<br>

<br>

>   It places additional burden on every producer of the CN signal who<br>

> isn’t a tail-dropper.  I can only assume that the behaviour was<br>

> designed with only tail-drop in mind - and as we both know, that is<br>

> not the only option any more.<br>

><br>

Nothing implied that only tail drops could happen.<br>

<br>

Where have you seen this in the code exactly ?<br>

<br>

<br>

> It appears to me that there are four things than enqueue() may want to<br>

> tell its caller:<br>

><br>

> 1: That the packet was enqueued successfully (NET_XMIT_SUCCESS).<br>

><br>

> 2: That the packet was enqueued successfully, but some other relevant<br>

> packet had to be dropped due to link congestion.<br>

><br>

> 3: That the packet was *not* enqueued, and this was due to link<br>

> congestion.  This is potentially useful for tail-dropping queues,<br>

> including AQMs.<br>

><br>

> 4: That the packet was *not* enqueued, for some reason other than link<br>

> congestion; the “error case".<br>

><br>

> Currently NET_XMIT_CN appears to cover case 3, whereas Cake normally<br>

> wants cases 1 & 2 (and will only signal case 4 if a GSO aggregate<br>

> couldn’t be segmented).<br>

><br>

Not true. NET_XMIT_CN also covers 2. (not sure what 'relevant' means for<br>

you)<br>

<br>

> For the time being, I have changed my development branch of Cake to<br>

> always signal case 1 except for the error case.<br>

<br>

Feel free to change it, it is your code.<br>

<br>

But do not claim NET_XMIT_CN should be removed, just because you missed<br>

the point (being focused on router workloads I guess)<br>

<br>

It can be used by higher level protocol, even if TCP with Small Queues<br>

no longer can hit it with normal settings.<br>

<br>

We have per socket counter of dropped rx packets (sk->sk_drops), we<br>

could very easily add a counter of drops or congestions at tx.<br>

This might allow UDP applications to reduce their rate or 'fast'<br>

retransmit without waiting an extra RTT.<br>

<br>

<br>

<br>

_______________________________________________<br>

Cake mailing list<br>

<a href="mailto:Cake@lists.bufferbloat.net">Cake@lists.bufferbloat.net</a><br>

<a href="https://lists.bufferbloat.net/listinfo/cake" rel="noreferrer" target="_blank">https://lists.bufferbloat.net/listinfo/cake</a><br>

</blockquote></div>