4GiB back log is several orders of magnitude more than what you want for a 100Gb link. Unless you think this algorithm will be used for 1Zb/s links. 64bit is a bit overkill On Jun 12, 2016 2:04 PM, "Eric Dumazet" wrote: > On Sun, 2016-06-12 at 20:59 +0300, Jonathan Morton wrote: > > > Not so - unless you are very sure that q->backlog is the same size as > > quantity. > > Obviously this is the case in all linux kernels so far. > > If someone plans to change this at one point, then yes, he/she has to > fix this. > > > In an increasingly 64-bit world, that is by no means assured in the > > future. I don’t like relying on wraparound behaviour without making > > that assumption explicit, which is precisely what the signed types in > > C are for. > > So why don't you send an official patch, since apparently you plan to > extend a qdisc backlog beyond 4,294,967,296 bytes ? > > I am puzzled you have this idea, but why not, some people thought "640KB > out to be enough for anybody" ... > > > > >> IMHO the NET_XMIT_CN semantics are broken. It might be better to > > drop > > >> support for it, since it should rarely be triggered. > > > > > > What exact part is broken ? > > > Semantic looks good to me. > > > > It’s broken in that a negative correction may be required in the first > > place. > > But you do realize NET_XMIT_CN was there _before_ Cong Wang added > backlog tracking in 2ccccf5fb43f ("net_sched: update hierarchical > backlog too") > > The fact that Cong used a 'unsigned' instead of a 'signed' here is a > very minor 'bug', since generated code is exactly the _same_. > > Claiming the 'semantics' are broken, while the reality is that the > 'implementation has a potential bug _if_ we extend to 64bit' is quite > different. > > > It places additional burden on every producer of the CN signal who > > isn’t a tail-dropper. I can only assume that the behaviour was > > designed with only tail-drop in mind - and as we both know, that is > > not the only option any more. > > > Nothing implied that only tail drops could happen. > > Where have you seen this in the code exactly ? > > > > It appears to me that there are four things than enqueue() may want to > > tell its caller: > > > > 1: That the packet was enqueued successfully (NET_XMIT_SUCCESS). > > > > 2: That the packet was enqueued successfully, but some other relevant > > packet had to be dropped due to link congestion. > > > > 3: That the packet was *not* enqueued, and this was due to link > > congestion. This is potentially useful for tail-dropping queues, > > including AQMs. > > > > 4: That the packet was *not* enqueued, for some reason other than link > > congestion; the “error case". > > > > Currently NET_XMIT_CN appears to cover case 3, whereas Cake normally > > wants cases 1 & 2 (and will only signal case 4 if a GSO aggregate > > couldn’t be segmented). > > > Not true. NET_XMIT_CN also covers 2. (not sure what 'relevant' means for > you) > > > For the time being, I have changed my development branch of Cake to > > always signal case 1 except for the error case. > > Feel free to change it, it is your code. > > But do not claim NET_XMIT_CN should be removed, just because you missed > the point (being focused on router workloads I guess) > > It can be used by higher level protocol, even if TCP with Small Queues > no longer can hit it with normal settings. > > We have per socket counter of dropped rx packets (sk->sk_drops), we > could very easily add a counter of drops or congestions at tx. > This might allow UDP applications to reduce their rate or 'fast' > retransmit without waiting an extra RTT. > > > > _______________________________________________ > Cake mailing list > Cake@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/cake >