[Cerowrt-devel] Google working on experimental 3.8 Linux kernel for Android

Jim Gettys jg at freedesktop.org
Thu Feb 28 13:02:30 PST 2013


In short, people who build hardware devices, or device drivers, don't
understand TCP.

There is a first-class education failure in all this.

We have found almost no device that isn't bloated; the only question
is how badly.
                                         - Jim



On Thu, Feb 28, 2013 at 3:58 PM, <dpreed at reed.com> wrote:

> At least someone actually saw what I've been seeing for years now in Metro
> area HSPA and LTE deployments.
>
>
>
> As you know, when I first reported this on the e2e list I was told it
> could not possibly be happening and that I didn't know what I was talking
> about.  No one in the phone companies was even interested in replicating my
> experiments, just dismissing them.  It was sad.
>
>
>
> However, I had the same experience on the original Honeywell 6180 dual CPU
> Multics deployment in about 1973.  One day all my benchmarks were running
> about 5 times slower every other time I ran the code.  I suggested that one
> of the CPUs was running 5x slower, and it was probably due to the CPU cache
> being turned off.   The hardware engineer on site said that that was
> *impossible*.  After 4 more hours of testing, I was sure I was right.  That
> evening, I got him to take the system down, and we hauled out an
> oscilloscope.  Sure enough, the gate that received the "cache hit" signal
> had died in one of the processors.   The machine continued to run, since
> all that caused was for memory to be fetched every time, rather than using
> the cache.
>
>
>
> Besides the value of finding the "root cause" of anomalies, the story
> points out that you really need to understand software and hardware
> sometimes.  The hardware engineer didn't understand the role of a cache,
> even though he fully understood timing margins, TTL logic, core memory
> (yes, this machine used core memory), etc.
>
>
>
> We both understood oscilloscopes, fortunately.
>
>
>
> In some ways this is like the LTE designers understanding TCP.   They
> don't.  But sometimes you need to know about both in some depth.
>
>
>
> Congratulations, Jim.  More Internet Plumbing Merit Badges for you.
>
>
>
> -----Original Message-----
> From: "Jim Gettys" <jg at freedesktop.org>
> Sent: Thursday, February 28, 2013 3:03pm
> To: "Dave Taht" <dave.taht at gmail.com>
> Cc: "David P Reed" <dpreed at reed.com>, "cerowrt-devel at lists.bufferbloat.net"
> <cerowrt-devel at lists.bufferbloat.net>
> Subject: Re: [Cerowrt-devel] Google working on experimental 3.8 Linux
> kernel for Android
>
>  I've got a bit more insight into LTE than I did in the past, courtesy of
> the last couple days.
> To begin with, LTE runs with several classes of service (they call them
> bearers).  Your VOIP traffic goes into one of them.
> And I think there is another as well that is for guaranteed bit rate
> traffic.  One transmit opportunity may have a bunch of chunks of data, and
> that data may be destined for more than one device (IIRC).  It's
> substantially different than WiFi.
> But most of what we think of as Internet stuff (web surfing, dns, etc) all
> gets dumped into a single best effort ("BE") class.
> The BE class is definitely badly bloated; I can't say how much because I
> don't really know yet (the test my colleague ran wasn't run long enough to
> be confident it filled the buffers).  But I will say worse than most cable
> modems I've seen.  I expect this will be true to different degrees on
> different hardware.  The other traffic classes haven't been tested yet for
> bufferbloat, though I suspect they will have it too.  I was told that those
> classes have much shorter queues, and when they grow, they dump the whole
> queues (because delivering late real time traffic is useless).  But trust
> *and* verify....  Verification hasn't been done for anything but BE
> traffic, and that hasn't been quantified.
> But each device gets a "fair" shot at bandwidth in the cell (or sector of
> a cell; they run 3 radios in each cell), where fair is basically time
> based; if you are at the edge of a cell, you'll get a lot less bandwidth
> than someone near a tower; and this fairness is guaranteed by a scheduler
> that runs in the base station (called an eNodeB, IIRC).  So the base
> station guarantees some sort of "fairness" between devices (a place where
> Linux's wifi stack today fails utterly, since there is a single queue per
> device, rather than one per station).
> Whether there are bloat problems at the link level in LTE due to error
> correction I don't know yet; but it wouldn't surprise me; I know there were
> in 3G.  The people I talked to this morning aren't familiar with the HARQ
> layer in the system.
> The base stations are complicated beasts; they have both a Linux system in
> them as well as a real time operating system based device inside.  We don't
> know where the bottleneck(s) are yet.  I spent lunch upping their paranoia
> and getting them through some conceptual hurdles (e.g. multiple bottlenecks
> that may move, and the like).  They will try to get me some of the data so
> I can help them figure it out.  I don't know if the data flow goes through
> the Linux system in the eNodeB or not, for example.
> Most carriers are now trying to ensure that their backhauls from the base
> station are never congested, though that is another known source of
> problems.  And then there is the lack of AQM at peering point routers....
>  You'd think they might run WRED there, but many/most do not.
> - Jim
>
>
> On Thu, Feb 28, 2013 at 2:08 PM, Dave Taht <dave.taht at gmail.com> wrote:
>
>>
>>
>>  On Thu, Feb 28, 2013 at 1:57 PM, <dpreed at reed.com> wrote:
>>
>>> Doesn't fq_codel need an estimate of link capacity?
>>>
>>  No, it just measures delay. Since so far as I know the outgoing portion
>> of LTE is not soft-rate limited, but sensitive to the actual available link
>> bandwidth, fq_codel should work pretty well (if the underlying interfaces
>> weren't horribly overbuffered) in that direction.
>> I'm looking forward to some measurements of actual buffering at the
>> device driver/device levels.
>> I don't know how inbound to the handset is managed via LTE.
>>  Still quite a few assumptions left to smash in the above.
>> ...
>> in the home router case....
>> ...
>> When there are artificial rate limits in play (in, for example, a cable
>> modem/CMTS, hooked up via gigE yet rate limiting to 24up/4mbit down), then
>> a rate limiter (tbf,htb,hfsc) needs to be applied locally to move that rate
>> limiter/queue management into the local device, so we can manage it better.
>> I'd like to be rid of the need to use htb and come up with a rate limiter
>> that could be adjusted dynamically from a daemon in userspace, probing for
>> short all bandwidth fluctuations while monitoring the load. It needn't send
>> that much data very often, to come up with a stable result....
>> You've described one soft-rate sensing scheme (piggybacking on TCP), and
>> I've thought up a few others, that could feed back from a daemon some
>> samples into a soft(er) rate limiter that would keep control of the
>> queues in the home router. I am thinking it's going to take way too long to
>> fix the CPE and far easier to fix the home router via this method, and
>> certainly it's too painful and inaccurate to merely measure the bandwidth
>> once, then set a hard rate, when
>> So far as I know the gargoyle project was experimenting with this
>> approach.
>>  A problem is in places that connect more than one device to the cable
>> modem... then you end up with those needing to communicate their perception
>> of the actual bandwidth beyond the link.
>>
>>>  Where will it get that from the 4G or 3G uplink?
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: "Maciej Soltysiak" <maciej at soltysiak.com>
>>> Sent: Thursday, February 28, 2013 1:03pm
>>> To: cerowrt-devel at lists.bufferbloat.net
>>> Subject: [Cerowrt-devel] Google working on experimental 3.8 Linux kernel
>>> for Android
>>>
>>>  Hiya,
>>> Looks like Google's experimenting with 3.8 for Android:
>>> https://android.googlesource.com/kernel/common/+/experimental/android-3.8
>>> Sounds great if this means they will utilize fq_codel, TFO, BQL, etc.
>>> Anyway my nexus 7 says it has 3.1.10 and this 3.8 will probably go to
>>> Android 5.0 so I hope Nexus 7 will get it too some day or at least 3.3+
>>> Phoronix coverage:
>>> http://www.phoronix.com/scan.php?page=news_item&px=MTMxMzc
>>> Their 3.8 changelog:
>>> https://android.googlesource.com/kernel/common/+log/experimental/android-3.8
>>> Regards,
>>> Maciej
>>>    _______________________________________________
>>> Cerowrt-devel mailing list
>>> Cerowrt-devel at lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>>
>>>
>>
>> --
>> Dave Täht
>>
>> Fixing bufferbloat with cerowrt:
>> http://www.teklibre.com/cerowrt/subscribe.html
>>
>>
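
The exchange quoted above turns on one point: fq_codel does not need an
estimate of link capacity, because it only looks at how long packets sit in
the queue. A minimal sketch of that idea follows. It is not the kernel's
fq_codel code: the class name DelayMonitor and the method should_drop are
hypothetical, the 5 ms target and 100 ms interval are CoDel's usual defaults,
and the real algorithm's drop-rate control law and per-flow queueing are left
out. Note that no bandwidth figure appears anywhere in it.

```python
from dataclasses import dataclass
from typing import Optional

TARGET_S = 0.005    # acceptable standing queue delay (CoDel's usual 5 ms default)
INTERVAL_S = 0.100  # how long delay must stay above target before dropping (100 ms)


@dataclass
class DelayMonitor:
    """Hypothetical helper: decides drops from queue sojourn time alone."""

    # Deadline set when sojourn time first exceeds the target; None while
    # the queue is draining quickly enough.
    first_above_time: Optional[float] = None

    def should_drop(self, enqueue_time: float, now: float) -> bool:
        """Return True if the packet being dequeued at `now` should be dropped."""
        sojourn = now - enqueue_time
        if sojourn < TARGET_S:
            # Delay is back under target: forget any earlier excursion.
            self.first_above_time = None
            return False
        if self.first_above_time is None:
            # First packet above target: start the clock, do not drop yet.
            self.first_above_time = now + INTERVAL_S
            return False
        # Drop only once delay has stayed above target for a full interval.
        return now >= self.first_above_time
```

Calling should_drop(enqueue_time, now) at each dequeue is enough to drive it.
A brief burst that pushes delay above the target is tolerated, and drops
start only when the excess delay persists for a whole interval. This is why
the approach suits a link such as the LTE uplink, where the available rate
varies and is not known in advance.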