[Bloat] Hardware upticks
Jonathan Morton
chromatix99 at gmail.com
Wed Jan 6 01:18:19 EST 2016
> On 6 Jan, 2016, at 02:22, Steinar H. Gunderson <sgunderson at bigfoot.com> wrote:
>
> On Tue, Jan 05, 2016 at 04:06:03PM -0800, Stephen Hemminger wrote:
>> The expensive part is often having to save and restore all the state in
>> registers and other bits on context switch.
>
> Are you sure? There's not really all that much state to save, and all I've
> been taught before says the opposite.
>
> Also, I've never ever seen the actual context switch turn up high in a perf
> profile. Is this because of some sampling artifact?
ARM has dedicated register banks for several interrupt levels for exactly this reason. Simple interrupt handlers can operate in these without spilling *any* userspace registers. This gives ARM quite good interrupt latency, especially in the simpler implementations.
That doesn’t help for an actual context switch, of course. What does help is “lazy FPU state switching”, where on a context switch the FPU is simply marked as unavailable. Only if and when the process attempts to *use* the FPU does the access trap; the trap handler then restores the correct state and returns to userspace with the FPU enabled. The same goes for SIMD register banks.
Lazy context switching is a kernel feature. It’s used on all architectures that have an FPU which can be disabled at runtime, AFAIK. For a switch into the kernel and back to the same process, the FPU and SIMD state is never actually switched, so there is almost no overhead.
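To make the lazy scheme concrete, here is a toy model of it in Python. It is only a sketch of the idea described above, not code from any real kernel: the names Task, Cpu, context_switch, fpu_add and demo are all made up for illustration, and the whole FPU register file is collapsed into a single float.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    fpu_state: float = 0.0  # saved copy of this task's FPU registers (toy: one float)


@dataclass
class Cpu:
    fpu_regs: float = 0.0              # the single physical FPU register file
    fpu_enabled: bool = False          # cleared on every context switch
    fpu_owner: Optional[Task] = None   # task whose state currently lives in fpu_regs
    current: Optional[Task] = None     # task currently running
    restores: int = 0                  # how often the trap handler had to swap state

    def context_switch(self, task: Task) -> None:
        # Cheap path: no FPU registers are saved or restored here,
        # the FPU is merely marked unavailable.
        self.current = task
        self.fpu_enabled = False

    def _fpu_trap(self) -> None:
        # Hypothetical trap handler: runs only when the current task
        # touches the FPU while it is disabled.
        if self.fpu_owner is not self.current:
            if self.fpu_owner is not None:
                self.fpu_owner.fpu_state = self.fpu_regs  # save previous owner
            self.fpu_regs = self.current.fpu_state        # restore current task
            self.fpu_owner = self.current
            self.restores += 1
        self.fpu_enabled = True

    def fpu_add(self, x: float) -> float:
        # Stand-in for any FPU instruction issued by the current task.
        if not self.fpu_enabled:
            self._fpu_trap()
        self.fpu_regs += x
        return self.fpu_regs


def demo():
    cpu = Cpu()
    a, b, c = Task("a"), Task("b"), Task("c")
    cpu.context_switch(a)
    cpu.fpu_add(1.5)            # first FPU use: trap, a becomes owner
    cpu.context_switch(b)       # b never touches the FPU: nothing is moved
    cpu.context_switch(a)
    cpu.fpu_add(1.0)            # trap, but a still owns the registers: no swap
    cpu.context_switch(c)
    cpu.fpu_add(4.0)            # trap: save a's state, load c's
    cpu.context_switch(a)
    result = cpu.fpu_add(0.5)   # trap: save c's state, load a's
    return cpu.restores, result, c.fpu_state


if __name__ == "__main__":
    print(demo())
```

In this model the switch a -> b -> a costs no FPU save or restore at all, because b never uses the FPU and a's registers are still live when it runs again. State only moves when a different task actually issues an FPU operation.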
- Jonathan Morton