A tiny almost sorta kinda nearly minimal perfect hash for a mac classifier?

Jonathan Morton chromatix99 at gmail.com
Tue Nov 15 16:44:25 EST 2011


On 15 Nov, 2011, at 11:14 pm, Sean Conner wrote:

> It was thus said that the Great Dave Taht once stated:
>> 
>> Here's one. I'd like to classify network streams so that they end up
>> in a unique queue[x], based on their nexthop mac address in the ip
>> header.
> 
>  More on this.  Now that I'm at the office, I can scan a larger selection
> of MAC addresses.  Out of the 38 machines currently on the segment, the
> lower twelve bits of each MAC addresses are unique (this excluding broadcast
> and multicast MAC addresses).  
> 
>> Does anything elegant exist? Anything worthy in the kernel?
> 
>  For non-broadcast/multicast MAC addresses, take the last N bits.

For 16 bits, the birthday theorem predicts a 50% chance of at least one collision when there are 256 hosts.  Dave wants to scale up to 2048 hosts.  So consider the following:

uint32_t upper = (mac >> 16) & MASK;
uint16_t hash = 0xFFFF & (mac ^ ((upper << X)|(upper >> (32-X))) ^ ((upper << Y)|(upper >> (32-Y))));

When a collision is detected on a new MAC, X and Y can be searched in the 0..31 space for a combination with minimal collisions (probably zero).  When X and Y are equal, that is equivalent to taking the bottom 16 bits.  The rotate operations are usually compiled to single instructions on modern CPUs and halfway competent compilers, and the available unique combinations of X and Y are normally over 490, giving a fairly good defence against the birthday theorem.

Of course, some of the 16-bit subsets of the MAC probably have low entropy, especially if an organisation has bought a lot of network hardware from the same vendor, but hopefully that doesn't matter in practice.

 - Jonathan Morton




More information about the Bloat-devel mailing list