Saturday, July 28, 2007

Re: MIT HACKMEM BITCOUNT

This is a reply to the MIT HACKMEM article that I posted earlier.
I mentioned I did not realize what they were doing in the previous article, but recently just stumbled on what they are actually doing. They are actually doing the same thing I was aiming for, however in a sneaky way.

They are actually performing the hamming algorithm using subtraction. On most architectures, subtraction and addition are close within clock cycle ranges. Though I probably should test the two solutions.

To get an overall sense of what they are doing, let's first examine 3-bit numbers. This will make it easier to extend to a 32-bit number.

Say the 3 lower bits are all 1's, therefore the number is actually 7. But we want to find the count in the 3-bit number, since all are set, the answer should be 3.

There's a technique that you can use to count the number of bits.
x - x>>1 - x>>2 as mentioned in the previous article.
So that's all we have to do:

x = 111 (value of 7 in decimal)
x>>1 = 011 (value of 3 in decimal)
x>>2 = 001 (value of 1 in decimal)
so therefore x - x>>1 - x>>2 = 7 - 3 - 1 = 3, which is the correct number of bits set.

To extend this to a 32-bit number, you have to notice one small thing.
When I did a shift over x>>1 and x>>2, there was a zero that was added.
You need to get the same effect without shifting over the bit from the 32-bit number, meaning something like this:

say x = 101 111 (in binary split into 3-bit segments)
x>>1 = 010 111
x>>2 = 001 010
x - x>>1 - x>>2 is not going to give us the effect we wanted, so we have to mask out the bits that are set that would delude us from getting the correct version.

Therefore, from the previous version the bits that were zero (0) when we used 3-bit numbers follows:
x (no zeroed bits)
x>>1 (the top (3rd bit) was zeroed out)
x>>2 (the top 2 (3rd bit and 2nd bit) was zeroed out)

Therefore, to get the 32-bit number that we want we need a correct mask, which would be to mask out the top bit for every 3-bit chunk for x>>1 and the top 2 bits for every 3-bit chunk for x>>2.
Suprisingly, if we use the octal numbering system, this is quite easy.
(The octal numbering system for zeroing out the top bit is just 03 and to zeroing out the top 2 bits is just 01;
This is because there's only 3 bits in an octal number:
111 = 7
011 = 3
001 = 1

Therefore you can think of the algorithm as actually doing something like:
x = (x&037777777777) - ((x>>1)&033333333333) - ((x>>2)&011111111111)

which is equivalent to x - ((x>>1)&033333333333) - ((x>>2)&011111111111)

And then it hammers down how many bits are set in each 3-bit chunk, and so we just do a 3-right bit shift addition and mod by 63 to get the final answer.

Here is some code in C,

#define y 0x1111111111111111ull
#define n 0x0f0f0f0f0f0f0f0full

#define MIT(x) \
( \
((HAMMING(x) + (HAMMING(x)>>4)) & (n) ) % 255 \
)
#define HAMMING(x) \
( \
( (x)&(y)) + \
(((x)>>1)&(y)) + \
(((x)>>2)&(y)) + \
(((x)>>3)&(y)) \
)

The code MIT(x) will yield the solution where x is a 64-bit number. I had to change the implementation a little for 32-bit numbers and 64-bit numbers. For 64-bit numbers, you need to hold the sums in 4-bit chunks instead, and therefore mod'ing by 63 is no longer correct, but 255 is. And the mask has also changed accordingly. (This was my way of im plementing it from the last post, but the code looks nearly equivalent:

x = x - ((x>>1)&0x7777777777777777) -
((x>>2)&0x3333333333333333) -
((x>>3)&0x1111111111111111);
x = ((x+(x>>4))&0x0f0f0f0f0f0f0f0f) % 255;

q.e.d.

No comments: