Introduction
When parsing data from a file, you might have some variables whose values don't span an entire byte. For example, you could have a sequence of flags all packed into a stream, or an unsigned integer that is just 10 bits long. Let's see how we can parse bits from a stream.
I've been learning about the H.264 codec, and its data is very tightly packed, which makes decoding it tricky. Some unsigned integers are defined to occupy as few as 3 bits. It's an involved learning process, and going through it, we come out just slightly wiser than yesterday.
To read the byte stream as bits, we first calculate the number of bits in the stream (numBytes * 8) and use a variable to keep track of the current position in the bit stream. Let's name this variable "m_bitPos".
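As a minimal sketch of that bookkeeping, the state can live in a small struct. Only `m_bitPos` comes from the post; `BitCursor`, `m_buffer`, `m_numBits`, `bitsRemaining` and `skipBits` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reader state; only m_bitPos is named in the post.
struct BitCursor {
    const uint8_t* m_buffer;  // the packed bytes
    std::size_t m_numBits;    // total bits available: numBytes * 8
    std::size_t m_bitPos;     // index of the next bit to read (0 = MSB of byte 0)

    BitCursor(const uint8_t* data, std::size_t numBytes)
        : m_buffer(data), m_numBits(numBytes * 8), m_bitPos(0) {}

    // Number of bits that have not been read yet.
    std::size_t bitsRemaining() const {
        assert(m_bitPos <= m_numBits);
        return m_numBits - m_bitPos;
    }

    // Move the position forward without reading, e.g. over reserved bits.
    void skipBits(std::size_t n) {
        assert(n <= bitsRemaining());  // never move past the end of the stream
        m_bitPos += n;
    }
};
```

For a 3-byte buffer this gives 24 bits in total; after skipping 10 bits, `m_bitPos` is 10 and 14 bits remain.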
To work with a bit stream, we need to be able to retrieve the value of each bit in a byte. This helper function gives us a mask for the requested bit index, which we can apply to a byte of the stream to retrieve that bit's value.
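```cpp
#include <cstdint>

const char myByte = 'c';

// Bit 0 is the most significant bit of the byte, bit 7 the least significant.
inline uint8_t getMaskForBit(uint8_t bit)
{
    return (uint8_t)(1 << (7 - bit));
}
```

Usage:

```cpp
// Pass any bit index from 0 to 7. This keeps only the requested bit of myByte:
// res is nonzero if that bit is set and zero otherwise.
// 'c' is 0x63 (binary 01100011), so bit 1 is set and res == 0x40.
uint8_t res = myByte & getMaskForBit(1);
```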
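Because the masked value is the bit in its original position (0x40 above), not a plain 0 or 1, it can help to normalize it. The sketch below reuses the post's mask convention; `getBit` and `toBitString` are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Same convention as the post: bit 0 is the most significant bit.
inline uint8_t getMaskForBit(uint8_t bit) {
    assert(bit < 8);
    return static_cast<uint8_t>(1u << (7 - bit));
}

// Hypothetical helper: turn the masked value into exactly 0 or 1.
inline uint8_t getBit(uint8_t byte, uint8_t bit) {
    return (byte & getMaskForBit(bit)) ? 1 : 0;
}

// Hypothetical helper: render a byte MSB-first, one character per bit.
inline std::string toBitString(uint8_t byte) {
    std::string s;
    for (uint8_t bit = 0; bit < 8; ++bit) {
        s += getBit(byte, bit) ? '1' : '0';
    }
    return s;
}
```

With these, `getBit('c', 1)` is 1 and `toBitString('c')` is "01100011".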
Parsing IT!
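To parse a bit, we need to know which byte it belongs to. We can find that byte simply by dividing the current bit position by 8, which for an unsigned position is the same as shifting right by 3. The bit's offset within that byte is the remainder, m_bitPos % 8:

```cpp
currentByte = m_bitPos >> 3;
```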
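Here is a small sketch that computes both the byte index and the offset inside the byte, and uses them to read one bit. `BitLocation`, `locateBit` and `bitAt` are hypothetical names; the MSB-first ordering matches the post's assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Where a bit lives: which byte, and which bit inside it (0 = MSB).
struct BitLocation {
    std::size_t byteIndex;
    unsigned bitInByte;
};

// For an unsigned position, >> 3 equals / 8 and & 7 equals % 8.
inline BitLocation locateBit(std::size_t bitPos) {
    return BitLocation{bitPos >> 3, static_cast<unsigned>(bitPos & 7)};
}

// Hypothetical helper: read the single bit at bitPos, MSB-first.
inline unsigned bitAt(const uint8_t* buffer, std::size_t numBytes, std::size_t bitPos) {
    assert(bitPos < numBytes * 8);  // stay inside the buffer
    BitLocation loc = locateBit(bitPos);
    return (buffer[loc.byteIndex] >> (7 - loc.bitInByte)) & 1u;
}
```

For example, bit position 13 lives in byte 1 at offset 5, so in the bytes {0x00, 0x04} (binary 00000000 00000100) it is the single set bit.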
Here, I'm assuming the most significant bit always appears first in our bit stream. The process to retrieve a value by reading bits is as follows:
– Loop over the requested number of bits.
– Get the byte that contains the current bit position.
– Get the mask for the current bit position. The purpose of this mask is to check whether the corresponding bit is set in the current byte.
– Left shift the current result to make room for appending the result of our test.
– Perform an AND operation on the byte with the mask to check whether the corresponding bit is set. If it's set, add 1 to the result.
Because we shift left on every iteration of the loop, every bit falls into place by the time the loop terminates, and the value stored in those bits ends up right-aligned in a uint32_t. This also means a single call can read at most 32 bits.
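To see the accumulation concretely, the sketch below performs the same shift-then-add steps as readBits but records the partial result after each bit. `traceAccumulation` is a hypothetical helper written only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tracing helper: same shift-then-add loop as readBits,
// but it keeps every intermediate value of res.
std::vector<uint32_t> traceAccumulation(const uint8_t* buffer, std::size_t startBit, uint8_t n) {
    assert(n <= 32);  // the result must fit in a uint32_t
    std::vector<uint32_t> steps;
    uint32_t res = 0;
    for (std::size_t pos = startBit; pos < startBit + n; ++pos) {
        uint8_t mask = static_cast<uint8_t>(1u << (7 - pos % 8));  // MSB-first mask
        res <<= 1;                                  // make room for the next bit
        res += (buffer[pos >> 3] & mask) ? 1 : 0;   // append it as the new lowest bit
        steps.push_back(res);
    }
    return steps;
}
```

Reading the first five bits of 0xB0 (binary 10110000) produces the partial results 1, 2 (10), 5 (101), 11 (1011) and finally 22 (10110).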
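```cpp
// Assumes two members: buffer (the packed bytes) and m_bitPos (current bit position).
// n must be at most 32 so the result fits in a uint32_t.
uint32_t readBits(uint8_t n)
{
    uint32_t res = 0;
    while (n != 0)
    {
        uint32_t currentByte = (uint32_t)(m_bitPos >> 3);  // byte that holds the current bit
        uint8_t mask = getMaskForBit(m_bitPos % 8);       // mask for the bit inside that byte
        res <<= 1;                                        // make room for the new bit
        res += (buffer[currentByte] & mask) ? 1 : 0;      // append 1 if the bit is set
        --n;
        ++m_bitPos;
    }
    return res;
}
```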
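One possible way to package readBits as a self-contained reader is sketched below, with bounds checks added. The class name `BitReader`, the members `m_buffer` and `m_numBits`, and the helpers `readFlag` and `bitPos` are hypothetical; only `readBits` and `m_bitPos` come from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper around the post's readBits loop.
class BitReader {
public:
    BitReader(const uint8_t* data, std::size_t numBytes)
        : m_buffer(data), m_numBits(numBytes * 8), m_bitPos(0) {}

    // Reads n bits, MSB-first, and returns them right-aligned.
    uint32_t readBits(uint8_t n) {
        assert(n <= 32);                    // result must fit in 32 bits
        assert(n <= m_numBits - m_bitPos);  // do not read past the end
        uint32_t res = 0;
        while (n != 0) {
            std::size_t currentByte = m_bitPos >> 3;
            uint8_t mask = static_cast<uint8_t>(1u << (7 - (m_bitPos & 7)));
            res <<= 1;
            res += (m_buffer[currentByte] & mask) ? 1 : 0;
            --n;
            ++m_bitPos;
        }
        return res;
    }

    // A single-bit flag is just a 1-bit read.
    bool readFlag() { return readBits(1) != 0; }

    std::size_t bitPos() const { return m_bitPos; }

private:
    const uint8_t* m_buffer;
    std::size_t m_numBits;
    std::size_t m_bitPos;
};
```

As a usage example, the bytes 0xDA 0x94 (binary 11011010 10010100) can hold a 1-bit flag (1), a 3-bit unsigned integer (5) and a 10-bit unsigned integer (677), followed by two padding bits. Reading them in that order with `readFlag()`, `readBits(3)` and `readBits(10)` leaves `bitPos()` at 14.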
Wrapping it up
This one’s a quickie and I feel like it’s a good tool to have at your disposal, so I decided to write about it. As always, hope this helped someone! 🙂