Bitsets & FREE software consulting announcement!

Announcements

Before we get into the meat of the content today, if you want to connect on a 1:1 consultation session about making the switch into software, resume reviews, or anything related to software engineering, I’m starting to open up 30 minute slots on Calendly free of charge! Click here to sign up!

Hash tables

Before we talk about what a bitset is, let’s take a step back to make sure we’re on the same page for hash tables. Recall that a hash table is an efficient data structure that establishes a mapping between two different types.

So what’s a bitset?

At a high level, a bitset is an extremely space efficient hash set for positive integers. It exploits the structure of how bits are stored so we can easily ask questions to tell us whether or not a given integer is in a set.

What would we use a bitset for?

To give us some underlying motivation as to when we would use a bitset, let’s take an actual interview question that I was asked.

Initial iteration

This seems like a good use case for a hash set!

Problems with this approach

While this is a simple and straightforward approach, one drawback is that as this restaurant expands and grows, the number of tickets we need to keep track of in the set grows linearly as well.

Space issues

One of the issues that we run into when we want to store a large amount of integers in memory is the fact that it might take up too much space. Say we want to store 10 million integers in memory. We need to make sure that the hardware that we’re running on can actually store that many numbers!

Bitset to the rescue!

One way we can drastically reduce the amount of space is to leverage using a bitset. A bitset internally is represented with an array of bytes. Recall that a byte has 8 bits inside of it. Each bit in an individual byte will represent whether or not that number is in the bitset, 0 if it’s not in the set, 1 if it is.

How do we determine an open ticket?

One downside of this approach is the fact that it’s more difficult for us to determine whether or not we have any open tickets to give. We would need to iterate over the bitset everytime we want to find an open ticket and do some bitwise manipulations to figure out whether or not a particular byte has any bits. Thankfully, the bitwise operations are very fast so it shouldn’t take long to iterate over all of the indexes.

What are our space savings?

This is where the bit set really shines. Each byte can store 8 bits which means we can easily keep track of 8 numbers with a single byte. So with our new calculations:

Using a bitset in code

If you want to make use of bitsets in your code, rather than implementing your own, most programming languages generally already have built in classes so that you don’t need to interface with low level bits and bitwise operators. Some languages like Java even have it included in their standard library. Amazing!

What’s next?

This article is getting a little long, so we’ll go ahead and wrap up the module next week! We’ll go over some actual concrete code making use of bit sets as well as several exercises to help solidify your understanding. Afterwards, we’ll be moving onto our next module, computer memory!

CS @ UW • SDE @ Amazon • http://www.justinharjanto.com