Bits & Bytes: More Bitsets & Recap

Justin Harjanto
5 min readMar 15, 2021

Announcements

Welcome to the first edition of Bits & Bytes on Medium! As I’ve stated in the email, I’m going to be migrating previous editions of the newsletter to this platform so it’s easier to search for editions you might have missed :).

What are we looking at today?

In our last edition of Beyond the Bootcamp, we took a look at bit sets and learned how we could leverage them to save on a nearly 97% reduction of space. Today, to solidify our understanding, we’re going to take a look at code samples to see how much space we’re using to solve problems. We’ll then wrap up this edition with a recap and takeaways from this module!

A recap — The ticketing scenario

Recall the scenario we explored in the last edition about creating a ticketing system to host large virtual concerts. We said we wanted to have a hard cap of 10 million people concurrently viewing the concert at a given time. Each ticket was represented by an integer, so we would have 10 million integers in memory.

To estimate the amount of bytes we’re using, we can simply multiply the size of an integer by 10 million:

10 million integers * sizeof(int)

10 million integers * 4 bytes = 40 million bytes = 40 megabytes

Here’s a code snippet of what that would look like:

And here’s how we could use this in a server:

So we estimate it’ll take around 40 MB of memory. Let’s do some memory profiling to prove that estimate is real!

(As an aside, this will set us up quite nicely for our next module about computer memory!)

Let’s invoke our InitialTicketingSystem class in isolation with a driver class and measure the memory footprint as so:

And the result:

So… what happened here!? Why is our footprint so big? That’s nearly 14 times our original estimate of a 40 MB footprint!

Well, unfortunately it’s much more complicated than that. Without diving too deep in the weeds of things, when we put integers into the internal HashSet to keep track of our ticket ids, we need to wrap the primitive integer into a Java Integer object. This happens automatically and is necessary because Java HashSets only work with objects and not primitives. In Java, we refer to these as wrapper classes allowing us to primitive types with classes that take objects.

These wrapper classes often take much more space. In the case of the Integer type, it actually takes a whopping 16 bytes compared to an int’s 4 bytes. That’s 4 times the amount of space for an int!

Moreover, remember how hash tables represent their data? It turns out whenever we add in a new Integer into the HashSet, it internally creates a new node in the underlying HashMap which also ends up taking a big amount space.

And as we discussed last week, we can make use of bit sets to reduce our space footprint!

Rewriting this with BitSets!

It turns out we don’t need to change too much with our implementation since there’s an already existing BitSet class. Let’s get to it!

Not too much that changed right? Let’s go ahead and see what memory usage looks like now!

1.250368 MB of space?!?! Is this even real? That’s 0.000022% of the total MB taken of our implementation with a HashSet! That matches our initial calculations from last week’s calculation:

10,000,000 million integers / 8 bits in a byte = 1,250,000 bytes

Which is equal to 1.25 megabytes.

Amazing right? Here’s some more code to convince you that the ticketing system works as expected:

And here’s the output:

As you can see, leveraging the structure of how bits are laid out can lead to some very high space savings for specific applications!

Wrapping up Bits & Bytes

In this module, we took a look at the limits of finite computer representation. By understanding how the computer represents data internally, we can be more knowledgable as software engineers to catch common edge cases such as integer overflow & underflow and mathematical computation errors with floats & doubles. We garnered a deeper understanding of some of the modern problems that have resulted as a result of the limitations of data types through looking at epoch time and rockets exploding. We also explored solutions to these problems with classes such as BigInteger so we’re better equipped to handle these cases when we’re faced with these issues.

Always be careful about overflow & underflow!

In the second half of the module, we took a look at how to manipulate bytes using bitwise operations. We went through several problems including a common interview problem to get a feel for how to utilize the newly introduce operators.

This set the stage for taking a look and understanding how a bit set would work. We illuminated a use case for when we’d want to use a bit set and today we dove deep into looking at how much a bit set could save us hundreds of megabytes of space.

In the coming weeks, we’ll take a deeper look at how computers store data internally in what we refer to as computer memory. We’ll learn how we can leverage how computers store data to our advantage to ensure that we’re writing the most performant code.

I hope you’ve enjoyed this series so far! If you have any feedback, feel free to leave a comment or if you’re adventurous enough, come join the Discord!

https://discord.com/invite/XbWjPWxdGV
https://discord.com/invite/XbWjPWxdGV

As always, if you have any questions about making the transition to becoming a software engineer or just want to talk to me 1:1, here’s the Calendly invite to book a free session!

--

--