Welcome back to this week’s edition of Beyond the Bootcamp!
Today, we’ll take a look at how garbage collection can still result in memory leaks.
Recall from last week’s edition that garbage collection is the process by which the programming language tracks which pieces of memory a program is referencing. Once the program is no longer referencing a particular piece of memory, the language’s garbage collector frees that piece of memory so the program can reuse that memory address if necessary. While this sounds like a nice scheme, how can it go wrong?
Let’s illustrate with an example
Imagine we’re tasked with implementing a Stack class backed by an array. Below is a barebones implementation of a Stack class in Java:
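Here’s a minimal sketch of such a class (the field names arr and size and the initial capacity of 16 are illustrative choices):

```java
import java.util.Arrays;

class Stack {
    private Object[] arr = new Object[16]; // initial capacity chosen arbitrarily
    private int size = 0;

    public void push(Object element) {
        // Grow the backing array when it is full
        if (size == arr.length) {
            arr = Arrays.copyOf(arr, 2 * arr.length);
        }
        arr[size] = element;
        size++;
    }

    public Object pop() {
        Object top = arr[size - 1];
        size--;
        return top;
    }
}
```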
In the above snippet, the Stack class exposes two operations, push and pop. Push adds an item on the top of the stack while pop removes the item from the top of the stack.
Now let’s assume we’re running this code on a long-running server and we observe a steady increase in the server’s memory usage. What could be the issue?
Let’s dig a little deeper
Let’s take a look at the internals of the Stack class to see if we can spot any problems. The push method is simple. It first checks to see if the array can hold another element on the stack. If it can’t, then it will resize itself by allocating a new array and copying all the elements from the original array to the new array.
After doing so, it will assign arr[size] to the element pushed onto the stack then increment size by 1 for the next element. Simple enough right?
As for the pop operation, we look back one index (since size points to the slot where the next element would be placed) to fetch the top element. We then decrement the size of the stack and return the element.
Here’s a pictorial representation of pop:
In the above picture, we initially have 5 elements in the stack. After calling stack.pop(), we decrement the size of the stack by 1, so there are 4 elements on the stack. Remember, whenever we call pop, we return the value at the index just before size. The element that size currently points to is not part of the stack.
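In code, the same bookkeeping looks like this with a plain array (the values here are placeholders):

```java
class PopDemo {
    public static void main(String[] args) {
        Object[] arr = {"a", "b", "c", "d", "e"};
        int size = 5;

        // pop: read the element one index below size, then decrement size
        Object top = arr[--size];

        System.out.println(top);    // prints "e"
        System.out.println(size);   // prints 4
        System.out.println(arr[4]); // prints "e" -- the slot still holds a reference
    }
}
```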
As we continue to call stack.pop(), are there issues with this approach?
Remember how garbage collection works
Remember that the garbage collector works by figuring out which pieces of memory have no more references to them and then freeing those pieces of memory. Simply keeping references to the Object elements in the array signals to the garbage collector that the program is still using that memory, even if the program has no use for it.

Thus, when the garbage collector runs to free up memory, it never frees the elements stored in the array that are no longer in use, which leads to a memory leak.
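We can observe this reachability rule directly with a java.lang.ref.WeakReference, which lets us ask whether an object is still strongly reachable. This is only a sketch: System.gc() is merely a hint, so the collector is not guaranteed to run.

```java
import java.lang.ref.WeakReference;

class LeakDemo {
    public static void main(String[] args) {
        Object[] arr = new Object[10];
        arr[0] = new byte[1024];
        WeakReference<Object> ref = new WeakReference<>(arr[0]);

        // Simulate a pop that only decrements size: the slot still holds
        // a strong reference, so the object cannot be collected.
        System.gc();
        System.out.println(ref.get() != null); // true: still reachable via arr[0]

        // Clearing the slot removes the last strong reference, making the
        // object eligible for collection on a future GC cycle.
        arr[0] = null;
    }
}
```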
How do we fix the leak?
To stay in accordance with how the garbage collector works, after popping an element off of the stack, we can simply set that slot in the array back to null so that we’re no longer storing a reference:
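Concretely, assuming the stack is backed by an Object[] arr with an int size counter as before, the corrected class might look like this (the empty-stack check is an extra safety addition):

```java
import java.util.Arrays;
import java.util.EmptyStackException;

class Stack {
    private Object[] arr = new Object[16];
    private int size = 0;

    public void push(Object element) {
        if (size == arr.length) {
            arr = Arrays.copyOf(arr, 2 * arr.length);
        }
        arr[size++] = element;
    }

    public Object pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        Object top = arr[--size];
        arr[size] = null; // clear the stale reference so the GC can reclaim the object
        return top;
    }
}
```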
And voila! Now every time we pop an element off of the stack, we’ll clear the reference to the object in the array so that the garbage collector can correctly reclaim memory to be reused throughout the lifecycle of the program.
So… if it’s not a leak, what happens when we run out of memory?
As I mentioned last week, when a process runs out of physical memory, the operating system will start writing chunks of the process’s memory to disk. Most language runtimes also impose a memory limit that depends on the version and operating system. For a language like Java, the default maximum heap is typically in the 2–4 gigabyte range, and the JVM throws an OutOfMemoryError if the limit is exceeded.
However, the amount of memory a process is allowed to allocate is not always bounded by how much memory the hardware has. If a computer has 8 gigabytes of memory but the programmer tells Java it may allocate 16 gigabytes, the Java process can exceed the amount of physical memory available. In this case, the operating system starts writing chunks of the process’s memory to disk.
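If you want to see the limit your own JVM is running with, the Runtime class exposes it (the -Xmx flag is how the maximum heap size is set on the command line):

```java
class HeapInfo {
    public static void main(String[] args) {
        // Maximum heap the JVM will attempt to use, e.g. as configured by -Xmx
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}
```

Running this with java -Xmx16g HeapInfo on an 8-gigabyte machine will happily report roughly 16 gigabytes; nothing stops you from promising more heap than the hardware has.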
What does that look like?
Below is an example where I spun up a Java process on my Mac, allocated a large array of objects, then made a copy of a subset of that array to drop the memory usage. Notice that there are 8 gigabytes of physical memory available, but some swap space is being used, 889.0 MB to be exact.

If you’re using a Mac, you can find this information in Activity Monitor.
Can we turn off swapping?
While we technically could turn off swapping, the program would simply crash once it ran out of physical memory. Likewise, if the swap files a running program depends on are deleted, the program will also crash!
Wrapping up the module
In this module, we explored the basics of memory. In particular, we took a look at how the operating system uses layers of caching to speed up program execution by leveraging both spatial and temporal locality, the limitations of memory and where it’s stored, and how programming languages manage memory for us while helping us avoid common pitfalls.
In our next module, we’ll take a look at how we can make programs run blazing fast by leveraging multiple cores on a computer to allow for simultaneous execution within a single process. We’ll explore the nuances and issues that come with concurrency within a program and how to avoid them to get the most out of the computer hardware.
Until next time!