Welcome back to the second edition of the Beyond the Bootcamp memory module!
Last week, we talked about how what caches were along with what spatial and temporal locality were. Let’s take a look at an example where not using spatial locality really hurts performance!
But first, a quick refresher on 2D Arrays
Recall that if we want to represent a grid, we can use an array of arrays to do so. In languages such as Java we would represent a grid as such:
Pictorially, we can represent this as so:
Summing elements in a grid
Say now we’re tasked with determining the total sum of all the elements in the grid. There are two natural ways to go about approaching this problem. We can either traverse in a horizontal or vertical fashion iterating over the rows or columns.
But does it really make a difference?
Let’s take a look:
Uhh what gives? Why is it faster to compute the sum traversing over the rows of the grid compared to the columns?
How is this related to caching?
Remember, the hardware is looking to leverage both spatial and temporal locality. The variable, ‘sum’ will be in one of the CPU registers as a result of temporal locality. This makes sense since ‘sum’ is referenced many times over the course of the loop.
Below is the same code traversing through each row just with intermediary variable names to aid in the explanation:
With regards to spatial locality, each time nums[i] is referenced, the other elements are brought along for the ride into the cache because the computer is making an educated guess that if you’ve referenced nums[i], you’ll probably reference nums[i + 1]. Let’s take a look at an example:
When we go through our first iteration of the outer loop to perform the variable assignment of row = nums, we first see that it’s not present in the cache. We then go ahead and fetch the first row from main memory to put it in the cache.
Upon modifying the ‘sum’ variable, we’ll then request for row. It’s already present in the cache! No need to go to main memory now. On the next iteration, we’ll request for row. We see that’s in our cache as well, and by doing so, we’ve saved ourselves an expensive trip to main memory.
By traversing row by row, the computer is efficiently able to leverage both temporal and spatial locality. The computer constantly refers to the sum variable and keeps it in a register to leverage temporal locality. Simultaneously, the computer makes use of spatial locality by readily pulling in elements that it thinks the program will access next.
Let’s now take a look at the column based strategy to see what went wrong…
On the first iteration, we set the for loop bounds and evaluate nums.length:
We first request for nums and initially don’t find it in the cache. As a result, we’ll go to main memory to fetch nums along with all of its elements because of spatial locality.
The next line of code we execute is sum += nums:
Nice! nums is in our cache. We then go ahead and add nums to our current sum.
On the next iteration, we execute the line:
sum += nums
Which results in the following execution:
Since we’re referencing a different row and not adjacent elements, when requesting for nums, we don’t find it in the cache. As a result, we have to go do an expensive lookup in main memory to figure out what nums resolves to. We also have to evict our entire cache and replace elements with the adjacent elements in the row nums.
On the next iteration, we then request for nums:
Once again, this results in a cache miss which means we have to go to main memory, evict the current cache, replace elements, and return nums. This is a costly operation every iteration which is why we saw such a large difference traversing the grid via rows compared to via columns! Formally, this phenomenon is referred to as cache thrashing. We should always traverse the grid row wise to leverage spatial locality when we can/
Temporal & spatial locality play a huge role when dealing with arrays and array backed data structures. By having this underlying knowledge on how the software interfaces with the hardware, we can be more aware about how to write performant code. In the next edition, we’ll explore how different languages look at memory and how we as programmers can be smart to avoid the common pitfalls associated with memory management. Until next time!