Beyond the Bootcamp: Threads, Parallelism, and Concurrency

Welcome to a brand new module of Beyond the Bootcamp! This is one of my favorite topics: it has many direct applications for speeding up programs by taking full advantage of your computer's hardware. Here's what we'll cover:


  • What threads are
  • How we can use more than one thread to take advantage of computer hardware
  • Common issues in multi-threaded applications: race conditions and deadlocks
  • What it means for code to execute concurrently versus in parallel
  • Synchronous versus asynchronous code execution

Don’t worry if you’re not familiar with a lot of this terminology, we’ll break it down in detail!

There’s lots of talk about threads here. What’s a thread?

In most programs, we assume that only one line of code is executing at any given time. Today we're going to change that!

A thread is a sequence of instructions to be executed on the CPU. Any code you write runs on at least one thread.

Digging a little deeper into the hardware: CPU is short for central processing unit, and a CPU consists of one or more cores. Each core runs one thread at a time, and the operating system decides which threads run when. We won't cover exactly how the operating system makes those scheduling decisions in this module; for now, just know that this is how code gets scheduled to run.
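As a quick aside (my own addition, not from the original post), we can ask the JVM how many logical processors are available to us using the standard `Runtime.availableProcessors()` API:

```java
public class CoreCount {
    public static void main(String[] args) {
        // Number of logical processors available to the JVM.
        // Note: with hyperthreading, this can exceed the physical core count.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Available cores: " + cores);
    }
}
```

This number is a useful default for how many worker threads to create for CPU-bound work.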

Why make use of more cores?

A concrete example

For example, if I pass the number 3 into the method, I would expect to get back a boolean array of length 3 with the contents [false, false, true], since 0 and 1 are not prime numbers but 2 is. Here's a code snippet demonstrating that:
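The original snippet isn't reproduced here, but a minimal sketch might look like the following. The name `isPrime` comes from the text; the method name `computePrimes` and its exact signature are my assumptions:

```java
import java.util.Arrays;

public class Primes {
    // Returns an array where result[i] is true iff i is prime,
    // for all i in [0, n).
    static boolean[] computePrimes(int n) {
        boolean[] result = new boolean[n];
        for (int i = 0; i < n; i++) {
            result[i] = isPrime(i);
        }
        return result;
    }

    // Trial division, only checking divisors up to the square root of n.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // 0 and 1 are not prime, 2 is.
        System.out.println(Arrays.toString(computePrimes(3)));
        // → [false, false, true]
    }
}
```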

Running this code, we find that it takes around 5 seconds to finish. It turns out that for very large numbers, isPrime takes a long time to complete even with the square-root optimization. Let's go ahead and speed this up!

Creating worker threads

For example, say we want to compute the array for all the numbers up to 100. If we had 4 threads, we could make each thread responsible for 25 of the numbers: thread 1 handles 0–24, thread 2 handles 25–49, and so on and so forth. Afterwards, we have the main thread wait until all the worker threads are done to get our final result. Let's go ahead and code that up!
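The original code isn't shown here, but based on the description (a class extending `Thread`, a range passed to each worker, a shared results array), a sketch might look like this. The class and method names are my own:

```java
import java.util.Arrays;

// A worker responsible for checking primality over the sub-range [from, to).
class PrimeWorker extends Thread {
    private final int from, to;
    private final boolean[] results; // shared output array

    PrimeWorker(int from, int to, boolean[] results) {
        this.from = from;
        this.to = to;
        this.results = results;
    }

    @Override
    public void run() {
        // Each worker writes only to its own slice of the array,
        // so no locking is needed.
        for (int i = from; i < to; i++) {
            results[i] = isPrime(i);
        }
    }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }
}

public class ParallelPrimes {
    static boolean[] computePrimes(int n, int numThreads) throws InterruptedException {
        boolean[] results = new boolean[n];
        int chunk = (n + numThreads - 1) / numThreads; // ceiling division
        Thread[] workers = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            int from = t * chunk;
            int to = Math.min(n, from + chunk);
            workers[t] = new PrimeWorker(from, to, results);
            workers[t].start(); // fork: begin work on this range
        }
        for (Thread w : workers) {
            w.join(); // wait for every worker to finish
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(computePrimes(10, 4)));
        // → [false, false, true, true, false, true, false, true, false, false]
    }
}
```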

Notice that we created a new class extending the Thread class to represent a single worker thread. We then create 4 instances of it, passing each one two numbers to indicate the range of numbers that thread should compute. We also pass in a reference to the shared array so each thread can store its primality results in it.

After creating the threads, we call .start() on each of them to kick off the work. Afterward, we call .join() on each of them to ensure that all the threads finish before we return the array of results. Otherwise, we might return the array before the threads have finished filling it in.

This model of execution is more formally known as the fork-join model. A big task is split (forked) into many smaller tasks, which are delegated to individual threads; the results are then joined together once every thread finishes.

Pictorial view of the fork-join model

Running this version of the code, we see the time taken go from 5 seconds to around 2 seconds. That's a 2.5x speedup! It's not a perfect 4x speedup because some of the larger numbers take much longer to check than the smaller ones, so the threads finish at different times.
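One way to smooth out that imbalance (my own addition, not from the original post) is to assign numbers in a strided fashion rather than in contiguous blocks, so every thread gets a mix of small and large numbers:

```java
import java.util.Arrays;

public class StridedPrimes {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

    static boolean[] computePrimes(int n, int numThreads) throws InterruptedException {
        boolean[] results = new boolean[n];
        Thread[] workers = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            final int offset = t;
            workers[t] = new Thread(() -> {
                // Worker t handles t, t + numThreads, t + 2*numThreads, ...
                // so expensive large numbers are spread across all workers.
                for (int i = offset; i < n; i += numThreads) {
                    results[i] = isPrime(i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(computePrimes(10, 2)));
        // → [false, false, true, true, false, true, false, true, false, false]
    }
}
```

The answer is identical to the block-partitioned version; only the assignment of work to threads changes.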

Wrapping up

Until next time!

CS @ UW • SDE @ Amazon