Beyond the Bootcamp: Threads, Parallelism, and Concurrency

Justin Harjanto
5 min readMay 24, 2021

Welcome to a brand new module of Beyond the Bootcamp! This is probably one of my favorite topics as it has a lot of different direct applications to speed up applications efficiently by leveraging computer hardware. Let’s get started!

Agenda

As a 10,000 foot overview, we’ll be covering:

  • What threads are
  • How we can use more than 1 thread to take advantage of computer hardware
  • Common issues from multi-threaded applications, race conditions and deadlocks
  • What does it mean for code to execute concurrently compared to in parallel?
  • Synchronous vs asynchronous code execution

Don’t worry if you’re not familiar with a lot of this terminology, we’ll break it down in detail!

There’s lots of talk about threads here. What’s a thread?

In most programs, we make the assumption that a single line of code is being executed at a given time. Today we’re going to change that!

A thread is defined as a sequence of instructions to be executed on the CPU. Code that you write is running on at least a single thread.

Digging a little deeper into the hardware, a CPU is short for the central processing unit which consists of one or more cores. Each core runs a given thread. The operating system decides which threads get run when. We won’t be talking about how exactly the operating system does that in this module. For now, just know that’s how code gets scheduled to run.

Why make use of more cores?

Depending on the type of application you’re building, a single-threaded application could suffice. However, there are situations in where a given problem might be too big for a single thread to handle. Perhaps you’re processing a large stream of data that requires you to perform some complicated long running calculation. In cases like these, we might want to consider making use of multiple cores. Time is expensive and we want answers and operations done faster!

A concrete example

Say we’re tasked to make a program that will take in a number and will return a boolean array where a boolean in each index of the array denotes whether or not that index is a prime number up until the number passed in.

For example, if I pass in the number 3 into the method, I would expect to get a boolean array of length 3 with the contents [false, false, true] since 0 and 1 are not prime numbers but 2 is. Here’s a code snippet demonstrating that:

Running this code, we find that it finishes executing in around 5 seconds. It turns out for very large numbers, isPrime takes a long time to complete even with the square root optimization. Let’s go ahead and speed this up!

Creating worker threads

Before we create worker threads to help speed up the program, we need to identify what needs to be sped up. Since we know that isPrime takes a long time to compute, we can partition the domain of numbers that need to be computed into equal chunks so each thread is only responsible for a subset of the numbers to be computed.

For example, say we want to compute the array for all the primes until 100. If we had 4 threads, we could have each of the threads be responsible for 25 of the numbers. Thread 1 can be responsible for numbers 0–24, thread 2 can compute 25–50, and so on and so forth. Afterwards, we can have all the threads wait until the other threads are done to get our final result. Let’s go ahead and code that up!

Notice here we created a new class to represent a single instance of a thread and extended the Thread class. We then create 4 instances of the thread and pass in two numbers to indicate the range of numbers we want the thread to compute. We also pass in a reference to the array so the thread can modify the array to store the results of computing the prime number.

After creating the threads, we call .start() on each of them to start work. Afterward, we need to call .join() on each of them because we need to ensure that all the threads finish before we return the array of results. Otherwise, the program will finish before the threads finish.

This model of execution is more formally known as the fork-join model of execution. We’re given a big task that is split up into many smaller tasks which are delegated to each of the individual threads and then joined together to ensure that execution of each thread finishes.

Pictorial view of the fork-join model

Running this version of the code we see that the time taken goes from 5 seconds to around 2 seconds. That’s a 2.5x speed up! In this case, it’s not a perfect 4x speed up since some of the larger numbers take much longer to compute compared to the smaller ones.

Wrapping up

In this edition, we set the stage for what this module will cover. We learned the basics about what a thread is and how we can leverage threads to speed up our programs. In the coming weeks we’ll be exploring the issues that arise from multi-threaded programs and what tools we have available to us to prevent these issues.

Until next time!

--

--