Understanding Multithreading in Python

Steven Roger, February 12, 2021


Multithreading is a programming technique in which a program runs more than one task concurrently. Before going further into multithreading in Python, let’s look at a use case where it is crucial. Say, for instance, you run Python code that requests some input from a server before going ahead with the rest of the code.

There may be delays on the server’s end, which means the program does nothing until the server responds. If you need to send several more requests to the server, the script will take appreciably longer to finish. This is because each request is not sent until the server has returned the output for the previous one. A program in which one task has to finish before the next task runs is called a synchronous program.

One easy way of solving this time delay problem in a synchronous program is threading. With threading, the program sends all the requests at once rather than waiting for the response to the first request before sending the next. This technique can save a lot of time when a program spends most of its time waiting.

In this tutorial, you will learn how to create threads in Python with coding examples. It covers the following topics:

What is a Multithread and Process?
CPU bound Tasks and I/O Bound Tasks
Why should you care about Multithreading?
Threading Library in Python
Creating Threads in Python using the Threading Library
Useful Functions in the Threading Library
Complications with Threading
Solving the Deadlock and Race Condition Problem

Let’s begin.


What is a Multithread and Process?

Let’s discuss some key terms before going forward.

A thread is the smallest sequence of instructions that can be scheduled to run independently inside a program. Multithreading is the ability of a program to run more than one thread concurrently. This way, a task that takes a long time to run does not block the rest of the program from running.
A process is simply a program in execution. Each process has its own memory space and contains at least one thread, and all the threads in a process share that memory. For instance, running a Python script starts a process, and a numerical operation in that script keeps the CPU busy on behalf of the process.

CPU bound Tasks and I/O Bound Tasks

When discussing multithreading, it is vital to have a clear understanding of what CPU-bound tasks and I/O bound tasks are.

A CPU-bound task is a task that keeps your CPU actively working for as long as it runs. In such tasks, the rate of progress hinges on the speed of your CPU. Training a neural network on a CPU is a CPU-bound task because it requires the CPU to compute continuously and depends on the speed of the CPU. That is why the same neural network may take different amounts of time to train on the same data on different computers.
An I/O-bound task is a task that spends most of its time waiting for input or output to complete. In these kinds of tasks, the CPU is not doing much work. The program just waits for output after sending an input. The speed of an I/O-bound task depends on the speed of the I/O system (the network, the disk, the remote server) and not necessarily on the speed of the CPU. An example is when you request information from a server.

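Tying it together, it is critical to point out that in CPython, the standard Python interpreter, multithreading mainly speeds up I/O-bound tasks. In standard builds, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads do not make CPU-bound Python code run faster. For tasks that involve multiple CPU-bound operations, you will need a technique called multiprocessing. This tutorial, however, focuses on multithreading.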

Why should you care about Multithreading?

Multithreading allows you to split the tasks in your Python code into chunks and run those chunks concurrently. With multithreading, I/O-bound tasks can finish faster, which improves the speed and responsiveness of your program.

Let’s see a coding example. In the code below, we create a function that prints a statement, sleeps, and prints another statement. The sleep function simulates a wait of a defined number of seconds during which the function does nothing, much like an I/O-bound task.

Let’s see how long it would take the program to run the task once.

#import the necessary libraries
import time
 
#check when the program starts
start = time.time()
 
def task():
    '''This function prints a statement, sleep 
    and prints another statement after the sleep'''
 
    print('Begin sleeping from now')
    time.sleep(1)
    print('Now out of sleep')
 
#call the function once
task()
 
#check time the program finishes
finish = time.time()
 
#print the time the program takes to run
print(f"All tasks completed in {round(finish - start, 3)} seconds")

Output:
Begin sleeping from now
Now out of sleep
All tasks completed in 1.008 seconds

As seen, it took the program 1.008 seconds to perform the task once. Now let us see how long it would take to perform the tasks 3 times.

#import the necessary libraries
import time
 
#check when the program starts
start = time.time()
 
def task():
    '''This function prints a statement, sleep 
    and prints another statement after the sleep'''
 
    print('Begin sleeping from now')
    time.sleep(1)
    print('Now out of sleep')
 
#call the function thrice
task()
task()
task()
 
#check time the program finishes
finish = time.time()
 
#print the time the program takes to run
print(f"All tasks completed in {round(finish - start, 3)} seconds")

Output:
Begin sleeping from now
Now out of sleep
Begin sleeping from now
Now out of sleep
Begin sleeping from now
Now out of sleep
All tasks completed in 3.008 seconds

From the result, it takes 3.008 seconds to do the task 3 times. This means that while the program was sleeping, it was doing nothing at all, and the next tasks were left pending. That explains why this run took three times as long as the first one.

Next, we will see how threading will cut down this time using the threading library in Python.

Threading Library in Python

There are two modules used for multithreading in Python: the thread module and the threading module. The thread module, which was commonly used in older versions of Python, was renamed _thread in Python 3, which is why it must be imported as _thread rather than just thread. It remains available as a low-level interface, but the higher-level threading module, which is built on top of it, is the recommended choice.

In this tutorial, we will discuss how to use the threading library.

Creating Threads in Python using the Threading Library

Now let’s use the threading library to create threads from the previous task.

First, we begin by importing the threading and time libraries.

#import the necessary libraries
import time
import threading

Afterward, we define a function that does a particular task. We will repeat the same task as in the first example.

#check when the program starts
start = time.time()
 
def task():
    '''This function prints a statement, sleep 
    and prints another statement after the sleep'''
 
    print('Begin sleeping from now')
    time.sleep(1)
    print('Now out of sleep')

After the task has been defined in a function, we then need to create the thread. A thread is created using the Thread class of the threading module. Its most important argument is target, the function that the thread will run. If the function has parameters, their values are passed as a tuple using the args keyword. Since our task has no parameters, only target (the name of the function, without parentheses) needs to be defined.

#create threads
thread1 = threading.Thread(target=task)
thread2 = threading.Thread(target=task)
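
The snippet above passes only target. As a minimal sketch of the other case, here is how arguments could be handed to a function that does take parameters. The download function, its name and delay parameters, and the results list are hypothetical and exist only for this illustration.

```python
import threading
import time

results = []  # appending to a list from several threads is safe in CPython

def download(name, delay=0.1):
    """Hypothetical task: pretend to fetch `name`, waiting `delay` seconds."""
    time.sleep(delay)  # stands in for waiting on a server
    results.append(name)

# positional arguments go in a tuple; note the trailing comma for a single item
thread1 = threading.Thread(target=download, args=('page-1',))
# keyword arguments go in a dictionary passed as kwargs
thread2 = threading.Thread(target=download, args=('page-2',), kwargs={'delay': 0.2})

thread1.start()
thread2.start()

# wait for both threads to finish before reading the results
thread1.join()
thread2.join()

print(sorted(results))  # ['page-1', 'page-2']
```

A common slip is writing args=('page-1') without the trailing comma, which passes a plain string instead of a one-item tuple.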

After creating the thread objects, we need to call the start() method on each one so that the thread begins running. Now, let’s run the previous program again with threads. Note that start() calls the target function for you in the new thread, so you do not need to call the function explicitly again.

#import the necessary libraries
import time
import threading
 
#check when the program starts
start = time.time()
 
def task():
    '''This function prints a statement, sleep 
    and prints another statement after the sleep'''
 
    print('Begin sleeping from now')
    time.sleep(1)
    print('Now out of sleep')
 
#create thread objects
thread1 = threading.Thread(target=task)
thread2 = threading.Thread(target=task)
 
#start the thread
thread1.start()
thread2.start()
 
#check time the program finishes
finish = time.time()
 
#print the time the program takes to run
print(f"All tasks completed in {round(finish - start, 3)} seconds")

Output:
Begin sleeping from now
Begin sleeping from now
All tasks completed in 0.0 seconds
Now out of sleep
Now out of sleep

Look at the result. It is not what we wanted. Let’s explain what is going on.

After printing ‘Begin sleeping from now’, each thread sleeps for 1 second. But because the tasks run in their own threads, the main program goes on with the remaining lines of code without waiting for them. That’s why it calculates the run time as 0 seconds. After 1 second, both threads come out of sleep and each prints ‘Now out of sleep’. By then, the run time has already been printed.

This ‘abnormality’ happened because the main program runs in its own thread, separate from the two threads we created, and nothing tells it to wait for them.

We can make the program wait until both threads are complete by joining the threads. To join a thread, we call its join() method.

In our example, we would write

#join the thread
thread1.join()
thread2.join()

Now running the entire code, let’s see the run time.

#import the necessary libraries
import time
import threading
 
#check when the program starts
start = time.time()
 
def task():
    '''This function prints a statement, sleep 
    and prints another statement after the sleep'''
 
    print('Begin sleeping from now')
    time.sleep(1)
    print('Now out of sleep')
 
#create thread objects
thread1 = threading.Thread(target=task)
thread2 = threading.Thread(target=task)
 
#start the thread
thread1.start()
thread2.start()
 
#join the thread
thread1.join()
thread2.join()
 
 
#check time the program finishes
finish = time.time()
 
#print the time the program takes to run
print(f"All tasks completed in {round(finish - start, 3)} seconds")

Output:
Begin sleeping from now
Begin sleeping from now
Now out of sleep
Now out of sleep
All tasks completed in 1.016 seconds

Now this is what we were expecting. The final line was not printed until both threads had run completely. Notice that the time is now 1.016 seconds, rather than the roughly 2 seconds that two sequential calls would take. As explained earlier, with threads, while the first function was sleeping, the second function was running as well. That is why both functions printed ‘Begin sleeping from now’, then both slept at almost the same time. The second did not have to wait for the first to finish sleeping.
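
The same pattern extends to more than two threads. As a sketch, assuming the same sleeping task as above, the three sequential calls from the earlier example could be run as three threads created in a loop. The print statements are left out here to keep the listing short, and time.perf_counter() is used for timing, which is a choice made for this sketch.

```python
import threading
import time

def task():
    """Sleep for one second to mimic an I/O-bound wait."""
    time.sleep(1)

start = time.perf_counter()

# create three thread objects and keep them in a list
threads = [threading.Thread(target=task) for _ in range(3)]

# start every thread first, so that they all sleep at the same time
for thread in threads:
    thread.start()

# then wait for each of them to finish
for thread in threads:
    thread.join()

elapsed = time.perf_counter() - start
print(f"All tasks completed in {round(elapsed, 3)} seconds")
```

The run time should stay close to 1 second instead of 3. Starting all the threads before joining any of them matters: calling start() and join() in the same loop iteration would run the tasks one after another again.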

Going forward, let’s talk about some of the useful functions in the threading library.

Useful Functions in the Threading Library

start(): The start() method is used to begin a thread’s activity. It must be called at most once per thread object. If it is called more than once, the second call raises a RuntimeError.
join(): This method blocks the calling thread until the thread whose join() method is called has finished. Calling join() on every thread ensures that the next lines of code do not run until all the threads have finished.
run(): This method represents the thread’s activity. By default it calls the target function. It can be overridden in a class that extends the Thread class.
active_count(): This function returns the number of thread objects that are alive at a given point. The older spelling activeCount() is deprecated.
current_thread(): This function returns the Thread object for the thread that called it. The older spelling currentThread() is deprecated.
enumerate(): This function returns a list of all the thread objects that are currently alive.
is_alive(): This method returns a boolean (True or False). It returns True if the thread is alive and False otherwise. The older spelling isAlive() was removed in Python 3.9.
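
The short sketch below tries out several of these helpers, using the current snake_case spellings is_alive(), active_count() and current_thread(). The thread name 'worker-1' and the variable names are arbitrary choices made for this illustration.

```python
import threading
import time

def task():
    time.sleep(0.5)

worker = threading.Thread(target=task, name='worker-1')
alive_before_start = worker.is_alive()    # False: the thread has not started

worker.start()
alive_while_running = worker.is_alive()   # True: the thread is still sleeping
active = threading.active_count()         # counts the main thread and the worker
names = [t.name for t in threading.enumerate()]
current = threading.current_thread().name

worker.join()
alive_after_join = worker.is_alive()      # False: the thread has finished

second_start_error = None
try:
    worker.start()                        # a thread object can be started only once
except RuntimeError as error:
    second_start_error = str(error)

print(alive_before_start, alive_while_running, alive_after_join)
print(active, names, current)
print(second_start_error)
```

The exact count and the list of names depend on what else is running in the interpreter, so treat those two values as illustrative.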

Before concluding this tutorial, let’s discuss some of the complications that can occur during threading.

Complications with Threading

Some unwanted situations could occur when using threads. Let’s discuss the two common ones.

Deadlocks:

This is one situation developers always try to avoid. Deadlocks are best explained with a classical analogy called the Dining Philosophers problem.

The Dining Philosophers problem imagines 5 philosophers sitting at a dining table, about to eat 5 plates of pasta, with 5 forks on the table. Each philosopher is in one of two states: thinking or eating. To eat, however, a philosopher must grab two forks and not one.


The deadlock problem occurs when all five philosophers grab one fork at the same time. None of them is then able to grab a second fork, so none of them can start eating.

Bringing this back to threading, a deadlock happens when different threads (the philosophers) each hold one resource (a fork) while waiting for a resource held by another thread. Since no thread ever releases what it holds, none of the threads can proceed.
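
As a hedged illustration of the idea, the sketch below reproduces the situation with two philosophers and two forks. It uses threading.Lock objects (covered later in this tutorial) for the forks, and a threading.Barrier to make sure both threads hold their first fork before reaching for the second. All names are made up for this example. A timeout is passed to acquire() so that the program gives up instead of hanging forever, as a real deadlock would.

```python
import threading

fork_a = threading.Lock()
fork_b = threading.Lock()
barrier = threading.Barrier(2)   # lets both threads pause at the same point
got_second_fork = {}

def philosopher(name, first, second):
    with first:                  # pick up the first fork
        barrier.wait()           # wait until the other thread holds its fork too
        # Try to pick up the second fork, giving up after 0.2 seconds.
        # A plain second.acquire() would block forever at this point.
        acquired = second.acquire(timeout=0.2)
        got_second_fork[name] = acquired
        if acquired:
            second.release()
        barrier.wait()           # keep the first fork until both have tried

# each philosopher picks up the forks in the opposite order
thread1 = threading.Thread(target=philosopher, args=('philosopher-1', fork_a, fork_b))
thread2 = threading.Thread(target=philosopher, args=('philosopher-2', fork_b, fork_a))

thread1.start()
thread2.start()
thread1.join()
thread2.join()

print(got_second_fork)   # both values are False: neither got a second fork
```

If both philosophers picked up the forks in the same order (fork_a first, then fork_b), one of them would simply wait for the other to finish, and no deadlock would be possible.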

Race Conditions:

A race condition occurs when two or more threads read and modify the same data at the same time, so that the result depends on the order in which the threads happen to run. This can cause conflicts over which value of a variable is used or modified when running the program.
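
To make this concrete, here is a small sketch of a lost update. Real race conditions depend on timing and are hard to reproduce, so this example uses a threading.Barrier to force the unlucky ordering every time: both threads read the shared counter before either of them writes it back. The names counter and unsafe_increment are illustrative.

```python
import threading

counter = 0
barrier = threading.Barrier(2)

def unsafe_increment():
    global counter
    current = counter       # step 1: read the shared value
    barrier.wait()          # force both threads to read before either writes
    counter = current + 1   # step 2: write back a result based on a stale read

threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]

for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # prints 1, not 2: one of the two increments was lost
```

Without the barrier, the same interleaving can still happen, only rarely and unpredictably, which is what makes race conditions difficult to track down.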

Solving the Deadlock and Race Condition Problem

To solve the race condition problem, a lock object can be created from the threading module. If a thread requires a shared resource, it first acquires the lock for that resource. Once the thread holds the lock, no other thread can acquire it, and therefore access the resource, until the lock is released by the first thread. This ensures that a thread can use the resource without conflicts from a different thread. Locks alone do not rule out deadlocks, since a thread that holds one lock while waiting for another can still get stuck. A common way to avoid this is to make every thread acquire its locks in the same order.

To implement locking in Python, first instantiate the lock object from the Lock class.

lock = threading.Lock()

Now, to acquire the lock, you use the acquire() method. If another thread already holds the lock, the call waits until the lock is released. This makes sure that no other thread accesses the resource while the current thread is using it.

lock.acquire()

To release the lock so that another thread can use the resource, you use the release() method.

lock.release()
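
A lock can also be used in a with statement, which calls acquire() on entry and release() on exit, even if the code inside raises an exception. The sketch below uses this form to protect a shared counter. The counter, the safe_increment function and the numbers are hypothetical and chosen only for this illustration.

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment(times):
    global counter
    for _ in range(times):
        with lock:           # acquire() on entry, release() on exit
            counter += 1     # only one thread at a time runs this line

# four threads, each adding 10,000 to the shared counter
threads = [threading.Thread(target=safe_increment, args=(10_000,)) for _ in range(4)]

for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # 40000
```

Because every read-modify-write of counter happens while the lock is held, no increment can be lost, whatever order the threads run in.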

Let’s bring all this together. Suppose two functions run in separate threads and print to the same output. Each function acquires the lock before printing and releases it afterward, so only one thread at a time runs the code between acquire() and release(). Note that the loop variable ‘i’ is local to each function and is not itself shared. See the code below.

import threading

#instantiate the lock object
lock = threading.Lock()
 
def task1():
    for i in [1, 2]:
        #acquire the lock before entering the protected section
        lock.acquire()
        print('Thread 1 locked')
        #release the lock so that another thread can acquire it
        lock.release()
        print('Thread 1 released')
 
def task2():
    for i in [1, 2]:
        lock.acquire()
        print('Thread 2 locked')        
        lock.release()
        print('Thread 2 released')
 
 
#create thread objects
thread1 = threading.Thread(target=task1)
thread2 = threading.Thread(target=task2)
 
#start the thread
thread1.start()
thread2.start()
 
#join the thread
thread1.join()
thread2.join()

Output:
Thread 1 locked
Thread 1 released
Thread 1 locked
Thread 1 released
Thread 2 locked
Thread 2 released
Thread 2 locked
Thread 2 released

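As observed from the output, thread 1 completed its for loop before thread 2 printed anything. This ordering is not guaranteed. The lock is released on every iteration, so thread 2 could acquire it between two iterations of thread 1. It happened here because the loop is so short that thread 1 had finished by the time thread 2 started. What the lock does guarantee is that only one thread at a time runs the code between acquire() and release().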

In summary,

You have learned how to create threads and write faster code for I/O-bound tasks. You learned how to use the start() method to start a thread and the join() method to make sure threads run to completion before the next lines of code are run.

Furthermore, you learned about deadlocks and race conditions, which are unwanted situations during threading. Finally, you learned how to address these problems using the concept of locking in the threading module. If you’ve got any questions, feel free to leave them in the comment section and I’ll do my best to answer them.
