We have simply started a thread by creating a Thread object and then invoking its .start() method.
The program stores the thread objects in a list so that it can wait for them later using the .join() method.
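As a quick reminder, the start-and-join pattern from that example looks roughly like this (thread_function stands in for whatever worker function the earlier example defined):

threads = []
for index in range(3):
    thread = threading.Thread(target=thread_function, args=(index,))
    threads.append(thread)      # keep a reference so we can join it later
    thread.start()

for thread in threads:
    thread.join()               # wait for every thread to finish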
From the output, you’ll notice that the threads are started in the order you expect.
However, they may finish in the opposite order.
Note that the orderings will be different for different runs of the code.
The operating system is responsible for determining the order in which the threads run.
Because of this, it may be difficult for you to predict that order.
The good news is that Python provides a number of ways for coordinating threads to have them run together.
We shall look at this later.
For now, let’s look at how we can make managing a group of threads a bit easier.
Since Python 3.2, the ThreadPoolExecutor has been part of the standard library in concurrent.futures, and it provides an easier way of starting a group of threads than the mechanism we discussed above.
The easiest way to create one is as a context manager, using the with statement; the context manager then takes care of creating and destroying the pool for you.
Let us rewrite the main from our previous example using the ThreadPoolExecutor:
if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as thread_executor:
        # thread_function is the worker function from the earlier example
        thread_executor.map(thread_function, range(3))
The above code creates a ThreadPoolExecutor as a context manager, stating the number of worker threads it wants in the pool.
It then uses .map() to iterate over the values generated by range(3), passing each one to a thread in the pool.
The end of the with block makes the ThreadPoolExecutor perform a .join() on every thread in the pool.
Always use the ThreadPoolExecutor as a context manager when possible to avoid forgetting to .join() the threads.
When you run the above code, you should get the following:
Also, the threads may not finish in the same order that they were started.
The reason is that scheduling of threads is handled by the operating system, making it difficult to predict how they will finish.
When handling threads, you will come across race conditions.
Race conditions occur when two or more threads access shared data or a shared resource.
They tend to be rare and intermittent, producing different results from one run to the next, which makes them difficult to debug.
However, we will be dealing with a race condition that occurs every time.
We will write a class to update a database.
However, don’t worry about having a database, we will simply fake it!
We will give our class the name UpdateDatabase and add .__init__() and .updateDB() methods to it:
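Here is a minimal sketch of how the class might look, matching the description that follows; the 0.1-second sleep is an arbitrary delay chosen to simulate a slow database operation:

import logging
import time


class UpdateDatabase:
    def __init__(self):
        self.value = 0

    def updateDB(self, name):
        logging.info("Thread %s: starting update", name)
        local_copy = self.value    # "read" the value from the fake database
        local_copy += 1            # do some computation on it
        time.sleep(0.1)            # pretend the computation takes a while
        self.value = local_copy    # "write" the new value back
        logging.info("Thread %s: finishing update", name)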
The UpdateDatabase class keeps track of a single number, that is, .value.
Because this data is shared between the threads, it will cause a race condition.
The .__init__() method initializes .value to 0.
We’ve also created the .updateDB() method.
It simulates how to read a value from a database, perform some computation on it, and then write the new value back to the database.
Since it’s a simulation, reading from the database means copying .value into a local variable.
The computation just adds 1 to that value and then calls .sleep() for a short time.
Writing to the database means copying the local variable back into .value.
You can use the fake database as follows:
if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")

    database = UpdateDatabase()
    logging.info("Testing update. Starting value is %d.", database.value)
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as thread_executor:
        for index in range(2):
            thread_executor.submit(database.updateDB, index)
    logging.info("Testing update. Ending value is %d.", database.value)
The code creates a ThreadPoolExecutor with two worker threads and then calls .submit() on the executor twice, instructing it to run database.updateDB(), with the loop index passed along as the name argument.
The code will give the following output upon execution:
We have two threads, each running .updateDB() and adding 1 to .value.
So, you might have been expecting the final value of database.value to be 2.
However, it’s not.
Because the program uses threads, the two of them interleave their access to the shared object, .value.
Due to this, they overwrite each other’s results.
Now that you know what race conditions are, let’s discuss some of the ways that can help you avoid or handle them.
Basic Synchronization Using Lock
The lock is one of the mechanisms that you can use to avoid or solve race conditions.
It provides you with a way of allowing only one thread at a time into the read, modify, then write part of your code.
So, the lock works through the mechanism of mutual exclusion.
The lock operates in the same way as a hall pass.
Only one thread is allowed to have the lock at a time.
Any other thread that wants to acquire the lock is forced to wait for the current owner to give it up.
This is achieved using the .acquire() and .release() functions.
To acquire the lock, a thread should call my_lock.acquire().
If the lock is held by another thread, it has to wait for it to be released.
If a thread gets the lock and refuses to give it back, the program gets stuck.
However, the lock also operates as a context manager, so you can use it in a with statement.
This means that the lock will be released automatically when the with block exits.
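As a rough illustration, here are the two equivalent ways of protecting a critical section with a lock; the try/finally in the first form does by hand what the with statement does for you in the second:

import threading

my_lock = threading.Lock()

# Explicit style: release in a finally clause so the lock is freed
# even if the protected code raises an exception.
my_lock.acquire()
try:
    pass  # read, modify, and write the shared data here
finally:
    my_lock.release()

# Context-manager style: the lock is released automatically on exit.
with my_lock:
    pass  # read, modify, and write the shared data here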
Now, let’s add a lock to the UpdateDatabase class:
class UpdateDatabase:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def locked_updateDB(self, name):
        logging.info("Thread %s: starting update", name)
        logging.debug("Thread %s about to lock", name)
        with self._lock:
            logging.debug("Thread %s has lock", name)
            local_copy = self.value
            local_copy += 1
            time.sleep(0.1)
            self.value = local_copy
            logging.debug("Thread %s about to release lock", name)
        logging.debug("Thread %s after release", name)
        logging.info("Thread %s: finishing update", name)
The biggest changes in the above code are the addition of the ._lock attribute, which is a threading.Lock object, and the with self._lock: block in .locked_updateDB() that wraps the read-modify-write section.
To see the fix in action, have the main block submit database.locked_updateDB instead of database.updateDB.
When executed, the code should return the following output:
Your program has just worked!
And look, the output is now 2, not 1.
The reason is that we’ve used a lock.
If you need to see the full logging, simply add the following statement after configuring the logging output in __main__:
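Assuming the program configures the root logger (as in the earlier examples), that statement is:

logging.getLogger().setLevel(logging.DEBUG)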
The code will now return the following:
From the above output, Thread 0 acquired the lock first and Thread 1 acquired it last.
Thread 1 had to wait for Thread 0 to release the lock before it could acquire it.
This waiting behaviour hints at a common problem associated with locks: deadlock.
As you know, when a thread calls .acquire() on a lock that is already held, it has to wait for the thread holding the lock to call .release().
Consider the code given below:
import threading

t = threading.Lock()
print("before first acquire")
t.acquire()
print("after first acquire")
t.acquire()
print("lock acquired twice")
So, what will happen when you run the above code?
The program will hang!
Why?
Because when it runs the second t.acquire(), it has to wait for the lock to be released, but the only thread that could release it is the one that is now waiting.
This is a deadlock.
The basic way to avoid this particular deadlock is to remove the second call to .acquire().
However, there are two subtler situations that commonly cause deadlocks:
1. A bug in the code where a lock is not released properly.
2. A design issue in which a utility function will be called by functions that may or may not have the lock.
You can reduce the first situation by using a Lock as a context manager.
To solve the second issue, you can use RLock.
This is a Python object that allows a thread to acquire an RLock multiple times before calling .release().
The thread is still required to call .release() the same number of times that it called .acquire().
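Here is a small sketch of the reentrant behaviour; the outer and inner functions are hypothetical, standing in for a caller that already holds the lock and a utility function that also wants it:

import threading

rlock = threading.RLock()

def outer():
    with rlock:          # first acquisition
        inner()          # call a utility function while holding the lock

def inner():
    with rlock:          # same thread acquires the lock again; no deadlock
        print("inner section is protected")

outer()  # with a plain threading.Lock, the nested acquire would hang forever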
Here is what you’ve learned…
In Python, threading allows you to have parts of your program execute concurrently.
Threading can make the design of some Python programs simpler.
The Python standard library provides threading.
To start a separate thread, create an instance of threading.Thread and then call its .start() method.
If a thread is not a daemon, the program will wait for the thread to finish before it can terminate.
A daemon thread will shut down automatically once the program terminates.
Race conditions occur when two or more threads access a shared resource or piece of data.
You can use locks to avoid or solve race conditions.