๐Ÿงต Threading and Multiprocessing

Threading and multiprocessing are two key approaches for achieving parallelism and concurrency in Python, enabling your program to do multiple operations at once and thereby improving performance for I/O-bound and CPU-bound tasks. Here's an overview of how and when to use them.

Threading

Threading is the practice of running multiple threads (lighter-weight processes) concurrently in a single process. It's useful for I/O-bound tasks, where the program spends a lot of time waiting for external events (e.g. file I/O or network operations).

Python provides the threading module to implement threading in your programs. Here's a simple example of using threading to perform a task in the background:

import threading
import time

def print_numbers():
    for i in range(5):
        time.sleep(1)
        print(i)

# Create a thread
thread = threading.Thread(target=print_numbers)

# Start the thread
thread.start()

# Wait for the thread to complete
thread.join()

print("Done")

Multiprocessing

Multiprocessing involves running multiple processes concurrently, each with its own Python interpreter and memory space. It's ideal for CPU-bound tasks, where the program needs to perform extensive computations, or for the parallel processing of data, to perform concurrent operations on large datasets.

The multiprocessing module in Python allows you to create and manage separate processes. Here's an example of using multiprocessing to perform a computationally heavy task:

from multiprocessing import Process, current_process
import time

def compute_heavy_task(name):
    print(f"Process {name}: starting")
    # Simulate a heavy computation
    time.sleep(2)
    print(f"Process {name}: finishing")

if __name__ == '__main__':
    # Create processes
    processes = [Process(target=compute_heavy_task, args=(f'Process {i}',)) for i in range(5)]

    # Start all processes
    for process in processes:
        process.start()

    # Wait for all processes to complete
    for process in processes:
        process.join()

    print("Done")

Key Considerations

Python's GIL means that even if you're using multiple threads, only one thread can execute Python bytecode at a time. This limitation doesn't apply to I/O operations but does limit the effectiveness of threads for CPU-bound tasks.

Meanwhile, multiprocessing can lead to significant memory overhead, as each process has its own Python interpreter and memory space. Creating and managing processes is heavier than threads. Therefore, multiprocessing is often not beneficial for tasks that are quick to execute.

Both approaches are powerful tools in Python for achieving parallelism and improving program performance, but choosing the right one depends on the nature of your tasks.

Get PDF