Multiprocessing

Multiprocessing is Python's solution to achieving true parallelism by running multiple processes simultaneously. Unlike threading (which shares memory but is limited by the GIL), multiprocessing creates separate Python processes, each with its own memory space and Python interpreter.

Key Advantages

  • True parallel execution (no GIL limitations)
  • Better CPU utilization for computation-heavy tasks
  • Process isolation prevents memory corruption
  • Crash resilience (one process crashing doesn't affect others)

Basic Example: Running Simple Processes

from multiprocessing import Process
import os

def task(name):
    """A simple task that prints process info"""
    print(f"Task {name} running in process {os.getpid()}")

if __name__ == '__main__':
    # Create 3 processes
    processes = []
    for i in range(3):
        p = Process(target=task, args=(f"Job-{i}",))
        processes.append(p)
        p.start()  # Launch the process

    # Wait for all to complete
    for p in processes:
        p.join()

    print("All processes completed")

Explanation:

  1. We define a task() function that each process will run
  2. Process() creates a new process (but doesn't start it yet)
  3. start() launches the process
  4. join() makes the main program wait for child processes
  5. Each process gets its own PID (process ID)

Process Pools

Process pools create a fixed number of worker processes that can handle multiple tasks efficiently. They:

  • Maintain reusable worker processes (avoid creation overhead)
  • Automatically distribute tasks
  • Collect results in order
  • Are ideal for parallelizing CPU-bound operations

Key Methods

  1. map() - Parallel version of the built-in map(); blocks until all results are ready
  2. apply() - Run a single call in a worker and block until it finishes
  3. map_async() - Non-blocking map()
  4. apply_async() - Non-blocking apply() (see the sketch after the async example below)

Basic Example

from multiprocessing import Pool
import time

def square(x):
    print(f"Processing {x}")
    time.sleep(1)  # Simulate work
    return x * x

if __name__ == '__main__':
    numbers = [1, 2, 3, 4, 5]

    # Create pool with 3 workers
    with Pool(3) as pool:
        results = pool.map(square, numbers)

    print(f"Squares: {results}")
    # Output after ~2 seconds (not 5):
    # [1, 4, 9, 16, 25]

Async Example (Non-Blocking)

from multiprocessing import Pool

def cube(x):
    return x ** 3

if __name__ == '__main__':
    with Pool() as pool:  # Defaults to os.cpu_count() workers
        async_result = pool.map_async(cube, [1, 2, 3])

        # Do other work here while the pool processes in the background...
        print("Main process working...")

        # Get results when ready (blocks until they are)
        print(async_result.get())  # [1, 8, 27]
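
apply_async() from the key methods list submits one call at a time and returns an AsyncResult per call. Here is a minimal sketch, assuming a small pool of two workers (cube is redefined so the snippet runs on its own):

from multiprocessing import Pool

def cube(x):
    return x ** 3

if __name__ == '__main__':
    with Pool(2) as pool:
        # Submit individual calls without blocking; each returns an AsyncResult
        pending = [pool.apply_async(cube, (n,)) for n in [1, 2, 3]]

        # get() blocks until that particular result is ready
        print([r.get() for r in pending])  # [1, 8, 27]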

Error Handling

from multiprocessing import Pool

def safe_divide(x):
    try:
        return 100 / x
    except Exception as e:
        return f"Error: {e}"

if __name__ == '__main__':
    with Pool(2) as pool:
        print(pool.map(safe_divide, [10, 0, 5]))
        # [10.0, 'Error: division by zero', 20.0]

When to Use Process Pools?

  • Batch processing similar operations
  • CPU-bound tasks with independent data
  • When you need ordered results
  • For better resource management than manual processes

Sharing Data Between Processes

Since processes don't share memory by default, changes made in one process are not visible to others, so we need special techniques for passing data between them.
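
To see that isolation in action, here's a minimal sketch (the data list is purely illustrative): a modification made inside a child process never reaches the parent, because each process works on its own copy of the data.

from multiprocessing import Process

def append_item(data):
    # This changes only the child's copy of the list
    data.append('from child')
    print(f"Inside child: {data}")

if __name__ == '__main__':
    data = ['from parent']
    p = Process(target=append_item, args=(data,))
    p.start()
    p.join()
    print(f"In parent after join: {data}")  # still ['from parent']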

Using Queues for Communication

from multiprocessing import Process, Queue

def producer(queue, items):
    """Adds items to a shared queue"""
    for item in items:
        print(f"Producing {item}")
        queue.put(item)

def consumer(queue):
    """Processes items from queue"""
    while True:
        item = queue.get()
        if item is None:  # Poison pill to stop
            break
        print(f"Consuming {item}")

if __name__ == '__main__':
    q = Queue()
    items = ['A', 'B', 'C', 'D']

    # Start producer and consumer
    p1 = Process(target=producer, args=(q, items))
    p2 = Process(target=consumer, args=(q,))

    p1.start()
    p2.start()

    p1.join()
    q.put(None)  # Signal consumer to stop
    p2.join()

Explanation:

  • Queue() creates a process-safe FIFO queue
  • Producer puts items in queue
  • Consumer takes items out
  • None acts as a "poison pill" to stop consumer

Shared Memory with Value and Array

from multiprocessing import Process, Value, Array

def increment_counter(counter):
    """Modifies a shared counter"""
    for _ in range(1000):
        counter.value += 1

def modify_array(arr):
    """Changes shared array elements"""
    for i in range(len(arr)):
        arr[i] *= 2

if __name__ == '__main__':
    # Shared integer
    counter = Value('i', 0)  # 'i' = signed integer

    # Shared array
    arr = Array('d', [1.0, 2.0, 3.0])  # 'd' = double

    # Create processes
    p1 = Process(target=increment_counter, args=(counter,))
    p2 = Process(target=modify_array, args=(arr,))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

    print(f"Final counter: {counter.value}")
    print(f"Modified array: {list(arr)}")

Note: When more than one process writes to a shared Value or Array, use a lock to prevent race conditions, because an update like counter.value += 1 is a read-modify-write and is not atomic.
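
As a quick illustration (a minimal sketch; the iteration count is arbitrary), two processes incrementing the same counter without a lock will usually lose some updates:

from multiprocessing import Process, Value

def unsafe_increment(counter):
    for _ in range(100_000):
        counter.value += 1  # Read-modify-write with no lock: updates can be lost

if __name__ == '__main__':
    counter = Value('i', 0)
    workers = [Process(target=unsafe_increment, args=(counter,)) for _ in range(2)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()

    # Expected 200000, but the result is typically lower
    print(f"Counter without a lock: {counter.value}")

The next section shows how a Lock fixes this.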

Synchronization Between Processes

Using Locks for Safe Access

from multiprocessing import Process, Lock, Value
import os
import time

def safe_increment(lock, counter):
    """Process-safe counter increment"""
    for _ in range(5):
        time.sleep(0.1)  # Simulate work
        with lock:
            counter.value += 1
            print(f"Process {os.getpid()} incremented to {counter.value}")

if __name__ == '__main__':
    lock = Lock()
    counter = Value('i', 0)

    processes = []
    for _ in range(3):
        p = Process(target=safe_increment, args=(lock, counter))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

    print(f"Final counter value: {counter.value}")

Key Concepts:

  • Lock() creates a process-safe lock
  • with lock: creates a critical section
  • Only one process can execute the locked code at a time

When to Use Multiprocessing?

Good Use Cases

  • Number crunching (math, statistics)
  • Machine learning model training
  • Data processing (large datasets)
  • Image/video processing

When to Avoid

  • Simple scripts (overhead isn't worth it)
  • I/O-bound tasks (use threading or asyncio instead)
  • When sharing lots of data (IPC has overhead)

Best Practices

  1. Use if __name__ == '__main__': so child processes don't re-import the module and spawn new processes recursively
  2. Prefer Pool for batch processing when possible
  3. Minimize shared state - processes work best when independent
  4. Use queues instead of shared memory when possible
  5. Consider concurrent.futures.ProcessPoolExecutor for a more modern interface (see the sketch after this list)
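
As a minimal sketch of best practice 5, this is roughly the earlier Pool.map example rewritten with ProcessPoolExecutor (the worker count of 3 is arbitrary):

from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

if __name__ == '__main__':
    # Same worker-pool idea, but with the concurrent.futures interface
    with ProcessPoolExecutor(max_workers=3) as executor:
        results = list(executor.map(square, [1, 2, 3, 4, 5]))

    print(results)  # [1, 4, 9, 16, 25]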