Multiprocessing
Multiprocessing is Python's solution to achieving true parallelism by running multiple processes simultaneously. Unlike threading (which shares memory but is limited by the GIL), multiprocessing creates separate Python processes, each with its own memory space and Python interpreter.
Key Advantages
- True parallel execution (no GIL limitations)
- Better CPU utilization for computation-heavy tasks
- Process isolation prevents memory corruption
- Crash resilience (one process crashing doesn't affect others)
Basic Example: Running Simple Processes
from multiprocessing import Process
import os

def task(name):
    """A simple task that prints process info"""
    print(f"Task {name} running in process {os.getpid()}")

if __name__ == '__main__':
    # Create 3 processes
    processes = []
    for i in range(3):
        p = Process(target=task, args=(f"Job-{i}",))
        processes.append(p)
        p.start()  # Launch the process

    # Wait for all to complete
    for p in processes:
        p.join()

    print("All processes completed")
Explanation:
- We define a task() function that each process will run
- Process() creates a new process (but doesn't start it yet)
- start() launches the process
- join() makes the main program wait for child processes
- Each process gets its own PID (process ID)
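To illustrate the crash-resilience point from the advantages list, here is a minimal sketch (the failing_task / healthy_task names and the simulated ValueError are illustrative): an uncaught exception in one child does not disturb its siblings, and the parent can check each Process.exitcode after join().

from multiprocessing import Process

def failing_task():
    raise ValueError("simulated crash")  # hypothetical failure for illustration

def healthy_task():
    print("healthy task finished")

if __name__ == '__main__':
    bad = Process(target=failing_task)
    good = Process(target=healthy_task)
    bad.start()
    good.start()
    bad.join()
    good.join()

    # A non-zero exitcode means the child ended with an error;
    # the other process still completed normally.
    print(f"failing_task exitcode: {bad.exitcode}")   # 1 (child prints its own traceback)
    print(f"healthy_task exitcode: {good.exitcode}")  # 0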
Process Pools
Process pools create a fixed number of worker processes that can handle multiple tasks efficiently. They:
- Maintain reusable worker processes (avoid creation overhead)
- Automatically distribute tasks
- Collect results in order
- Are ideal for parallelizing CPU-bound operations
Key Methods
- map() - Parallel version of the built-in map()
- apply() - Runs one task at a time
- map_async() - Non-blocking map()
- apply_async() - Non-blocking apply()
Basic Example
from multiprocessing import Pool
import time
def square(x):
    print(f"Processing {x}")
    time.sleep(1)  # Simulate work
    return x * x

if __name__ == '__main__':
    numbers = [1, 2, 3, 4, 5]

    # Create pool with 3 workers
    with Pool(3) as pool:
        results = pool.map(square, numbers)

    print(f"Squares: {results}")
# Finishes in ~2 seconds (not 5) because the 3 workers run in parallel
# Squares: [1, 4, 9, 16, 25]
Async Example (Non-Blocking)
from multiprocessing import Pool
def cube(x):
    return x ** 3

if __name__ == '__main__':
    with Pool() as pool:  # Defaults to CPU count
        async_result = pool.map_async(cube, [1, 2, 3])

        # Do other work here while processing...
        print("Main process working...")

        # Get results when ready
        print(async_result.get())  # [1, 8, 27]
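apply() and apply_async() from the method list above are not shown elsewhere in this section, so here is a minimal sketch of apply_async(): it submits one call per invocation and returns an AsyncResult immediately (the cube function is reused from the previous example).

from multiprocessing import Pool

def cube(x):
    return x ** 3

if __name__ == '__main__':
    with Pool(2) as pool:
        # Submit individual calls; each returns an AsyncResult right away
        pending = [pool.apply_async(cube, (n,)) for n in [1, 2, 3]]
        print([res.get() for res in pending])  # [1, 8, 27]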
Error Handling
from multiprocessing import Pool

def safe_divide(x):
    try:
        return 100 / x
    except Exception as e:
        return f"Error: {e}"

if __name__ == '__main__':
    with Pool(2) as pool:
        print(pool.map(safe_divide, [10, 0, 5]))
        # [10.0, 'Error: division by zero', 20.0]
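If a worker raises an exception that it does not catch itself, pool.map() (and AsyncResult.get()) re-raise it in the parent process. A minimal sketch of handling that case in the parent (the risky_divide helper is illustrative):

from multiprocessing import Pool

def risky_divide(x):
    return 100 / x  # raises ZeroDivisionError for x == 0

if __name__ == '__main__':
    with Pool(2) as pool:
        try:
            pool.map(risky_divide, [10, 0, 5])
        except ZeroDivisionError as e:
            print(f"Worker failed: {e}")  # Worker failed: division by zero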
When to Use Process Pools?
- Batch processing similar operations
- CPU-bound tasks with independent data
- When you need ordered results
- For better resource management than manual processes
Sharing Data Between Processes
Since processes don't share memory by default, we need special techniques:
Using Queues for Communication
from multiprocessing import Process, Queue
def producer(queue, items):
    """Adds items to a shared queue"""
    for item in items:
        print(f"Producing {item}")
        queue.put(item)

def consumer(queue):
    """Processes items from queue"""
    while True:
        item = queue.get()
        if item is None:  # Poison pill to stop
            break
        print(f"Consuming {item}")

if __name__ == '__main__':
    q = Queue()
    items = ['A', 'B', 'C', 'D']

    # Start producer and consumer
    p1 = Process(target=producer, args=(q, items))
    p2 = Process(target=consumer, args=(q,))
    p1.start()
    p2.start()

    p1.join()
    q.put(None)  # Signal consumer to stop
    p2.join()
Explanation:
- Queue() creates a process-safe FIFO queue
- The producer puts items into the queue
- The consumer takes items out
- None acts as a "poison pill" to stop the consumer
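If this pattern is scaled out to several consumers, each consumer needs its own poison pill. A minimal sketch of that variation (the worker count is arbitrary):

from multiprocessing import Process, Queue

def consumer(queue):
    while True:
        item = queue.get()
        if item is None:  # One poison pill stops one consumer
            break
        print(f"Consuming {item}")

if __name__ == '__main__':
    q = Queue()
    num_consumers = 2
    consumers = [Process(target=consumer, args=(q,)) for _ in range(num_consumers)]
    for c in consumers:
        c.start()

    for item in ['A', 'B', 'C', 'D']:
        q.put(item)

    for _ in range(num_consumers):
        q.put(None)  # One pill per consumer

    for c in consumers:
        c.join()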
Shared Memory with Value and Array
from multiprocessing import Process, Value, Array
def increment_counter(counter):
    """Modifies a shared counter"""
    for _ in range(1000):
        counter.value += 1

def modify_array(arr):
    """Changes shared array elements"""
    for i in range(len(arr)):
        arr[i] *= 2

if __name__ == '__main__':
    # Shared integer
    counter = Value('i', 0)  # 'i' = signed integer

    # Shared array
    arr = Array('d', [1.0, 2.0, 3.0])  # 'd' = double

    # Create processes
    p1 = Process(target=increment_counter, args=(counter,))
    p2 = Process(target=modify_array, args=(arr,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

    print(f"Final counter: {counter.value}")
    print(f"Modified array: {list(arr)}")
Note: When several processes modify the same shared variable, protect it with a lock; an update like counter.value += 1 is a read-modify-write and is not atomic.
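One convenient option is the lock that Value and Array already carry: a minimal sketch using counter.get_lock(), assuming the same counter layout as above.

from multiprocessing import Process, Value

def locked_increment(counter):
    for _ in range(1000):
        with counter.get_lock():  # Lock bundled with the synchronized Value
            counter.value += 1

if __name__ == '__main__':
    counter = Value('i', 0)
    workers = [Process(target=locked_increment, args=(counter,)) for _ in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # 3000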
Synchronization Between Processes
Using Locks for Safe Access
from multiprocessing import Process, Lock, Value
import time
import os

def safe_increment(lock, counter):
    """Process-safe counter increment"""
    for _ in range(5):
        time.sleep(0.1)  # Simulate work
        with lock:
            counter.value += 1
            print(f"Process {os.getpid()} incremented to {counter.value}")

if __name__ == '__main__':
    lock = Lock()
    counter = Value('i', 0)

    processes = []
    for _ in range(3):
        p = Process(target=safe_increment, args=(lock, counter))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

    print(f"Final counter value: {counter.value}")
Key Concepts:
- Lock() creates a process-safe lock
- with lock: creates a critical section
- Only one process can execute the locked code at a time
When to Use Multiprocessing?
Good Use Cases
- Number crunching (math, statistics)
- Machine learning model training
- Data processing (large datasets)
- Image/video processing
When to Avoid
- Simple scripts (overhead isn't worth it)
- I/O-bound tasks (use threading or asyncio instead)
- When sharing lots of data (IPC has overhead)
Best Practices
- Use if __name__ == '__main__': to prevent infinite process spawning
- Prefer Pool for batch processing when possible
- Minimize shared state - processes work best when independent
- Use queues instead of shared memory when possible
- Consider concurrent.futures.ProcessPoolExecutor for a higher-level, Future-based interface (see the sketch below)
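To illustrate that last point, here is a minimal sketch of the earlier squaring task written against concurrent.futures.ProcessPoolExecutor, which wraps a process pool in the standard Future interface.

from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=3) as executor:
        # map() behaves like Pool.map(); submit() returns a Future
        results = list(executor.map(square, [1, 2, 3, 4, 5]))
        future = executor.submit(square, 10)

    print(results)          # [1, 4, 9, 16, 25]
    print(future.result())  # 100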