Python multiprocessing worker model
- 1 Overview
- 2 Multiprocessing
- 3 Model
- 3.1 Queue
- 3.2 Worker
- 3.2.1 Basic worker
- 3.2.2 More complex workers
- 3.3 System
- 3.4 Exit
Overview
Audience: Autonomy subteam.
Autonomy abstracts concurrency and parallelism with Python multiprocessing.
Concurrency and parallelism are different:
Concurrency is about maintaining consistency (i.e. the same outcome) regardless of instruction order. It is possible for a program to be concurrent while not being parallel
Parallelism is about executing multiple instructions simultaneously. It is possible for a program to be parallel while not being concurrent
In general, programs must be concurrent to be useful, but there are some rare exceptions. For example:
Multiplication with floating point values might produce slightly different results depending on the order of operations, because floating point values cannot be represented exactly; this is acceptable as long as the difference is within some acceptance bound
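A small illustration (using addition rather than multiplication, where the effect is easiest to see):

```python
# Grouping the same floating point operations differently gives slightly
# different results, because the values cannot be represented exactly.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a)                   # 0.6000000000000001
print(b)                   # 0.6
print(abs(a - b) < 1e-9)   # True: the difference is within an acceptance bound
```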
Multiprocessing
Python has an interpreter which compiles and runs Python code. While the Python threading library allows the use of concurrency, Python has a global interpreter lock (GIL), a mutual exclusion lock that allows only one thread in a process to execute Python bytecode at a time. The GIL exists to keep reference counting for garbage collection simple and thread safe. There have been discussions and attempts to remove the GIL, but none have succeeded (so far).
Running multiple processes allows parallelism, since each process has its own GIL. The Python multiprocessing library has an API similar to the threading library, making it simple to switch between the two.
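As a rough illustration of how similar the two APIs are, Thread and Process accept the same constructor arguments:

```python
import multiprocessing
import threading


def work(name):
    print(f"hello from {name}")


if __name__ == "__main__":
    # Switching from a thread to a process is mostly a one-line change.
    t = threading.Thread(target=work, args=("a thread",))
    p = multiprocessing.Process(target=work, args=("a process",))
    t.start()
    p.start()
    t.join()
    p.join()
```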
Resources:
Global interpreter lock: GlobalInterpreterLock - Python Wiki
Multithreading: threading — Thread-based parallelism
Multiprocessing: multiprocessing — Process-based parallelism
Model
Autonomy abstracts concurrency by implementing the Kahn process network model: Kahn process networks
This reduces concurrency risks such as race conditions, deadlocks, and livelocks.
Queue
Interprocess communication occurs through FIFO queues. The queue data structure from the multiprocessing library is thread and process safe (i.e. safe for concurrent use), and provides the following methods:
Constructor: The argument is the maximum number of items that can be in the queue
put(): Places an item at the end of the queue; blocks if the queue is full
get(): Removes and returns the item at the start of the queue; blocks if the queue is empty
An optional timeout argument can be provided to put() and get(), which causes the method to raise an exception (queue.Full or queue.Empty) if the timeout expires. The corresponding non-blocking versions, put_nowait() and get_nowait(), behave like the blocking versions with a timeout of 0.
It is not safe to rely on the count of items in the queue (the count can change between checking it and acting on it), and it is not possible to read an item in the queue without removing it.
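A minimal sketch of the behaviour described above (the exceptions raised on timeout are queue.Full and queue.Empty from the standard queue module):

```python
import multiprocessing
import queue

if __name__ == "__main__":
    q = multiprocessing.Queue(2)    # holds at most 2 items

    q.put("a")
    q.put("b")                      # the queue is now full
    try:
        q.put("c", timeout=0.1)     # blocks up to 0.1 s, then raises queue.Full
    except queue.Full:
        print("queue is full")

    print(q.get())                  # "a" (FIFO order)
    print(q.get())                  # "b"
    try:
        q.get(timeout=0.1)          # blocks up to 0.1 s, then raises queue.Empty
    except queue.Empty:
        print("queue is empty")
```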
Worker
Basic worker
A basic worker is a process with a single input queue, a single output queue, and one or more class objects to hold state. The worker repeatedly receives an item from the input queue, does work, and sends the result to the next worker. The input and output rates can differ and vary over time (e.g. one output item for every 5-10 input items).
If the input queue is empty, the worker blocks until there is an item in the queue for the worker to get.
The worker is controlled by the worker controller process, which can pause, resume, and request exit. The worker may also communicate with the worker controller as it runs (e.g. a heartbeat). A sketch of such a worker follows the key below.
Key:
Purple: Process entrance and exit
Orange: Communication with the worker controller
Blue: Not the responsibility of the worker
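A minimal sketch of a basic worker, assuming a hypothetical control queue that carries "pause", "resume", and "exit" commands from the worker controller (the real controller interface may differ) and a placeholder work step:

```python
import multiprocessing
import queue
import time


def basic_worker(input_queue, output_queue, control_queue):
    """Repeatedly receive an item, do work, and send the result downstream."""
    paused = False
    while True:
        # Communication with the worker controller.
        try:
            command = control_queue.get_nowait()
        except queue.Empty:
            command = None
        if command == "exit":
            break
        if command == "pause":
            paused = True
        elif command == "resume":
            paused = False
        if paused:
            time.sleep(0.1)  # avoid busy-waiting while paused
            continue

        # Wait for an input item, but wake up periodically so that an exit
        # request is not missed while the input queue stays empty.
        try:
            item = input_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        result = item * 2           # placeholder for the actual work
        output_queue.put(result)    # blocks if the output queue is full


if __name__ == "__main__":
    in_q = multiprocessing.Queue(10)
    out_q = multiprocessing.Queue(10)
    ctrl_q = multiprocessing.Queue()
    worker = multiprocessing.Process(target=basic_worker, args=(in_q, out_q, ctrl_q))
    worker.start()
    in_q.put(21)
    print(out_q.get())  # 42
    ctrl_q.put("exit")
    worker.join()
```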
More complex workers
Some workers may interact with the operating system’s IO (or an abstraction over it) instead of an input and/or output queue. For example:
Camera driver
Files
Library interface
Some workers may have multiple input and/or output queues. These workers are more complex as they need to service multiple potentially blocking queues.
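One possible way to service two input queues without blocking indefinitely on either is to poll each with a short timeout, as in this sketch (exit handling and synchronization between the two queues are omitted):

```python
import queue


def two_input_worker(input_queue_a, input_queue_b, output_queue):
    """Forward items from two input queues to a single output queue."""
    while True:
        # Poll each queue with a short timeout so that a quiet queue
        # cannot stop the other one from being serviced.
        for input_queue in (input_queue_a, input_queue_b):
            try:
                item = input_queue.get(timeout=0.05)
            except queue.Empty:
                continue
            output_queue.put(item)
```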
System
Workers are connected together into a system, analogous to a car factory. The system is modular: workers can be inserted, removed, or swapped, as long as the inputs and outputs remain the same. Additionally, the system can be scaled up or down by adding or removing copies of the same type of worker.
For best performance, it is ideal that workers never block, so the system must be tuned accordingly. In practice, it is better for downstream workers to block on empty queues (waiting for input) than for upstream workers to block on full queues (waiting for output): a full queue means items are being produced faster than they can be consumed, and the resulting backpressure propagates upstream toward the data source.
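A sketch of how a small system could be assembled, using hypothetical stage_a and stage_b workers, with two copies of stage_b sharing one input queue to scale throughput:

```python
import multiprocessing


def stage_a(output_queue):
    """First stage: produces items (stands in for e.g. a camera driver)."""
    for frame in range(5):
        output_queue.put(frame)


def stage_b(input_queue, output_queue):
    """Second stage: consumes items and produces results."""
    while True:
        frame = input_queue.get()
        output_queue.put(frame * 10)


if __name__ == "__main__":
    # Bounded queues connect the stages; downstream workers block on empty
    # queues while waiting for input.
    queue_ab = multiprocessing.Queue(10)
    queue_out = multiprocessing.Queue(10)

    workers = [
        multiprocessing.Process(target=stage_a, args=(queue_ab,)),
        # Two copies of the same worker type share one input queue.
        multiprocessing.Process(target=stage_b, args=(queue_ab, queue_out)),
        multiprocessing.Process(target=stage_b, args=(queue_ab, queue_out)),
    ]
    for worker in workers:
        worker.start()

    results = [queue_out.get() for _ in range(5)]
    print(sorted(results))  # [0, 10, 20, 30, 40]

    # Orderly shutdown is covered in the Exit section; here the stage_b
    # workers are simply terminated once all results have been collected.
    for worker in workers[1:]:
        worker.terminate()
    for worker in workers:
        worker.join()
```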
Key:
Grey: Same type of worker
Blue: Not the responsibility of the system
Labels:
Worker A1 receives input from a hardware driver
Worker A2 is a basic worker with a single input queue and a single output queue
Worker 3 is a worker with 2 input queues and 1 output queue
The complexity depends on whether synchronization is required between items from the two queues (e.g. merging by timestamp)
Worker 4 is a worker with 1 input queue and 2 output queues
Multiple output queues are easier to handle than multiple input queues (see the sketch after this list)
Worker 5-1 and worker 5-2 are the same type of worker, duplicated to improve throughput
Worker 6 is a worker with 2 input queues and 3 output queues
Multiple input queues of the same type are easier to handle than multiple input queues of different types
Worker 7 sends output to a hardware driver
Queues can connect downstream workers to upstream workers in a possible feedback loop
Worker B1 both receives input from and sends output to a hardware driver
If two-way communication with a hardware driver is required but there is only a single driver instance, it must be handled by a single worker rather than a pair of workers
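A sketch of a worker with 1 input queue and 2 output queues: multiple outputs only require multiple put() calls, which is why they are simpler to handle than multiple inputs. The second output here is a hypothetical feedback queue to an upstream worker:

```python
def fanout_worker(input_queue, output_queue, feedback_queue):
    """A worker with 1 input queue and 2 output queues."""
    while True:
        item = input_queue.get()          # blocks until an item arrives
        result = item * 2                 # placeholder for the actual work
        output_queue.put(result)          # downstream output queue
        feedback_queue.put("processed")   # feedback loop to an upstream worker
```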
Exit
TODO