Python multiprocessing worker model

Overview

Audience: Autonomy subteam.

Autonomy abstracts concurrency and parallelism with Python multiprocessing.

Concurrency and parallelism are different:

  • Concurrency is about maintaining consistency (i.e. the same outcome) regardless of instruction order. It is possible for a program to be concurrent while not being parallel

  • Parallelism is about executing multiple instructions simultaneously. It is possible for a program to be parallel while not being concurrent

    • In general, programs must be concurrent to be useful, but there are some rare exceptions. For example:

      • Multiplication with floating point values might have a slightly different results depending on the order, because floating point values cannot be perfectly represented, but this is acceptable as long as the difference is within some acceptance bound

Multiprocessing

Python has an interpreter which compiles and runs Python code. While the Python multithreading library allows the use of concurrency, Python has a global interpreter lock (GIL), which is a mutual exclusion that forces all interpreters within a process to run one at a time. The reason the GIL exists is to make it easier to do reference counting for garbage collection. There have been discussions and attempts to remove the GIL, but none have succeeded (so far).

Running multiple processes allows parallelism, since the GIL is per process. The Python multiprocessing library is similar to the multithreading library, making it simple to switch.

Resources:

Model

Autonomy abstracts concurrency by implementing the Kahn process model: Kahn process networks

This reduces concurrency risks such as race conditions, deadlocks, and livelocks.

Queue

Interprocess communication occurs through FIFO queues. The queue data structure from the multiprocessing library is thread safe (i.e. safe for concurrency), and contains the following methods:

  • Constructor: The argument is the maximum number of items that can be in the queue

  • put() : Places an item at the end of the queue, blocks if the queue is full

  • get() : Removes an item at the start of the queue, blocks if the queue is empty

An optional timeout argument can be provided, which causes the method to throw an exception if the timeout occurs. The corresponding non blocking versions are the same as the blocking versions with a timeout 0.

It is unsafe to check the count of items in the queue, and it is not possible to read any item in the queue without removing it.

Worker

Basic worker

A basic worker is a process with a single input queue, a single output queue, and class object(s) to hold state. The worker repeatedly receives an item from an input queue, does work, and sends the result to the next worker. The input and output rate can be different and variable (e.g. for every 5-10 input items, outputs 1 item).

If the input queue is empty, the worker blocks until there is an item in the queue for the worker to get.

The worker is controlled by the worker controller process, which can pause, resume, and request exit. The worker may also communicate to the worker controller as it runs (e.g. heartbeat).

Key:

  • Purple: Process entrance and exit

  • Orange: Communication with the worker controller

  • Blue: Not the responsibility of the worker

More complex workers

Instead of an input and/or output queue, some workers may interact with the operating system’s IO (or an abstraction) instead. For example:

  • Camera driver

  • Files

  • Library interface

Some workers may have multiple input and/or output queues. These workers are more complex as they need to service multiple potentially blocking queues.

System

Workers are connected together into a system, analogous to a car factory. The system is modular as the workers can be inserted, removed, or swapped, as long as the inputs and outputs remain the same. Additionally, the system can be scaled up or down by inserting or removing multiple of the same type of worker.

For best performance, it is ideal that workers do not block at all, so the system must be tuned accordingly. In practice, it is better for the downstream workers to block on empty queues (waiting for input) rather than upstream workers to block on full queues (waiting for output).

Key:

  • Grey: Same type of worker

  • Blue: Not the responsibility of the system

Labels:

  1. Worker A1 receives input from a hardware driver

  2. Worker A2 is a basic worker with a single input queue and a single output queue

  3. Worker 3 is a worker with 2 input queues and 1 output queue

    1. The complexity depends on whether synchronization is required from items of both queues (e.g. merging by timestamp)

  4. Worker 4 is a worker with 1 input queue and 2 output queues

    1. Multiple output queues is easier than multiple input queues

  5. Worker 5-1 and worker 5-2 are the same type of worker, duplicated to improve throughput

  6. Worker 6 is a worker with 2 input queues and 3 output queues

    1. Multiple of the same type of input queue is easier than multiple input queues of different types

  7. Worker 7 sends output into a hardware driver

  8. Queues can connect downstream workers to upstream workers in a possible feedback loop

  9. Worker B1 both receives input and sends output with a hardware driver

    1. If communication to and from the hardware driver is required, but there is only a single instance, then it must be interfaced with a single worker rather than a pair of workers

Exit

TODO