Real Time Systems (RTOS and Friends) - WORK IN PROGRESS

NOTE: This guide is written for readers with a basic understanding of firmware engineering and the C language. We recommend reading the articles on interrupts first.

Motivation

As embedded software engineers, we often work in environments where our systems must meet strict, non-negotiable timing requirements. It’s crucial that we can guarantee certain tasks will execute within specified time intervals, as failure to do so could have serious or even catastrophic consequences for the system's operation. Systems with these requirements are said to have 'real-time' constraints, necessitating careful attention to software architecture and design to meet these demands reliably.

So, what is a Real-Time System?

“Real-time” means that a system must meet a timing constraint. Constraints are usually split into the two categories described below. In general, if a requirement says that a system must do something within x time, there is likely a real-time constraint of some form.

  • Hard Real-Time - missing a timing deadline is a total failure. Think of a pacemaker or an airbag deployment system. If a pacemaker delivers a pulse late, it may kill the patient. An airbag that deploys late is likely to injure an occupant severely: it may fail to stop the occupant’s forward motion, or it may fire while the occupant is already pressed against it.

  • Soft Real-Time - missing a timing deadline is not a total failure but may degrade the system’s usefulness. Think video games: if the frame rate dips, it does suck, but not enough to warrant shutting down the game entirely.

A real-time operating system (RTOS) is a software layer that helps embedded systems meet timing constraints, including hard ones. This guide will discuss the mechanisms an RTOS provides and how an engineer can use them to achieve timing guarantees.

Defining the problem

Before comparing solutions, we need a few terms that describe what a real-time system must do and how much time it has to do it.

Action

An action is something the system does. An example of an action is blinking an LED after a user presses a button. In a real-time context, an action often has a timing constraint: for example, blinking an LED less than 50 ms after a user presses a button, or blinking an LED at a rate of no less than 10 Hz.

*Note: I am deliberately avoiding the term “task” here, although it is another common name for this concept, because in an RTOS a “task” refers to something else.

Deadline

A deadline is exactly what it sounds like: a time by which an action must be completed. In the example above, blinking an LED less than 50 ms after a user presses a button has a deadline of 50 ms after the button press.

Period

The period of an action specifies how often the action should run. In the example of blinking an LED at a rate of no less than 10 Hz, the rate is 10 Hz, so the period is 1/10 Hz = 0.1 s (100 ms).

In basic real-time scheduling theory, the deadline for completing an operation is often assumed to equal its period. This assumption does not always hold: a controller might need to poll a device at 10 Hz (every 0.1 s) but must complete each poll within 0.05 s. This guide stays within basic real-time scheduling theory, so from here on we assume that the deadline equals the period.
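To make the two timing parameters concrete, here is a minimal C sketch that stores them as plain data and builds both examples from this section. The type and function names (`action_timing_t`, `period_us_from_hz`, and so on) are hypothetical, and microseconds are an arbitrary choice of unit.

```c
#include <assert.h>
#include <stdint.h>

/* Timing parameters of one action, in microseconds. */
typedef struct {
    uint32_t period_us;   /* time between successive releases          */
    uint32_t deadline_us; /* time allowed after each release to finish */
} action_timing_t;

/* Convert a rate in Hz to a period in microseconds (integer division). */
static uint32_t period_us_from_hz(uint32_t rate_hz)
{
    assert(rate_hz > 0u);
    return 1000000u / rate_hz;
}

/* Deadline equal to the period: the assumption of basic scheduling theory. */
static action_timing_t implicit_deadline(uint32_t rate_hz)
{
    uint32_t period = period_us_from_hz(rate_hz);
    action_timing_t t = { period, period };
    return t;
}

/* Deadline shorter than the period, like the 10 Hz / 50 ms polling example. */
static action_timing_t constrained_deadline(uint32_t rate_hz, uint32_t deadline_us)
{
    action_timing_t t = { period_us_from_hz(rate_hz), deadline_us };
    assert(t.deadline_us <= t.period_us);
    return t;
}
```

`implicit_deadline(10)` describes the 10 Hz LED blink (period and deadline both 100000 us), while `constrained_deadline(10, 50000)` describes the polling controller.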

Worst-Case Execution Time (WCET)

The worst-case execution time (WCET) of an operation is the maximum time it can take to complete. Even blinking an LED requires some processor time to set the GPIO pin to the appropriate state, however short. Similarly, an I2C transaction may take a certain amount of time before new data becomes available. There are various methods to estimate WCET, though measurement (experimentation) is the most common approach for complex operations. Keep in mind that a measured maximum only covers the cases the measurements actually exercised.

*I am deliberately saying it’s the maximum time of an “operation” rather than an “action” (or, for readers who know what’s coming, a “task”). WCET is a generic term that can apply to any operation a software module performs.
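A measurement-based estimate usually amounts to running the operation many times and keeping the longest duration seen (a "high-water mark"). The sketch below assumes a POSIX host so that it runs anywhere; on a microcontroller you would read a hardware timer or cycle counter instead of `clock_gettime()`. `example_operation` and `observed_max_ns` are hypothetical names, with `example_operation` standing in for real work such as an I2C read.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds (POSIX; use a hardware timer on a target). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Hypothetical stand-in for a real operation such as an I2C read. */
static void example_operation(void)
{
    volatile uint32_t sink = 0;
    for (uint32_t i = 0; i < 10000u; ++i) {
        sink += i;
    }
}

/* Run `op` `runs` times and return the longest duration observed.
 * This is a high-water mark, not a proven bound: the true WCET can be
 * longer if the slowest path never occurred during the runs. */
static uint64_t observed_max_ns(void (*op)(void), unsigned runs)
{
    uint64_t worst = 0;
    assert(op != NULL && runs > 0u);
    for (unsigned r = 0; r < runs; ++r) {
        uint64_t start = now_ns();
        op();
        uint64_t elapsed = now_ns() - start;
        if (elapsed > worst) {
            worst = elapsed;
        }
    }
    return worst;
}
```

A typical call is `observed_max_ns(example_operation, 1000)`. The result is only as trustworthy as the conditions under which it was gathered, so the runs should include whatever inputs and interrupts make the operation slowest.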

Solutions to achieve real-time system deadlines

There are a few ways to meet real-time deadlines. We’ll discuss two common ones: a superloop (also referred to as static scheduling) and a real-time operating system (specifically, one with a fixed-priority preemptive scheduler).

Both solutions share one goal: scheduling actions so that every action completes before its deadline.

For discussion, we’ll simplify the problem to the following:

  • Our system needs to run a set of n actions periodically, with each action’s deadline equal to its period

Superloop

A superloop is the simplest way to build a real-time system: all actions in the system run at a fixed frequency in one big loop. Here’s an example of what the code might look like:

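// action_1() ... action_n() and delay_ms() are application/board functions defined elsewhere
int main(void)
{
    // Superloop: run every action once per pass, forever
    while (1) {
        action_1();
        action_2();
        action_3();
        // ...
        action_n();
        delay_ms(100); // ~10 Hz: each pass also includes the actions' own run time
    }
    return 0; // never reached
}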
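The `delay_ms(100)` call starts waiting only after the actions finish, so each pass takes 100 ms plus the actions' execution time, and the loop runs somewhat slower than 10 Hz. The sketch below simulates a millisecond clock (no hardware) to compare that pattern with sleeping until an absolute release time that advances by one period per pass. The function names and the 30 ms of work per pass are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pattern used in the superloop above: do the work, then delay a fixed time.
 * Returns the simulated time (ms) at which pass number `passes` would start. */
static uint32_t start_after_relative_delay(unsigned passes, uint32_t exec_ms,
                                           uint32_t delay_len_ms)
{
    uint32_t now_ms = 0;
    for (unsigned p = 0; p < passes; ++p) {
        now_ms += exec_ms;       /* actions run */
        now_ms += delay_len_ms;  /* the delay only starts counting now */
    }
    return now_ms;
}

/* Alternative: sleep until an absolute release time that advances by one
 * period per pass, so the actions' run time does not accumulate. */
static uint32_t start_after_absolute_release(unsigned passes, uint32_t exec_ms,
                                             uint32_t period_ms)
{
    uint32_t now_ms = 0;
    uint32_t next_release_ms = 0;
    assert(exec_ms <= period_ms); /* otherwise the loop overruns its period */
    for (unsigned p = 0; p < passes; ++p) {
        now_ms += exec_ms;             /* actions run */
        next_release_ms += period_ms;  /* next pass is due one period later */
        if (now_ms < next_release_ms) {
            now_ms = next_release_ms;  /* sleep until the release time */
        }
    }
    return now_ms;
}
```

With 30 ms of work per pass, ten passes of the relative-delay version take 1300 ms instead of 1000 ms, i.e. roughly 7.7 Hz rather than 10 Hz. The absolute-release version stays at exactly 1000 ms as long as the work fits within the period.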

Although it looks trivial, the superloop approach has several advantages:

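  • Proving that it meets its timing deadlines is easy

    • If the WCET of one full pass through the loop (the sum of every action’s WCET) is less than the shortest period among all actions, and the loop repeats at that period, every deadline will be met.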

  • There is no need to integrate an RTOS and deal with its complexity

    • Although an RTOS has not been formally introduced yet, this guide will show that an RTOS opens the door to bugs caused by its pseudo-concurrent nature. Without an RTOS, task-to-task concurrency bugs cannot occur in the first place (data shared with interrupt handlers still needs care).
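The timing argument for a superloop can be written down as a check. A minimal sketch, assuming each action’s period and WCET are already known; the `action_t` type, the function name, and the numbers in the example are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One action in the superloop (hypothetical bookkeeping, in milliseconds). */
typedef struct {
    uint32_t period_ms; /* required period (deadline == period) */
    uint32_t wcet_ms;   /* measured or analysed WCET            */
} action_t;

/* Returns 1 if one pass of the superloop (every action run once) finishes
 * in less than the shortest period among the actions, else 0. */
static int superloop_meets_deadlines(const action_t *actions, size_t n)
{
    uint64_t pass_wcet_ms = 0;
    uint32_t shortest_period_ms = UINT32_MAX;
    assert(actions != NULL && n > 0u);
    for (size_t i = 0; i < n; ++i) {
        pass_wcet_ms += actions[i].wcet_ms;
        if (actions[i].period_ms < shortest_period_ms) {
            shortest_period_ms = actions[i].period_ms;
        }
    }
    return pass_wcet_ms < shortest_period_ms;
}
```

For example, three actions with WCETs of 20, 30 and 40 ms and a shortest period of 100 ms pass the check (90 ms < 100 ms); two actions with WCETs of 60 and 50 ms do not.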

However, this approach also has some drawbacks that can make it unsuitable for certain systems:

  • Every action must run at the rate set by the shortest period, even if it could run less often, which wastes CPU time.

  • Per Philip Koopman’s Better Embedded System Software, a big part of a superloop’s simplicity is that it’s easy to prove the system meets its timing deadlines. However, it can be tempting to run some actions conditionally (e.g., only after some time has elapsed or when some specific circumstance occurs), and then you may no longer be able to prove that the system meets its deadlines. In that case, Koopman recommends using other approaches.
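Koopman’s concern can be illustrated with a common variation (not taken from his book): slower actions run only on every k-th pass of the loop. Most passes are then cheap, but the timing you can actually prove depends on the pass where every conditional action fires at once. The `divider` field and the other names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An action that runs only on every `divider`-th pass of the superloop. */
typedef struct {
    uint32_t divider; /* 1 = every pass, 5 = every fifth pass, ... */
    uint32_t wcet_ms;
} rated_action_t;

/* Worst-case duration of pass number `pass`: the sum of the WCETs of the
 * actions whose condition is true on that pass. */
static uint32_t pass_time_ms(const rated_action_t *a, size_t n, uint32_t pass)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(a[i].divider > 0u);
        if (pass % a[i].divider == 0u) {
            total += a[i].wcet_ms;
        }
    }
    return total;
}

/* The bound to prove against: every conditional action firing in the same
 * pass. With counter-based dividers that happens on pass 0. */
static uint32_t provable_pass_time_ms(const rated_action_t *a, size_t n)
{
    return pass_time_ms(a, n, 0u);
}
```

With dividers of 1, 5 and 10 and WCETs of 10, 40 and 30 ms, pass 1 takes 10 ms and pass 5 takes 50 ms, but pass 0 (and every tenth pass) takes 80 ms, and that 80 ms is what has to fit within the period. For conditions that are not tied to a counter, such as "only if the button is pressed", the safe assumption is that all of them are true in the same pass.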

In my experience, a superloop design encourages simplicity and is often surprisingly suitable for timing-critical applications, at the expense of higher CPU usage. Unless I end up with an action that has a significantly long WCET, a superloop is usually enough. For further reading, see Nathan Jones’ article on why you don’t need an RTOS.

Real-Time Operating System (RTOS)

Another common approach to meeting real-time deadlines is to use a middleware layer called a Real-Time Operating System (RTOS). Here, 'real-time' refers to the system’s ability to schedule tasks so that they can complete within defined timing constraints.

In an RTOS, a task is the fundamental unit of execution: a sequence or collection of operations that performs a specific function within the system. Each task typically runs independently, with its own code, data, and stack, and may be scheduled to run at specific intervals or in response to particular events. Tasks range from simple operations, such as toggling an LED, to complex routines, such as processing sensor data or handling a communication protocol. The RTOS schedules tasks according to their priorities and timing requirements so that, in a well-designed system, critical tasks meet their deadlines and processor time is allocated efficiently.

Here’s a short FreeRTOS snippet that shows what a task might look like. Note that a task is often a collection of actions that run together with the same deadline (we will elaborate on this later).

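#include <stdbool.h>
#include "FreeRTOS.h"
#include "task.h"

// Hypothetical application hooks, defined elsewhere
extern volatile bool button_pressed;
void LED_toggle(void);
void turn_heater_on(void);
void spin_a_motor(void);

// Task that toggles an LED, handles the heater, and spins a motor every 500 ms
void vTask(void *pvParameters)
{
    (void)pvParameters; // unused
    const TickType_t xDelay = pdMS_TO_TICKS(500); // 500 ms delay

    while (true) {
        LED_toggle();
        if (button_pressed) {
            turn_heater_on();
        }
        spin_a_motor();
        // Block for 500 ms. vTaskDelay() counts from this call, so the loop period is
        // 500 ms plus the work above; vTaskDelayUntil() keeps a fixed period instead.
        vTaskDelay(xDelay);
    }
}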

What an RTOS is NOT

The words “Real Time Operating System” can be a little misleading. Here’s a rapid-fire list of what an RTOS is not:

  • Using an RTOS does NOT mean stuff can run fast(er).

    • In fact, adding one can slow a system down, since the scheduler and its context switches consume CPU time!

  • An RTOS does not, by itself, guarantee real-time performance; the tasks still have to be designed and analyzed to meet their deadlines.

  • An RTOS is not a requirement to achieve real-time performance and meet timing deadlines.

  • It is not an “operating system” in the general-purpose sense: it typically does not provide file systems, device I/O (input/output), and the other services that a general-purpose operating system (GPOS, e.g., Windows, Linux, macOS) does.

Fundamental Principle of an RTOS

An RTOS lets engineers define tasks that are scheduled in a way that mimics concurrent threads. Embedded systems typically have a single core, so tasks only appear to run concurrently: they actually take turns executing on that core. Imagine a juggler handling multiple balls: although the balls are not all in their hands at once, the juggler handles each one in quick succession so that all appear to be in continuous motion. Similarly, the RTOS juggles tasks by quickly switching between them, giving each a slice of processor time and creating the illusion of concurrent execution without true parallelism. The software module that does this juggling is called the scheduler, and the act of switching from one task to another is called a context switch.

Here’s a graphical version of what this looks like, where each bar shows when a task is executing.

Figure: 2 Task Timing Diagram

Various scheduling algorithms and techniques determine when a context switch occurs and how processor time is allocated or 'sliced' among tasks. For now, it’s enough to understand that an RTOS enables multiple tasks to run on a single core by switching between them, allowing each task a share of processor time.
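The time-sharing idea can be simulated without any RTOS. The sketch below uses round-robin slicing (each task with work left gets one fixed slice in turn), which is only one possible policy and is chosen here for simplicity; it records which task owns the core on each tick and ignores the cost of the context switch itself. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate round-robin time slicing on one core. work[i] is how many ticks
 * of CPU task i still needs. For each tick, the index of the task that ran
 * is written to timeline[]; the return value is the number of ticks used. */
static size_t round_robin(unsigned *work, size_t n_tasks, unsigned slice,
                          int *timeline, size_t max_ticks)
{
    size_t tick = 0;
    int any_left = 1;
    assert(work != NULL && timeline != NULL && slice > 0u);
    while (any_left && tick < max_ticks) {
        any_left = 0;
        for (size_t i = 0; i < n_tasks && tick < max_ticks; ++i) {
            /* Give task i up to one slice, then "context switch" to the next. */
            for (unsigned s = 0; s < slice && work[i] > 0u && tick < max_ticks; ++s) {
                timeline[tick++] = (int)i;
                work[i]--;
            }
            if (work[i] > 0u) {
                any_left = 1;
            }
        }
    }
    return tick;
}
```

With two tasks needing 3 and 2 ticks and a slice of 1 tick, the timeline is 0, 1, 0, 1, 0: neither task runs to completion in one go, yet both finish within 5 ticks of shared time.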

The specifics of how a context switch works are considered outside the scope of this document, but you are welcome to Google for them!

Types of RTOSs

There are two types of RTOS:

  • Non-preemptive

    • Tasks run until they decide to “give up” (aka yield) control of the processor. They do this by telling the RTOS they no longer want to run. Examples of this occur when a task

      • Wants to delay for some time (ex: osDelay)

      • Wants to wait on an inter-task signal (discussed later, but examples include a Queue or a Semaphore)

  • Preemptive

    • The RTOS can decide to interrupt the execution of a task to run another task, without the currently executing task explicitly indicating it no longer wants to run.
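The practical difference between the two types shows up in how long a high-priority task waits. The discrete-time simulation below (hypothetical names; one loop iteration is one tick) runs the same jobs under both policies. For simplicity, the non-preemptive case lets a job keep the core until it finishes, standing in for a task that runs until its next yield point.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned priority; /* larger number = more important          */
    unsigned release;  /* tick at which the job becomes ready      */
    unsigned work;     /* ticks of CPU time still needed           */
    unsigned finish;   /* tick at which the job completed (output) */
} job_t;

/* Highest-priority job that is released and unfinished, or -1 if none. */
static int pick_highest_ready(const job_t *jobs, size_t n, unsigned now)
{
    int best = -1;
    for (size_t i = 0; i < n; ++i) {
        if (jobs[i].work > 0u && jobs[i].release <= now &&
            (best < 0 || jobs[i].priority > jobs[best].priority)) {
            best = (int)i;
        }
    }
    return best;
}

/* Simulate one core, one tick per iteration. If `preemptive` is non-zero the
 * scheduler re-evaluates every tick; otherwise the running job keeps the
 * core until it finishes. */
static void simulate(job_t *jobs, size_t n, int preemptive)
{
    unsigned now = 0;
    int running = -1;
    assert(jobs != NULL);
    for (;;) {
        if (preemptive || running < 0) {
            running = pick_highest_ready(jobs, n, now);
        }
        if (running < 0) {
            int pending = 0;
            for (size_t i = 0; i < n; ++i) {
                pending |= (jobs[i].work > 0u);
            }
            if (!pending) {
                return;       /* every job has finished */
            }
            now++;            /* idle until the next release */
            continue;
        }
        jobs[running].work--; /* run the chosen job for one tick */
        now++;
        if (jobs[running].work == 0u) {
            jobs[running].finish = now;
            running = -1;     /* job done: the scheduler picks again */
        }
    }
}
```

Take a low-priority job released at tick 0 that needs 5 ticks and a high-priority job released at tick 1 that needs 2 ticks. Non-preemptively, the high-priority job waits and finishes at tick 7; preemptively, it finishes at tick 3, at the cost of the low-priority job finishing at tick 7 instead of tick 5.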

Implications of Concurrency

Because an RTOS switches between tasks, it makes a system vulnerable to concurrency-related bugs. A preemptive RTOS can interrupt a task mid-execution at almost any point; a non-preemptive RTOS only switches where a task yields, which lowers the risk but does not remove it. Some of the bugs that may result include:

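  • Race condition (two tasks access shared data and the result depends on the order in which they happen to run; explained more in the Mutex section)

  • Deadlock (two tasks each wait for something the other is holding on to, so neither can proceed; explained more in the Mutex section)

  • Priority inversion (a high-priority task is stuck waiting on a lower-priority task, for example because the lower-priority task holds a resource the high-priority task needs)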
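A race condition is easiest to see by stepping through an interleaving by hand. On many microcontrollers, `counter++` compiles to separate load, add, and store instructions, so a context switch can land between them. The sketch below replays those steps in a fixed order (no real threads or RTOS) to show a lost update; all names are hypothetical.

```c
#include <assert.h>

/* Shared data, e.g. a count of events updated by two tasks. */
static int shared_counter;

/* Each task's private copy of a CPU register; the RTOS saves and restores
 * it on a context switch, so it survives being preempted. */
typedef struct {
    int reg;
} task_regs_t;

/* The three steps that `shared_counter++` typically compiles to. */
static void load(task_regs_t *t)    { t->reg = shared_counter; }
static void add_one(task_regs_t *t) { t->reg += 1; }
static void store(task_regs_t *t)   { shared_counter = t->reg; }

/* Task A finishes its increment before task B starts. */
static int increment_without_switch(void)
{
    task_regs_t a, b;
    shared_counter = 0;
    load(&a); add_one(&a); store(&a);
    load(&b); add_one(&b); store(&b);
    return shared_counter;
}

/* Task A is preempted right after its load, and task B runs in between. */
static int increment_with_switch(void)
{
    task_regs_t a, b;
    shared_counter = 0;
    load(&a);                          /* A reads 0                           */
    load(&b); add_one(&b); store(&b);  /* context switch: B writes 1          */
    add_one(&a); store(&a);            /* A resumes with its stale 0, writes 1 */
    return shared_counter;
}
```

Two increments produce 2 in the first ordering but only 1 in the second: one update is lost. Making the load-add-store sequence indivisible (for example, by protecting it with a mutex) is what prevents this.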

RTOS Signalling Mechanisms

Mutex

Semaphore

Queues and Stream Buffers

Choose an Inter-Task Communication Method in FreeRTOS

Using a Fixed Priority Preemptive RTOS

References and Further Reading