When to Use an RTOS: An Important Decision for Embedded Projects


The world of embedded systems seems to be composed of two types of developers: those who love real-time operating systems (RTOSs) and those who despise them, insisting on using bare-metal superloops (plus interrupts) for everything. The reality is that an RTOS is just a tool; there are situations when it’s the right tool, and there are times when a superloop and a few interrupts will do the trick.

I’ll introduce the concept of a real-time operating system and then discuss the scenarios in which you might want to consider using one over a simple superloop.

The Core Dilemma: Bare Metal vs. RTOS

At its heart, the decision is a trade-off between control and complexity. A bare-metal approach offers maximum control and minimal overhead, as the application code interacts directly with the microcontroller’s hardware. This is often implemented as a “superloop,” a single, infinite loop that polls inputs and executes functions sequentially. It’s simple, direct, and perfect for straightforward applications. Such single-threaded applications can often be supplemented with software or hardware interrupts to handle event-driven tasks.
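A minimal sketch of that pattern in C, assuming two hypothetical interrupts (a button press and a 1 ms system tick) whose handlers only record that something happened. The handler, flag, and counter names are illustrative, not any vendor’s API. Each call to `superloop_step()` is one trip around the loop, which real firmware would run from `main()` inside `for (;;)`.

```c
#include <stdbool.h>
#include <stdint.h>

// Shared between the ISRs and the superloop; volatile so the compiler
// re-reads them on every pass instead of caching them in a register.
static volatile bool button_pressed = false;
static volatile uint32_t tick_ms = 0;

// Bookkeeping for this sketch: how many times each job has run.
static uint32_t button_events_handled = 0;
static uint32_t sensor_reads = 0;
static uint32_t last_sensor_read_ms = 0;

// Hypothetical ISRs: keep them short and defer the real work to the loop.
void button_isr(void) { button_pressed = true; }
void systick_isr(void) { tick_ms++; }

// One pass of the superloop. Real firmware would call this from main()
// inside for (;;) { superloop_step(); }
void superloop_step(void) {
    if (button_pressed) {
        button_pressed = false;      // clear the flag, then handle the event
        button_events_handled++;     // e.g., update the UI
    }
    // Poll a "sensor" every 100 ms without blocking the rest of the loop.
    // Unsigned subtraction keeps this correct when tick_ms wraps around.
    if ((uint32_t)(tick_ms - last_sensor_read_ms) >= 100u) {
        last_sensor_read_ms = tick_ms;
        sensor_reads++;              // e.g., start an ADC conversion
    }
}
```

The check-then-clear of `button_pressed` has a small window: a second press that arrives between those two statements is merged with the first. It is harmless here, but it is exactly the kind of subtle timing issue that becomes harder to spot as a superloop grows.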

Conversely, a real-time operating system introduces a layer of abstraction: a kernel that manages tasks, scheduling, and resources. This provides powerful features for handling concurrent operations but adds a learning curve, a memory footprint, and processing overhead. The question is not which is “better,” but which is appropriate for the specific demands of the project.

Understanding the Fundamentals: What an RTOS Offers

Before making the decision, it’s essential to understand what an RTOS brings to the table and how it differs from other software paradigms. An RTOS is a specialized operating system (OS) designed for applications that must process data and events within strict, predictable time constraints.

Core Characteristics of a Real-Time Operating System

Unlike a general-purpose OS, an RTOS is defined by three primary characteristics:

  1. Determinism: The system’s ability to execute tasks and respond to events within a predictable, bounded timeframe. The key is not raw speed, but consistency.
  2. Responsiveness: The speed at which the OS can acknowledge and begin servicing an external event, often measured by interrupt latency and dispatch latency.
  3. Task Management: A sophisticated scheduler that can manage multiple tasks (threads) concurrently, prioritizing critical operations over less important ones. This is typically achieved through priority scheduling.

RTOS vs. General-Purpose Operating Systems (GPOS)

A GPOS, like Windows or Linux, is optimized for throughput and fairness, aiming to provide a responsive user experience by giving all applications a “fair” share of CPU time. It prioritizes average performance over worst-case execution time. An RTOS does the opposite: it is optimized for predictability. It will always execute a high-priority task over a low-priority one, even if it means starving the lower-priority task, because meeting deadlines is its primary goal.
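The strict-priority rule can be illustrated with a toy selection function. It assumes that a larger number means higher priority (the FreeRTOS convention; Zephyr, for example, uses the opposite), and it scans a small array for clarity, whereas many real kernels keep per-priority ready lists so that selection does not depend on the number of tasks. The `task_t` type and `pick_next_task()` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;
    unsigned priority;   // larger number = more important in this sketch
    bool ready;          // true if the task could run right now
} task_t;

// Return the index of the highest-priority ready task, or -1 if none is
// ready (a real kernel would run its idle task in that case).
// Ties go to the first task found in the array.
int pick_next_task(const task_t *tasks, size_t count) {
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (tasks[i].ready &&
            (best < 0 || tasks[i].priority > tasks[best].priority)) {
            best = (int)i;
        }
    }
    return best;
}
```

Given a ready "motor" task at priority 5 alongside ready "ui" and "logger" tasks at priorities 1 and 0, `pick_next_task()` returns the motor task every time it is called; the lower-priority tasks only run once the motor task blocks. That is the deliberate starvation trade-off of an RTOS. Many RTOSs can also time-slice between ready tasks of equal priority instead of always picking the first one.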

RTOS vs. Bare Metal: Initial Distinctions

A bare-metal system places the entire burden of scheduling and timing on the application developer. All logic runs in a single thread of execution (plus interrupt handlers), often a superloop. An RTOS provides a formal structure for multitasking. It allows a complex application to be broken down into smaller, independent tasks, each with its own context (stack, registers). The RTOS kernel handles the context switching required to run these tasks, creating the illusion of parallel execution on a single core.

Criterion 1: Project Complexity and Concurrency

As the number of independent functions in an embedded system grows, managing them within a single loop becomes increasingly difficult, since each new function can affect the timing of every other function in the loop. An RTOS provides the tools to manage this complexity gracefully.

Managing Multiple Tasks and Asynchronous Events

Consider a system that must simultaneously monitor a user interface, manage a network connection, control a motor, and read sensor data. In a superloop, this requires complex state machines and carefully crafted non-blocking code to prevent one long-running function from delaying others. An RTOS simplifies this by allowing each of these functions to be implemented as a separate task with varying levels of priority. The scheduler handles the concurrency, ensuring that a high-priority task (e.g., motor control) can preempt a lower-priority one (e.g., UI updates).
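To make the superloop side concrete, here is a sketch of a non-blocking state machine for one small job: a hypothetical status LED that should be on for 200 ms and off for 800 ms. Instead of calling a blocking delay, the function remembers when it entered its current state and returns immediately, so the loop can keep servicing other work. The types, names, and timings are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { LED_OFF, LED_ON } led_state_t;

typedef struct {
    led_state_t state;
    uint32_t entered_ms;   // time at which the current state began
    bool led_level;        // stand-in for writing a GPIO pin
} blinker_t;

// Advance the blinker. It never waits, so it is safe to call on every
// pass of the superloop alongside other jobs.
void blinker_step(blinker_t *b, uint32_t now_ms) {
    uint32_t elapsed = now_ms - b->entered_ms;   // unsigned math survives wraparound
    switch (b->state) {
    case LED_OFF:
        if (elapsed >= 800u) {
            b->state = LED_ON;
            b->entered_ms = now_ms;
            b->led_level = true;
        }
        break;
    case LED_ON:
        if (elapsed >= 200u) {
            b->state = LED_OFF;
            b->entered_ms = now_ms;
            b->led_level = false;
        }
        break;
    }
}
```

With an RTOS, the same behavior can be written as a plain loop inside its own task that sets the pin and then sleeps (for example, with FreeRTOS `vTaskDelay()`), while the scheduler runs other tasks in the meantime.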

Inter-Task Communication (IPC) and Synchronization

When you have multiple tasks, they inevitably need to communicate and share data. An RTOS provides standardized and thread-safe mechanisms for inter-task communication (IPC), such as:

  • Message Queues: For sending data packets between tasks in a first-in, first-out manner.
  • Semaphores: For signaling and synchronizing tasks or controlling access to a shared resource.
  • Mutexes: For ensuring that only one task can access a critical section of code or a shared peripheral at a time, preventing data corruption.

Implementing these mechanisms reliably in a bare-metal system is a significant engineering effort, prone to subtle bugs like race conditions. Using the proven IPC primitives of an RTOS accelerates development and improves reliability.
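For comparison, this is roughly what a bare-metal stand-in for a message queue looks like: a fixed-size ring buffer carrying bytes from one producer (say, a UART receive interrupt) to one consumer (the superloop). It is a sketch that assumes exactly one producer and one consumer on a single-core microcontroller, where each index is written by only one side; portable code would use C11 atomics rather than `volatile`. It also cannot block or time out, which is part of what an RTOS queue adds. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 8u   // must be a power of two for the index mask below

typedef struct {
    uint8_t data[QUEUE_SIZE];
    volatile uint32_t head;   // free-running count of writes; producer only
    volatile uint32_t tail;   // free-running count of reads; consumer only
} byte_queue_t;

// Producer side: returns false (and drops the byte) if the queue is full.
bool queue_put(byte_queue_t *q, uint8_t byte) {
    if (q->head - q->tail == QUEUE_SIZE) {
        return false;
    }
    q->data[q->head & (QUEUE_SIZE - 1u)] = byte;
    q->head++;                // publish only after the data is stored
    return true;
}

// Consumer side: returns false if there is nothing to read.
bool queue_get(byte_queue_t *q, uint8_t *out) {
    if (q->head == q->tail) {
        return false;
    }
    *out = q->data[q->tail & (QUEUE_SIZE - 1u)];
    q->tail++;                // free the slot only after reading it
    return true;
}
```

In FreeRTOS, for instance, the equivalent is a queue created with `xQueueCreate()`, filled from the interrupt with `xQueueSendFromISR()`, and drained with `xQueueReceive()`, which can put the consumer task to sleep until data arrives.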

Criterion 2: Real-Time Requirements & Predictability

Another fundamental reason for choosing an RTOS is the need for deterministic timing. If your system’s correctness depends not only on the logical result of a computation but also on the time at which it is produced, you have a real-time requirement.

Defining “Real-Time”: Hard vs. Soft Real-Time Systems

Understanding the severity of a missed deadline is the first step.

  • Hard real-time systems: A missed deadline constitutes a catastrophic system failure. Examples include anti-lock braking systems in the automotive industry, flight control systems, and critical patient monitors in medical devices. The timing constraints are absolute and must be guaranteed.
  • Soft real-time systems: A missed deadline degrades performance but does not cause system failure. Examples include a video streaming service where a dropped frame is undesirable but not critical, or an IoT sensor network where a slightly delayed data packet can be tolerated.

If your project falls into the hard real-time category and is complex enough (per Criterion 1) to need some kind of multitasking environment, an RTOS is almost certainly necessary to mathematically prove and guarantee that deadlines will be met.
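One classic example of this kind of analysis is the Liu and Layland utilization bound for rate-monotonic scheduling, in which fixed priorities are assigned by period (shorter period, higher priority). Assuming n independent periodic tasks whose deadlines equal their periods, each with worst-case execution time C_i and period T_i, the set is guaranteed schedulable if U = C_1/T_1 + ... + C_n/T_n ≤ n(2^(1/n) - 1). The test is sufficient but not necessary: failing it means a more exact method (such as response-time analysis) is needed, not that deadlines will definitely be missed. The sketch below rewrites the inequality in the equivalent form (1 + U/n)^n ≤ 2, which avoids `pow()`; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    double wcet_ms;    // worst-case execution time C_i
    double period_ms;  // period T_i (deadline assumed equal to period)
} periodic_task_t;

// Total CPU utilization U = sum of C_i / T_i.
double total_utilization(const periodic_task_t *tasks, size_t n) {
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        u += tasks[i].wcet_ms / tasks[i].period_ms;
    }
    return u;
}

// Liu & Layland sufficient test for rate-monotonic scheduling.
// U <= n * (2^(1/n) - 1) is equivalent to (1 + U/n)^n <= 2.
bool rms_guaranteed(const periodic_task_t *tasks, size_t n) {
    if (n == 0) {
        return true;
    }
    double x = 1.0 + total_utilization(tasks, n) / (double)n;
    double p = 1.0;
    for (size_t i = 0; i < n; i++) {
        p *= x;   // p = (1 + U/n)^n
    }
    return p <= 2.0;
}
```

For example, made-up tasks with (C, T) of (1, 4), (2, 10), and (1, 20) ms give U = 0.5, comfortably under the three-task bound of about 0.78, while (2, 4), (3, 10), and (2, 20) ms give U = 0.9, which this test cannot guarantee. As n grows, the bound approaches ln 2, about 0.69.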

Quantifying Timing Requirements: Latency, Jitter, and Worst-Case Execution

To move beyond qualitative descriptions, you must quantify your system’s timing needs.

  • Interrupt Latency: The time from when a hardware interrupt is triggered to when the first instruction of its service routine is executed.
  • Dispatch Latency: The time it takes for the RTOS scheduler to stop one task and start executing another, higher-priority task that has become ready to run.
  • Jitter: The variation in the timing of an event or task execution. In many control systems, low jitter is as important as low latency.

These metrics can be verified using tools like a logic analyzer to monitor GPIO pin toggles or by analyzing timing diagrams. Note that a bare-metal superloop’s response time (even with hardware interrupts) is dependent on its current position in the loop, making worst-case analysis difficult. A preemptive RTOS is designed specifically to minimize and bound these latencies.
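As a sketch of the analysis side, assume you have exported the timestamps (in microseconds) of successive rising edges on a GPIO pin that a periodic task toggles, for example from a logic analyzer capture. The function below reports the shortest and longest observed period and takes peak-to-peak jitter as their difference; that is one common definition, and deviation from the nominal period or a standard deviation are also used. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t min_period_us;
    uint32_t max_period_us;
    uint32_t jitter_us;      // peak-to-peak: max_period_us - min_period_us
} period_stats_t;

// Needs at least two timestamps; returns 0 on success, -1 otherwise.
int period_jitter(const uint32_t *edges_us, size_t count, period_stats_t *out) {
    if (count < 2) {
        return -1;
    }
    uint32_t min_p = UINT32_MAX;
    uint32_t max_p = 0;
    for (size_t i = 1; i < count; i++) {
        // Unsigned subtraction still gives the right period if the
        // timestamp counter wrapped between the two edges.
        uint32_t p = edges_us[i] - edges_us[i - 1];
        if (p < min_p) min_p = p;
        if (p > max_p) max_p = p;
    }
    out->min_period_us = min_p;
    out->max_period_us = max_p;
    out->jitter_us = max_p - min_p;
    return 0;
}
```

For edges at 0, 1000, 2003, 2998, and 4000 µs, the periods are 1000, 1003, 995, and 1002 µs, so the peak-to-peak jitter is 8 µs.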

Criterion 3: Resource Management and Efficiency

An RTOS is not just a scheduler; it’s also a resource manager. It provides services to control access to the system’s memory and CPU time, which is critical for building robust and efficient systems.

Memory Management Considerations

In a bare-metal system, memory management is often static, with global variables and a single stack. An RTOS provides more sophisticated options. Each task has its own stack, which keeps its local variables and saved context separate from those of other tasks. (Separate stacks alone do not guarantee isolation: without an MPU, a task that overflows its stack can still corrupt neighboring memory, which is why many RTOSs provide optional stack-overflow checks.) Many RTOSs also offer dynamic memory allocation services (e.g., fixed-size memory pools) that are deterministic and avoid fragmentation, a common problem with standard malloc() in long-running embedded systems.
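A minimal fixed-block pool shows why these allocators are deterministic: every block has the same size, free blocks are chained into a list threaded through the blocks themselves, and both allocation and release are constant-time pointer swaps. Because blocks are never split or merged, there is no external fragmentation (at the cost of wasted space when a request is smaller than a block). Block size, block count, and function names are arbitrary assumptions, and the sketch is not thread-safe; an RTOS pool would protect these operations with a critical section or similar mechanism.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE  32u   // bytes per block (arbitrary for this sketch)
#define BLOCK_COUNT 4u

typedef union block {
    union block *next;            // used while the block is on the free list
    uint8_t payload[BLOCK_SIZE];  // used while the block is allocated
} block_t;

static block_t pool[BLOCK_COUNT];
static block_t *free_list = NULL;

// Link every block into the free list. Call once at startup.
void pool_init(void) {
    free_list = NULL;
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

// O(1): pop the first free block, or return NULL if the pool is exhausted.
void *pool_alloc(void) {
    block_t *b = free_list;
    if (b != NULL) {
        free_list = b->next;
    }
    return b;
}

// O(1): push a block back onto the free list.
// p must be a non-NULL pointer previously returned by pool_alloc().
void pool_free(void *p) {
    block_t *b = p;
    b->next = free_list;
    free_list = b;
}
```

After `pool_init()`, the first four calls to `pool_alloc()` succeed, the fifth returns NULL, and a block released with `pool_free()` is the next one handed out.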

Processor Utilization and Overhead

An RTOS introduces overhead through context switching and kernel operations. However, for complex systems, it can lead to more efficient use of the processor overall. In a bare-metal polling loop, the CPU is constantly busy checking for events, even when none are occurring. An RTOS allows tasks to block or sleep until an event (like an interrupt or a message) occurs. This allows the CPU to enter low-power modes when idle, which is critical for battery-powered devices, or to be used for lower-priority background tasks. The key is that the RTOS makes CPU usage explicit and manageable.
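One common way to make CPU usage explicit is to measure how much of the time the scheduler spends in its idle task. The sketch assumes hypothetical counters that a real system would fill from its tick interrupt (total ticks, plus idle ticks whenever the idle task was the one running); CPU load is then the non-idle fraction. Actual kernels expose comparable data through their own mechanisms (FreeRTOS, for example, offers an idle hook and optional run-time statistics), which differ in detail.

```c
#include <stdint.h>

// Hypothetical counters for one measurement window. In a real system they
// would be updated from the tick interrupt and read atomically.
typedef struct {
    uint32_t idle_ticks;    // ticks during which the idle task was running
    uint32_t total_ticks;   // all ticks in the window
} load_counters_t;

// CPU load in whole percent over the window (0 if no ticks elapsed).
uint32_t cpu_load_percent(const load_counters_t *c) {
    if (c->total_ticks == 0) {
        return 0;
    }
    uint32_t busy = c->total_ticks - c->idle_ticks;
    // Widen before multiplying so large tick counts cannot overflow.
    return (uint32_t)(((uint64_t)busy * 100u) / c->total_ticks);
}
```

With 250 idle ticks out of 1,000, for instance, the function reports 75% load.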

Criterion 4: Development, Debugging, and Long-Term Maintainability

The choice between bare metal and an RTOS has profound implications for the entire development cycle, from initial coding to long-term maintenance.

Development Cycle and Initial Effort

For very simple projects, a bare-metal approach is faster to set up. There is no RTOS to configure or learn. However, as complexity increases, this initial advantage disappears. The structured nature of an RTOS, with its separation of concerns into distinct tasks, often leads to a faster and more predictable development cycle for complex applications.

Debugging and Testing Efficiency

Debugging timing-related issues in a bare-metal superloop can be incredibly difficult. A bug’s appearance might depend on the exact sequence and timing of external events. RTOS-aware debuggers provide powerful tools, allowing developers to inspect the state of each task, view message queues, and understand resource ownership. This visibility dramatically simplifies the process of finding and fixing complex concurrency bugs.

Code Reusability and Modularity

An RTOS promotes modular design. Because tasks are self-contained and communicate through well-defined APIs (like message queues), they can be developed and tested independently. This makes it easier to reuse components across different projects. Adding a new feature often means adding a new task, with minimal impact on existing, stable code, which can be difficult to accomplish in a monolithic superloop.

Long-Term Maintenance and Team Collaboration

For projects with a long lifespan or those developed by a team, an RTOS is a powerful enabler. The task-based architecture keeps the codebase modular and easier for new developers to understand. It allows team members to work on different tasks in parallel with less risk of interfering with each other’s work, improving overall project velocity and long-term maintainability.

Criterion 5: Safety, Security, and Certification Needs

For many industries, software is subject to rigorous standards for safety and security. The choice of an operating system is a critical component of achieving compliance. The embedded security market is rapidly growing, expected to reach USD 15.06 billion by 2030, highlighting the increasing focus on this area.

Meeting Stringent Industry Standards

Industries like aerospace (DO-178C), industrial automation (IEC 61508), medical (IEC 62304), and automotive (ISO 26262) require extensive documentation and testing to prove system reliability. Many commercial RTOS vendors provide pre-certified kernels and supporting documentation, known as “certification artifacts.” This can save months or even years of effort and cost compared to certifying a proprietary bare-metal scheduler. Commercial options like Integrity by Green Hills Software and embOS by SEGGER are designed with safety certification in mind. Zephyr is also well on its way to achieving safety certifications.

How RTOS Supports Safety and Security

Modern RTOSs offer features that are essential for building safe and secure systems. Memory protection units (MPUs) or memory management units (MMUs) can be used to create isolated memory spaces for each task. If one task becomes corrupted or malicious, the RTOS can prevent it from affecting other critical tasks, enhancing fault tolerance. This task isolation is a cornerstone of building robust systems.

Bare Metal for Safety-Critical Systems?

While it is possible to certify a bare-metal system, the burden of proof is entirely on the developer. Every line of code, including the scheduler logic, must be rigorously tested and documented. For highly complex systems, this can be more challenging than leveraging a pre-certified RTOS component. The simplicity of a bare-metal system can be an advantage in some very small, verifiable applications, but this benefit diminishes rapidly as complexity grows.

Criterion 6: Cost, Licensing, and Hardware Resources

The final set of considerations is the impact on budget and hardware. This involves not only direct costs but also the indirect costs associated with development effort.

Initial Licensing Costs vs. Open-Source Options

The RTOS landscape includes a wide spectrum of licensing models.

  • Commercial RTOS: Options like Azure RTOS (now part of Eclipse ThreadX), Integrity, and embOS often come with licensing fees but provide professional support, extensive documentation, and certification artifacts.
  • Open-Source RTOS: FreeRTOS is one of the most prominent examples, offering a robust, free-to-use kernel with a massive community. Another prominent option is Zephyr, which has gained significant traction due to its scalability, extensive feature set, and strong support from the Linux Foundation. Both FreeRTOS and Zephyr provide extensive documentation and community resources, making them attractive options for projects of varying complexity levels.

Ultimately, the choice depends on the project’s budget, support needs, and certification requirements.

[Edit 2025-09-03]: Azure RTOS was officially given over to the Eclipse Foundation, renamed to “Eclipse ThreadX,” and licensed under the open source MIT license. Thanks to CHOSSAT for pointing this out!

Hidden Costs: Development, Debugging, and Training

The license fee is only part of the total cost of ownership. A significant “hidden” cost is the engineering time required. While an RTOS has a learning curve, it can reduce long-term development and debugging time on complex projects. Conversely, forcing a bare-metal solution onto a complex problem can lead to ballooning costs as developers wrestle with unmanageable code and hard-to-find timing bugs. The team’s prior experience with a particular operating system or lack thereof is a major factor in this calculation.

Conclusion

Choosing between a bare-metal architecture and an RTOS is a critical engineering decision that shapes the future of any embedded project. The right answer is not universal but is found by methodically evaluating your project against a clear set of criteria.

The above criteria encourage you to move beyond a simple feature list and analyze the core requirements of your project:

  • Complexity: How many concurrent tasks and asynchronous events must you manage?
  • Real-Time Needs: Do you have hard or soft real-time constraints? Can you quantify them?
  • Resource Management: Do you need sophisticated memory and CPU management?
  • Development Lifecycle: What are your goals for maintainability, scalability, and team collaboration?
  • Safety & Certification: Does your product operate in a regulated industry requiring formal certification?
  • Cost & Resources: What are the total costs, including licensing, hardware, and engineering effort?

If your project is a simple, single-purpose device with minimal timing constraints, a bare-metal approach, with its elegance and efficiency, is often the superior choice. However, as your answers to the framework questions point towards increasing complexity, strict timing deadlines, and long-term maintainability, the structured, predictable, and feature-rich environment of an RTOS becomes necessary. Making this decision early in your development cycle can help you successfully deliver your next project or product.

From a personal perspective, I choose simple superloop patterns when the embedded project is intended to tackle only one or two tasks. Once polling or queuing in the superloop becomes too burdensome (and I start missing deadlines), I have to consider adding a simple scheduler; often, FreeRTOS is just the ticket. The alternative is to move up to a faster (and often more expensive) microcontroller.

I’ll look to commercial RTOS options or large, open-source frameworks (e.g. Zephyr) when I need to add complex features, such as networking and graphics/UI. Many full-featured RTOSs have such drivers and libraries already included, which saves tremendous development time. Networking alone is enough to make me consider third-party options, like Zephyr or ESP-IDF. 

I hope this helps you understand the benefits that a real-time operating system brings as well as some of the tradeoffs. Using the right tool for the job is more important than being in the “always RTOS” or “never RTOS” camp.

To learn more about real-time operating systems or dive deeper into ESP-IDF and Zephyr, check out my full course listings: shawnhymel.com/courses

3 thoughts on “When to Use an RTOS: An Important Decision for Embedded Projects”

  • And then there are the developers who evaluate whether an RTOS makes sense and, if so, which RTOS is most suitable. 😉

  • Thanks Shawn for this great article! It’s one of the best I’ve read about this topic. Some minor comments. You can put Eclipse ThreadX in the open-source RTOS category, as it is now MIT licensed. Some USB, file system, graphics, and maybe networking applications do not necessarily require an RTOS. They can work just as well on bare metal, provided there are only a few tasks to process. Combining them, however, likely requires an RTOS kernel, like FreeRTOS or any other.

  • Thanks for pointing that out! I made an update to the article regarding ThreadX. I agree that combining USB, graphics, networking, etc. usually requires an RTOS. Sometimes the networking stack, for example, is such a pain that I just take the whole RTOS with it (e.g., ESP-IDF, Zephyr).
