Over the past couple of years, edge AI on microcontrollers (often called “TinyML”) has evolved beyond demos and conference talks. You can now find professional tools, workflows, and design patterns used to ship real products. While this progress has not matched the explosive visibility of cloud-based AI (e.g. large language models), the embedded industry is slowly (but surely) adopting machine learning to solve real-world problems. Machine learning is notoriously resource intensive, and microcontrollers are notoriously resource constrained. As a result, edge AI at the microcontroller scale requires careful planning, engineering, and execution. Looking at edge AI through this lens reveals a field that is maturing steadily, shaped less by headlines and more by practical tradeoffs made by embedded engineers working close to the hardware and alongside domain experts.
Defining Edge AI on Microcontrollers
The term “edge AI” is broad and can mean different things depending on context. At a high level, it refers to running machine learning models locally on edge devices (i.e. close to where data is generated) rather than sending raw data to the cloud for processing. In practice, this almost always means inference and not training. In other words, we are using pre-trained models to make predictions such as classifications, detections, regressions, or anomaly scores based on sensor data.
In this article, I want to focus on a narrow slice of edge AI: machine learning inference on microcontrollers. These systems typically operate with tens to hundreds of kilobytes of RAM, limited flash storage, no GPU, and often a bare-metal environment or small RTOS. Those constraints heavily influence the tooling, model architectures, and deployment workflows that are viable. By focusing on this end of the spectrum, we can better evaluate the real state of edge AI where resource limits are most apparent and engineering tradeoffs are impossible to ignore.
A Typical Edge AI (on Microcontroller) Workflow
Even though the tools and frameworks vary, most edge AI projects on microcontrollers follow a similar high-level workflow. It usually starts with data collection, ideally in the actual operating environment of the device. Because microcontrollers are so resource constrained, models tend to be narrow and task-specific. General-purpose models with many classes often require too much memory or compute to be practical, which makes representative, domain-specific data especially important.
Next comes model research and training, where the goal is not to maximize accuracy at all costs, but to meet the needs of the application within strict memory, latency, and power constraints. This phase often involves experimenting with small model architectures, feature extraction strategies, and class definitions that reflect what the system actually needs to detect or classify.
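For instance, a common feature extraction strategy for vibration or motion data is to compute a few summary statistics over a fixed window of samples instead of feeding raw data to the model. The sketch below is purely illustrative: the window size, feature set, and function name are my own choices, not part of any particular framework.

```cpp
#include <cmath>
#include <cstddef>

// Illustrative feature extraction over one window of accelerometer samples.
// The window size and feature choices are hypothetical; real projects tune
// them against the dataset and the model's input layer.
constexpr std::size_t kWindowSize = 128;

struct Features {
    float mean;
    float rms;
    float peak_to_peak;
};

Features extract_features(const float (&window)[kWindowSize]) {
    float sum = 0.0f;
    float sum_sq = 0.0f;
    float min_v = window[0];
    float max_v = window[0];
    for (std::size_t i = 0; i < kWindowSize; ++i) {
        const float v = window[i];
        sum += v;
        sum_sq += v * v;
        if (v < min_v) min_v = v;
        if (v > max_v) max_v = v;
    }
    return Features{
        sum / kWindowSize,               // mean (DC offset)
        std::sqrt(sum_sq / kWindowSize), // RMS energy
        max_v - min_v                    // peak-to-peak amplitude
    };
}
```

Compact, hand-crafted features like these often let a much smaller model meet the application’s accuracy target than one trained on raw samples.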
Once a candidate model is identified, it must be optimized for deployment. This typically includes quantization, compression, and conversion into a format suitable for the target runtime or hardware. Tooling at this stage bridges the gap between the training environment and the embedded system, translating a model into something that can run efficiently on a microcontroller.
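To make “quantization” concrete: the most common scheme for MCU targets is post-training integer (int8) quantization, which maps each real value x to an 8-bit integer q through a scale s and zero point z:

$$
q = \operatorname{round}\!\left(\frac{x}{s}\right) + z, \qquad x \approx s\,(q - z)
$$

With made-up values s = 0.05 and z = -10, the real value x = 2.0 becomes q = round(2.0 / 0.05) + (-10) = 30, and dequantizing gives 0.05 * (30 - (-10)) = 2.0. Storing weights as int8 instead of float32 cuts model size by roughly 4x, and integer arithmetic is typically faster and more energy efficient on microcontroller-class cores.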
Finally, the model is integrated into a broader embedded application. Inference becomes just one part of the system, alongside sensor drivers, preprocessing, post-processing, application logic, and power management. At this point, traditional embedded engineering concerns, such as timing, memory allocation, robustness, and maintainability, once again become a priority.
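As a rough picture of where inference ends up in the final firmware, here is a schematic superloop. Every function in it is a hypothetical placeholder with a stub body, not an API from any SDK; the point is simply that inference is one stage among several.

```cpp
#include <cstddef>

// Hypothetical application hooks with stub bodies, shown only to illustrate
// how inference becomes one stage in an ordinary embedded control loop.
static bool read_sensor_window(float* samples, std::size_t count) {
    // In real firmware: sensor driver, DMA- or interrupt-driven sampling.
    for (std::size_t i = 0; i < count; ++i) samples[i] = 0.0f;
    return true;
}

static void preprocess(const float* samples, std::size_t count,
                       float* features, std::size_t feature_count) {
    // In real firmware: filtering, windowing, feature extraction.
    (void)samples;
    (void)count;
    for (std::size_t i = 0; i < feature_count; ++i) features[i] = 0.0f;
}

static float run_inference(const float* features) {
    // In real firmware: hand the feature vector to the ML runtime.
    (void)features;
    return 0.0f;
}

static void act_on_result(float score) { (void)score; }    // application logic
static void enter_low_power_until_next_window() {}         // power management

int main() {
    constexpr std::size_t kWindowSize = 128;
    constexpr std::size_t kFeatureCount = 16;
    static float samples[kWindowSize];
    static float features[kFeatureCount];

    for (;;) {
        if (read_sensor_window(samples, kWindowSize)) {
            preprocess(samples, kWindowSize, features, kFeatureCount);
            act_on_result(run_inference(features));
        }
        enter_low_power_until_next_window();  // timing and power budget live here
    }
}
```

In a real product, most of the hard engineering lives in these surrounding stages (sampling jitter, buffer ownership, failure handling, sleep states) rather than in the inference call itself.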
Vendor Frameworks and Pipelines
As edge AI has moved from experimentation to production, most silicon vendors have responded by building or acquiring their own frameworks and pipelines for developing and deploying machine learning models on their microcontrollers. This is a natural extension of how the embedded industry has always worked. Vendors already provide SDKs, HALs, middleware, and tooling tightly coupled to their hardware. Edge AI has simply become another layer in that stack.
These vendor frameworks generally aim to do a few things well: integrate cleanly with the vendor’s development environment, take advantage of device-specific accelerators or DSP instructions, and reduce the friction of getting a trained model running on a particular MCU family. The tradeoff, of course, is portability. These tools are often optimized for one ecosystem and less useful outside of it.
Some examples include:
- STMicroelectronics: STM32Cube.AI and NanoEdge AI Studio focus on converting trained models into optimized C code and supporting anomaly detection and classification on STM32 devices.
- NXP Semiconductors: eIQ ML Software Development Environment is designed to span a range of NXP processors while integrating with their SDKs and tooling.
- Renesas Electronics: Reality AI Tools helps developers build and deploy models, while the RUHMI Framework optimizes models for Ethos-based accelerator cores.
- Nordic Semiconductor: the Edge AI Add-on for nRF Connect SDK is a runtime library for running optimized ML models on Nordic devices.
While the details differ, the pattern is consistent: silicon vendors are vertically integrating edge AI into their platforms, reducing the gap between model development and deployment on their hardware.
For engineers, this creates a familiar set of tradeoffs. Vendor frameworks can offer excellent performance, strong tooling integration, and early access to hardware features, but they often come with tighter coupling to a specific chip family. In practice, many teams evaluate these tools alongside more open runtimes and platforms, choosing based on performance needs, portability requirements, and long-term product strategy.
Open Inference Runtimes
When we talk about “open” edge AI runtimes for microcontrollers, we usually mean small, embeddable inference engines that are not tied to a single silicon vendor and can be integrated into bare-metal or RTOS-based systems. In 2026, there are only a handful of offerings:
- Google LiteRT for Microcontrollers: Formerly known as “TensorFlow Lite for Microcontrollers” or “TensorFlow Lite Micro,” this remains the most widely used runtime for MCU-class inference. It is designed for bare-metal or RTOS environments and relies on a statically provided memory arena with deterministic, runtime-managed internal allocation to meet tight memory budgets (see the sketch after this list). The runtime is highly portable and can integrate with accelerated operator packages such as CMSIS-NN or proprietary vendor kernels.
- Microsoft ONNX Runtime: ONNX Runtime is a versatile, cross-platform inference engine that excels on edge devices with OS support and hardware acceleration, but it does not offer an official, MCU-class bare-metal runtime like LiteRT for Microcontrollers. While it’s possible to build custom, trimmed ONNX Runtime binaries or use third-party runtimes to run ONNX models in constrained environments, these paths require significant porting effort and are not a drop-in solution for deeply resource-limited microcontrollers.
- Apache microTVM: This was an effort within Apache TVM to extend its compiler and runtime stack to microcontroller-class targets, generating embedded C code for bare-metal and RTOS environments. While it demonstrated the feasibility of compiler-driven ML deployment on MCUs, microTVM no longer seems to be actively supported or present in the current TVM mainline. As a result, it should be viewed as a discontinued or experimental path rather than a production-ready option in 2026.
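To make the memory-arena model mentioned above for LiteRT for Microcontrollers concrete, here is a minimal setup sketch. The model symbol, arena size, and operator list are placeholders, and header paths and constructor signatures have shifted between releases, so treat this as the shape of the API rather than copy-paste code.

```cpp
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// g_model_data is assumed to be a .tflite model compiled into flash as a C
// array. The arena size is a placeholder you tune until AllocateTensors()
// succeeds with some headroom; all tensor and scratch memory comes from this
// buffer, so there is no heap use and the footprint is deterministic.
extern const unsigned char g_model_data[];
constexpr int kTensorArenaSize = 20 * 1024;
alignas(16) static uint8_t tensor_arena[kTensorArenaSize];

int run_once(const float* input_values, int input_len, float* result) {
    const tflite::Model* model = tflite::GetModel(g_model_data);

    // Register only the operators the model actually uses to keep flash small.
    static tflite::MicroMutableOpResolver<3> resolver;
    resolver.AddFullyConnected();
    resolver.AddRelu();
    resolver.AddSoftmax();

    static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                                kTensorArenaSize);
    if (interpreter.AllocateTensors() != kTfLiteOk) return -1;

    // Float input/output assumed here; fully int8-quantized models would use
    // data.int8 together with the tensor's scale and zero point instead.
    TfLiteTensor* input = interpreter.input(0);
    for (int i = 0; i < input_len; ++i) input->data.f[i] = input_values[i];

    if (interpreter.Invoke() != kTfLiteOk) return -1;
    *result = interpreter.output(0)->data.f[0];
    return 0;
}
```

This structure is also what accelerated kernels plug into: building with CMSIS-NN or a proprietary operator library changes the kernels behind the resolver, not the application-facing code.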
Open runtimes and vendor-specific tools solve different problems, and the choice between them is usually driven by priorities rather than ideology. Open runtimes emphasize portability, transparency, and long-term flexibility, making them a good fit when teams need to support multiple MCU families, avoid deep vendor lock-in, or iterate quickly across hardware. Vendor tools, by contrast, trade some of that portability for tighter integration and better utilization of device-specific features, such as DSP instructions or on-chip accelerators, which can be critical when power, latency, or memory margins are tight. In practice, many teams start with open runtimes to validate feasibility and workflows, then selectively adopt vendor tooling when optimization or production constraints demand it.
There are also experimental and research-oriented runtimes worth noting, such as small interpreters or compiler bridges, that explore what’s possible on constrained devices. Examples include community ONNX interpreters like cONNXr, efforts like uTensor, and various minimal code generators derived from compiler projects. These efforts aren’t production-ready or widely supported, but they signal active exploration of inference efficiency at the microcontroller level.
End-to-End Platforms
From roughly 2020 through 2024, a number of companies made a strong push into end-to-end model training and deployment for edge devices. These platforms sit one level above runtimes and vendor toolchains; they’re designed to cover the full workflow (data collection and labeling, training, optimization, and deployment) so teams don’t have to stitch together multiple tools to ship an embedded ML feature. In practice, these platforms often deploy using a portable runtime (or a portable export format), but the real value is the pipeline itself: repeatable datasets, faster iteration, and a smoother path from prototype to something that can live inside an embedded application.
Let’s look at a few notable examples:
- Edge Impulse: A widely used end-to-end embedded ML platform that supports many targets. Edge Impulse was acquired by Qualcomm in 2025.
- Neuton.AI: Focused on fully automated TinyML workflows and very small models. Neuton.AI was acquired by Nordic in 2025.
- Imagimob: An end-to-end Edge AI platform for building and deploying models on constrained devices. Imagimob was acquired by Infineon in 2023.
A clear trend over the past few years is that silicon vendors are buying this workflow layer to make edge AI “part of the platform,” rather than an external dependency. Even if these tools continue to offer portable runtimes and broad export options, the most optimized and best-supported paths will increasingly prioritize the owning company’s silicon.
In my own experience, these end-to-end platforms offer excellent learning environments if it’s your first time working with edge AI on resource-constrained devices. Once you understand the fundamentals, however, it’s common to move away from generic platforms and open runtimes toward specialized vendor pipelines to unlock the performance, power efficiency, and tighter hardware integration that embedded developers typically expect.
Silicon Trends and Accelerators
In 2025, progress in edge AI silicon for microcontrollers continued its slow and steady evolution toward targeted applications. Instead of turning MCUs into general-purpose AI processors, vendors focused on selective acceleration: adding DSP extensions, small neural processing blocks, and tighter memory coupling to speed up common ML operations while preserving low power consumption. These accelerators are typically narrow in scope, supporting a constrained set of operators, data types, and memory layouts. When a model fits those constraints, the gains in latency and energy efficiency can be substantial; when it doesn’t, inference still falls back to optimized CPU or DSP execution. As a result, hybrid designs remain the norm rather than the exception.
That same year, Arm’s Ethos-U NPU began appearing in higher-end microcontrollers and crossover devices from vendors such as STMicroelectronics, NXP Semiconductors, and Renesas Electronics. It’s important to note that these specialized cores are typically paired with vendor-specific software stacks rather than exposed as a generic accelerator.
At the same time, software-level acceleration has continued to quietly improve, extending the usefulness of “plain” Cortex-M devices. Libraries such as CMSIS-NN and vendor-specific kernels increasingly exploit SIMD instructions and cache-aware memory access patterns, allowing many workloads to run efficiently without dedicated AI hardware.
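As a rough picture of what those kernels optimize, the inner loop of most quantized layers boils down to an int8 multiply-accumulate like the plain reference version below. Libraries such as CMSIS-NN replace loops of this shape with versions that use packed SIMD multiply-accumulate instructions and carefully ordered memory accesses; the snippet is illustrative only and is not CMSIS-NN code.

```cpp
#include <cstddef>
#include <cstdint>

// Reference int8 dot product with a 32-bit accumulator: the core operation
// behind quantized fully connected and convolution layers. Optimized kernels
// compute the same result while processing several 8-bit values per
// instruction and keeping operands cache- and register-friendly.
int32_t dot_product_s8(const int8_t* a, const int8_t* b, std::size_t len) {
    int32_t acc = 0;
    for (std::size_t i = 0; i < len; ++i) {
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    }
    return acc;
}
```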
A few notable examples from 2025:
- STMicroelectronics STM32N6: Introduced as a higher-end MCU with an integrated neural accelerator, the STM32N6 targets vision and ML workloads while maintaining a microcontroller-style programming model. It highlights the trend toward small, task-specific NPUs rather than general AI compute.
- NXP Semiconductors i.MX RT crossover MCUs: NXP continued expanding its crossover lineup, combining high-performance Cortex-M cores with enhanced DSP and ML acceleration. These devices blur the line between MCUs and application processors, enabling more complex edge AI pipelines without moving fully into Linux-class systems.
- Renesas Electronics RA and RZ families: Renesas emphasized tighter coupling between MCUs and optional AI acceleration blocks, along with improved software tooling for ML inference. The focus has been on predictable performance and power efficiency rather than raw throughput.
- Nordic Semiconductor nRF54 series: While not built around large NPUs, newer Nordic devices emphasize efficient DSP execution and tight integration with ultra-small models generated through Neuton-based workflows, reinforcing that not all edge AI gains come from dedicated accelerators.
The takeaway is that edge AI on microcontrollers is being refined rather than revolutionized. New accelerators and standardized blocks like Ethos-U make more workloads practical, but they don’t remove the need for careful system design and hardware-aware tradeoffs.
State-of-the-Art Research in 2026
At the research frontier in 2026, much of the work aimed at improving edge AI efficiency focuses on fundamentally reducing data movement, not just speeding up computation. Neuromorphic processors, which use event-driven, spiking neuron models, continue to demonstrate impressive energy efficiency for specific workloads such as sensory processing and pattern recognition. Platforms like Intel’s Loihi 2 and ABR’s TSP1 show that neuromorphic computing is viable in real silicon, but it requires entirely different programming models and tooling. Adoption remains limited to research and niche deployments for now, though it could eventually represent a fundamental shift in AI hardware.
While execute-in-place (XIP) remains a widely used technique in embedded systems, there is relatively little active, standalone research on XIP itself. That’s largely because XIP is now a mature architectural pattern rather than an open research problem: executing code directly from non-volatile memory is well understood, broadly supported by modern MCUs, and already optimized in commercial systems.
Unlike XIP, compute-in-memory (CIM) is an active area of research, with ongoing work exploring SRAM- and RRAM-based designs to reduce data movement during neural network inference. While most approaches remain experimental and highly constrained, CIM is likely to shape future accelerator and memory architectures for edge AI.
Quantum computing research is actively exploring how quantum algorithms could accelerate or fundamentally reshape machine learning (especially for optimization, large-scale linear algebra, and hybrid quantum-classical approaches) but current quantum hardware remains unsuitable for direct edge deployment. In the long term, breakthroughs in quantum machine learning and hybrid models could influence edge AI indirectly (for example through better model optimization or new algorithms), but practical integration with constrained devices is likely many years away.
Organizations like MLCommons continue to play a role in shaping how the wider AI ecosystem measures and evaluates inference performance across hardware classes, including constrained systems. For example, the MLPerf Tiny working group under MLCommons released updated benchmarks in 2025 that evaluate the latency and energy consumption of inference with ultra-small neural networks on low-power systems, with participation from vendors such as Qualcomm, STMicroelectronics, and Syntiant. While much of the broader MLPerf suite focuses on larger models and generative AI workloads, the continued activity around Tiny and edge benchmarking shows that the community still cares about standardized metrics for edge-inference efficiency.
Real Applications
In the past couple of years, edge AI has grown from isolated demos to actual deployed use cases across a range of industries, especially where low latency, privacy, and power efficiency matter most. Predictive maintenance is one of the most mature applications: embedded inference on vibration, temperature, and acoustic sensor data lets industrial systems detect anomalies and forecast equipment failures on device, reducing downtime and cloud dependency.
Wearables and health monitoring have also embraced edge AI. Smart watches and health trackers use on-device models to monitor vital signs and identify anomalous patterns (e.g. arrhythmias or other physiological deviations) without transmitting sensitive raw data to the cloud, improving privacy and battery life. You can also find AI making its way into medical devices, like stethoscopes and hearing aids.
Another standout domain is agriculture and environmental sensing, where tiny models on battery-powered sensors interpret soil moisture, crop health, or pest indicators directly on the node, enabling real-time adjustments without costly connectivity. We’re also seeing corporate giants enter the space with the promise of edge AI devices such as self-driving tractors, and startups are emerging to offer autonomous farm equipment and monitoring systems.
Both smart homes and self-driving cars continue to be highly visible applications of edge AI, but in 2025 they remain domains of incremental progress rather than breakout success. In smart homes, edge AI is quietly improving latency, privacy, and reliability (powering features like on-device motion detection, wake-word recognition, and basic activity inference), but these gains largely enhance existing automation rather than redefining it. The lack of a “killer feature” reflects the reality that home environments are messy, user expectations vary widely, and reliability matters more than novelty. As a result, progress shows up as refinement, not revolution.
A Note About the Edge AI Foundation
At the end of 2024, the TinyML Foundation rebranded to the Edge AI Foundation. This move reflects how the field has expanded. What began as a community centered on running extremely small models on microcontrollers has grown to encompass machine learning across a much wider range of edge devices, from deeply embedded sensors to more capable edge processors. The name change signals that while TinyML remains important, edge AI is now understood as a continuum of hardware, workloads, and tooling that spans microcontrollers, mobile devices, and larger edge hardware. It also brings a broader set of stakeholders (e.g. silicon vendors, platform providers, and researchers) into a shared forum focused on interoperability, best practices, and education.
What to Expect in 2026
Looking ahead to 2026, edge AI on microcontrollers is likely to continue its steady, engineering-driven maturation rather than a sudden breakout moment. Toolchains will improve, documentation will get better, and more silicon will ship with some form of ML acceleration baked in. But the fundamental constraints of limited memory, power budgets, and real-time requirements will still define what’s practical. Most successful deployments will remain highly domain-specific, built around narrowly scoped models that solve one problem well rather than trying to generalize across many tasks.
We should also expect tighter coupling between hardware and software. Silicon vendors will continue to invest in their own ML pipelines, optimized kernels, and reference workflows, often centered around specific accelerators or DSP paths. Open runtimes and end-to-end platforms will still play an important role, especially for education and early prototyping, but production systems will increasingly favor vendor-optimized paths when performance, power, and determinism matter. For engineers, this means fewer “one-size-fits-all” solutions and more emphasis on understanding the hardware beneath the model.
Finally, 2026 is unlikely to be the year that AI “replaces” embedded and domain expertise. In fact, the opposite is true. As edge AI becomes more common, domain knowledge and systems thinking will become even more valuable. Knowing how sensors behave in the real world, how data drifts over time, how models fail, and how to integrate inference into a reliable embedded application will remain hard problems that tooling alone can’t solve. Edge AI will continue to reward engineers who can bridge machine learning with firmware, hardware constraints, and real operating environments.
Conclusion
Edge AI on microcontrollers in 2026 is best understood as a field that’s growing up and evolving rather than demonstrating an explosive breakout. The tools are better, the silicon is more capable, and real products are shipping, but progress continues to be shaped by constraints, tradeoffs, and the realities of embedded systems engineering. Edge AI is raising the bar for what embedded developers need to know, rewarding those who combine machine learning literacy with deep domain expertise and a clear understanding of the hardware. In that sense, the future of edge AI looks less like a sudden revolution and more like a long, steady evolution driven by practical engineering decisions.
