
Autograders are systems that automatically evaluate and score student programming assignments by running their code against predefined tests, providing instant feedback without human intervention. My latest project, ByteGrader, is a modular, lightweight, open source autograding engine that provides powerful automated grading capabilities through a simple API, allowing educational institutions to integrate autograding into any existing platform or custom application rather than forcing them to adopt an entirely new system.
The Problem with Traditional Autograders
If you’ve ever taught a programming course, you know the pain: hundreds of student submissions, each requiring individual compilation, testing, and grading. Traditional autograding systems like CMU’s Autolab have been workhorses in computer science education for years, but they come with significant limitations:
- Monolithic architecture that’s difficult to customize or extend
- Complex setup requiring deep systems administration knowledge
- Limited flexibility in supporting diverse programming environments
- Vendor lock-in with proprietary platforms that control your data and workflow
For educators teaching specialized courses, such as embedded systems, IoT development, and hardware programming, these limitations become even more pronounced. How do you grade an ESP32 firmware assignment? What about edge AI projects that require specific toolchains? Traditional autograders often fall short.
Enter ByteGrader: Modular, Open, Extensible
ByteGrader is a new approach to automated code assessment, built from the ground up with modularity and extensibility in mind. Developed initially for embedded systems and IoT coursework, it’s designed to handle the diverse, complex requirements of modern technical education. It has been my personal project for the last few months as a way to validate student-submitted, hands-on projects. It was built for my recent course, but its modularity allows it to expand easily to other courses.
What Makes ByteGrader Different?
1. True Containerization
Unlike systems that run student code directly on the server, ByteGrader uses Docker containers for every grading operation. Each submission runs in complete isolation with:
- Configurable resource limits (memory, CPU, process count)
- Custom build environments per assignment
- Zero contamination between submissions
- Support for any programming language or toolchain
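The resource limits above map onto standard Docker flags. Here is a minimal sketch of how a sandboxed grading container might be launched; the image name, limit values, and user ID are illustrative placeholders, not ByteGrader defaults:

```python
# Sketch: build the argv for an isolated grading container.
# All limit values and the image name are assumptions for illustration.
def docker_run_args(image, memory="512m", cpus="1.0", pids=64):
    """Return a docker run command with standard isolation flags."""
    return [
        "docker", "run", "--rm",
        "--network", "none",        # cut off network access for student code
        "--memory", memory,          # cap RAM usage
        "--cpus", cpus,              # cap CPU usage
        "--pids-limit", str(pids),   # cap process count (defuses fork bombs)
        "--user", "1000:1000",       # run as a non-root user
        image,
    ]

args = docker_run_args("bytegrader/esp32-idf:latest")
```

Each assignment can swap in a different image, which is how per-assignment build environments stay independent.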
2. Modular Client Architecture
ByteGrader separates the grading engine from the user interface entirely. The server provides a clean REST API that can integrate with:
- Learning Management Systems (LMS) like LearnDash
- Custom web portals
- Command-line tools
- Mobile applications
- Any system that can make HTTP requests
The ByteGrader Client for LearnDash demonstrates this flexibility by seamlessly integrating autograding into WordPress-based courses.
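Because the API is plain HTTP, any client can be sketched in a few lines. The endpoint path, query parameter, and header name below are assumptions for illustration, not the documented ByteGrader API:

```python
# Hypothetical sketch of building a submission request with the standard
# library. Endpoint and header names are assumed, not ByteGrader's actual API.
import urllib.request

def build_submit_request(server_url, api_key, zip_bytes, assignment):
    """Construct (but do not send) a POST request for a zipped submission."""
    return urllib.request.Request(
        url=f"{server_url}/submit?assignment={assignment}",
        data=zip_bytes,
        method="POST",
        headers={
            "X-API-Key": api_key,               # assumed auth header
            "Content-Type": "application/zip",
        },
    )

req = build_submit_request("https://grader.example.com", "secret", b"...", "hw1")
```

Sending the request (e.g., with `urllib.request.urlopen(req)`) would return a job ID that the client later polls for results.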
3. Assignment Registry System
Instead of hard-coding assignment configurations, ByteGrader uses a YAML-based registry that allows for easy grader configuration. You can see an example of such a registry here.
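As a rough illustration, a registry entry might look something like the following; the field names and values here are hypothetical, not the actual ByteGrader schema:

```yaml
# Hypothetical registry entry; field names are illustrative only
assignments:
  esp32-blink:
    image: bytegrader/esp32-idf:latest
    grader_script: graders/esp32_blink.py
    timeout_seconds: 300
    memory_limit: "512m"
```

The point is that adding a new assignment means adding a registry entry and a grader script, not modifying the engine itself.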
4. Security by Design
As with all autograders, running arbitrary code submitted by outsiders is a massive security risk. To mitigate potential risks, ByteGrader employs the following security strategies:
- IP whitelisting for network security
- API key authentication
- Rate limiting to prevent abuse
- Zip bomb protection and path traversal prevention
- Non-root container execution
- Time limits on both the Python grading subroutines (which run the submitted code) and the overall execution container
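The subroutine-level time limit is the kind of thing Python's standard library handles directly. A minimal sketch, assuming a per-step timeout (the value and command are placeholders, not ByteGrader defaults):

```python
# Sketch: enforce a per-step time limit on a grading subprocess.
# The 30-second default is an assumption for illustration.
import subprocess
import sys

def run_step(cmd, timeout_s=30):
    """Run one grading step; return (completed, output)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
        return True, proc.stdout
    except subprocess.TimeoutExpired:
        return False, "Step exceeded time limit"

ok, out = run_step([sys.executable, "-c", "print('hello')"], timeout_s=10)
```

The container-level limit is the backstop: even if a grading script forgets its own timeout, the container itself is killed after a fixed wall-clock budget.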
How ByteGrader Compares
The following table shows how ByteGrader compares to a few existing, popular solutions. The closest competitor is CMU’s Autolab. It is an open source, well-maintained autograder. However, it is a monolithic system that comes with its own frontend and user login systems. ByteGrader, in contrast, is designed to be a modular, API-first system where developers can create their own frontend systems (HTML/JS, WordPress plugin, Moodle plugin, etc.).
| Feature | ByteGrader | Autolab | Gradescope | CodeGrade |
| --- | --- | --- | --- | --- |
| Architecture | Microservices, API-first | Monolithic Rails app | Proprietary SaaS | Proprietary SaaS |
| Hosting Model | Self-hosted | Self-hosted | Cloud-only | Cloud-only |
| Container Support | Native Docker isolation | Limited sandboxing | Container-based | Limited containers |
| Data Control | Full ownership | Full ownership | Vendor-controlled | Vendor-controlled |
| License | BSD 3-Clause | Apache 2.0 | Proprietary | Proprietary |
| Setup Complexity | Single container deploy | Complex Ruby/Rails setup | No setup (SaaS) | No setup (SaaS) |
| Client Flexibility | Any HTTP client | Tightly coupled web UI | Web UI only | Web UI only |
| Customization | Unlimited modification | Requires core changes | Platform features only | Platform features only |
| Cost Model | Free (hosting only) | Free (hosting only) | Per-student pricing | Per-student pricing |
| Programming Languages | Any (via containers) | Instructor-configured | Supported subset | Supported subset |
| Integration | REST API | Limited API | Platform-specific | Platform-specific |
| Extensibility | Pluggable graders | Core modifications needed | Not extensible | Not extensible |
| Vendor Lock-in | None | None | High | High |
| Security Model | IP whitelist + API keys | Ruby/Rails security | Platform-managed | Platform-managed |
The Power of Open Source Modularity
When I started building ByteGrader, I had to make a fundamental choice: build on top of existing systems or start fresh with open source modularity.
I got tired of adapting my teaching to fit the limitations of whatever autograder my platform was using (e.g. Coursera). Sometimes you need to grade assembly code, sometimes embedded firmware, sometimes a machine learning model. Instead of being constrained by an institution’s autograder (which often targets limited high-level languages like C++, Python, JavaScript, etc.), ByteGrader lets you easily create a custom grader for nearly any language.
I also wanted to build something that other developers would actually want to work on, which meant clean APIs, modern technologies, and the ability to target embedded systems. My goal is that other educators and institutions can use ByteGrader as is, contribute features to the core backend, or build their own frontend systems.
Custom Graders
I abstracted away as much of the difficult backend work as possible so that individual graders could be created with minimal effort. The system assumes you are using Python (due to its popularity) to create grading scripts. ByteGrader automatically calls main() and performs the following steps:
- File extraction and security validation
- Container lifecycle management
- Result collection and formatting
- Error handling and logging
When writing a grader, you are expected to implement the grade_submission() function and return a custom GradingResult object. Everything between function start and returning results is up to you. That includes compiling, running unit tests, running an emulator (e.g., QEMU), and checking inputs/outputs. Here is a simple grader example (where you would implement run_tests()). The idea is that the educator can focus on writing grading scripts instead of infrastructure logic.
```python
from result import GradingResult

def grade_submission(extracted_path: str) -> GradingResult:
    # Your custom grading logic here
    score, feedback = run_tests(extracted_path)
    return GradingResult(
        score=score,
        max_score=100.0,
        feedback=feedback,
        error=""
    )
```
Other than Python, the educator (or course developer) would need to know enough Docker and YAML to configure the image to run their script.
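To make the example above concrete, here is one toy way run_tests() might be implemented. The rubric (checking for required files) is purely illustrative; a real grader would compile and execute the submission:

```python
# Toy run_tests() to pair with grade_submission(): award points per
# required file present. The file list is an assumption for illustration.
import os

def run_tests(extracted_path):
    """Score a submission by how many required files it contains."""
    required = ["main.c", "Makefile"]
    found = [f for f in required
             if os.path.exists(os.path.join(extracted_path, f))]
    score = 100.0 * len(found) / len(required)
    feedback = f"Found {len(found)}/{len(required)} required files"
    return score, feedback
```

A real grader would typically replace the file check with a compile step and a set of input/output tests, but the (score, feedback) contract stays the same.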
Real-World Use Cases
ByteGrader excels at grading assignments that traditional systems struggle with. Due to the containerization of the individual graders, you can have ByteGrader run a wide range of system-level applications (not just single programs!).
Embedded Systems
- STM32 firmware projects: Custom containers with ARM toolchains
- ESP32 IoT applications: Graders that can validate MQTT communication
- Arduino projects: Automated testing of hardware abstraction code
Edge AI and Machine Learning
- TensorFlow Lite models: Validate model accuracy and performance
- Computer vision pipelines: Test image processing algorithms
- Inference optimization: Measure latency and resource usage
Systems Programming
- C/C++ projects: Memory leak detection, performance testing
- Assembly language: Instruction counting, register usage validation
- Operating systems: Kernel module testing, syscall validation
Contributing to the Ecosystem
My hope is that ByteGrader is not just a standalone project for my recent IoT Firmware Development with ESP32 and ESP-IDF course. I would love to see other educators use and extend it in a variety of ways:
- Share graders: Contribute assignment templates for your domain
- Improve documentation: Help others learn and adopt the system
- Report issues: Make the system more robust for everyone
- Add features: Extend functionality through pull requests
- Join discussions: Share ideas and best practices
Limitations
Because ByteGrader is so new (and mostly a side-project to support my courses), it has a few limitations. My hope is to tackle these in the future as I continue to expand my course offerings.
No user sessions: assignments collected by the main API application are stored in a temporary storage area (Docker volume) and tagged with a unique job ID. A “cleanup” process looks for old (24-48 hours) submitted jobs and removes them. This puts the onus on the frontend clients to remember which job ID is associated with which student, and gives them a limited window to retrieve assignment feedback from the server.
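The cleanup process described above amounts to an age-based sweep over the job storage. A minimal sketch, assuming a flat directory of job files and a 24-hour cutoff (both assumptions, not ByteGrader's actual layout):

```python
# Sketch: age-based cleanup over a flat jobs directory. The directory
# layout and 24-hour default are assumptions based on the description above.
import os
import time

def cleanup_jobs(jobs_dir, max_age_hours=24):
    """Delete job files older than max_age_hours; return removed names."""
    cutoff = time.time() - max_age_hours * 3600
    removed = []
    for name in os.listdir(jobs_dir):
        path = os.path.join(jobs_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```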
Basic logging: Docker logs are stored to a file, and there are simple /health and /queue endpoints to get some server stats. But that’s it. I would love to add more robust server status, logging, and auditing features in the future.
Limited concurrency: ByteGrader was built for a single, low-volume course, so it relies on a single-server architecture. I plan to expand this to support multiple, concurrent servers with load balancing in the future.
No auto-scaling: as with the previous limitation, ByteGrader cannot automatically scale to support high-demand periods. This feature is on my wishlist for the future.
LMS integration: ByteGrader is a simple backend, designed to be as lightweight as possible (while still running everything in containers). That means the burden of supporting a particular front-end client or learning management system (LMS) is up to the educator or community.
Limited grading: ByteGrader is designed to be an autograder for code assignments first and foremost. There are no plans for manual grading or peer reviewing at the moment.
Conclusion
Traditional autograders serve a wide variety of computer science courses throughout the world, in both academia and on professional development platforms. However, they struggle to meet the demands of embedded systems work and the diverse, multi-language toolchains required in many modern courses. ByteGrader was designed to meet these needs in a modular fashion, allowing educators to extend and adapt the system as required.
If you’re curious about using ByteGrader for your own courses or simply want to tinker with it, check out the getting started guide in the README here: https://github.com/ShawnHymel/bytegrader. ByteGrader is a personal project I developed for my own online courses. If you run into any problems with it or have suggestions, please file an issue on the GitHub repository!