
Autograders are systems that automatically evaluate and score student programming assignments by running their code against predefined tests, providing instant feedback without human intervention. My latest project, ByteGrader, is a modular, lightweight, open source autograding engine that provides powerful automated grading capabilities through a simple API, allowing educational institutions to integrate autograding into any existing platform or custom application rather than forcing them to adopt an entirely new system.
The Problem with Traditional Autograders
If you’ve ever taught a programming course, you know the pain: hundreds of student submissions, each requiring individual compilation, testing, and grading. Traditional autograding systems like CMU’s Autolab have been workhorses in computer science education for years, but they come with significant limitations:
- Monolithic architecture that’s difficult to customize or extend
- Complex setup requiring deep systems administration knowledge
- Limited flexibility in supporting diverse programming environments
- Vendor lock-in with proprietary platforms that control your data and workflow
For educators teaching specialized courses, such as embedded systems, IoT development, and hardware programming, these limitations become even more pronounced. How do you grade an ESP32 firmware assignment? What about edge AI projects that require specific toolchains? Traditional autograders often fall short.
Enter ByteGrader: Modular, Open, Extensible
ByteGrader is a new approach to automated code assessment, built from the ground up with modularity and extensibility in mind. Developed initially for embedded systems and IoT coursework, it’s designed to handle the diverse, complex requirements of modern technical education. It has been my personal project for the last few months as a way to validate student-submitted, hands-on projects. It was built for my recent course, but its modularity allows it to expand easily to other courses.
What Makes ByteGrader Different?
1. True Containerization
Unlike systems that run student code directly on the server, ByteGrader uses Docker containers for every grading operation. Each submission runs in complete isolation with:
- Configurable resource limits (memory, CPU, process count)
- Custom build environments per assignment
- Zero contamination between submissions
- Support for any programming language or toolchain
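The resource limits above map onto standard Docker flags. Here is a minimal sketch of how a sandboxed grading container might be launched; the image name, limit values, and user ID are illustrative placeholders, not ByteGrader defaults:

```python
# Sketch: build the argv for an isolated grading container.
# All limit values and the image name are assumptions for illustration.
def docker_run_args(image, memory="512m", cpus="1.0", pids=64):
    """Return a docker run command with standard isolation flags."""
    return [
        "docker", "run", "--rm",
        "--network", "none",        # cut off network access for student code
        "--memory", memory,          # cap RAM usage
        "--cpus", cpus,              # cap CPU usage
        "--pids-limit", str(pids),   # cap process count (defuses fork bombs)
        "--user", "1000:1000",       # run as a non-root user
        image,
    ]

args = docker_run_args("bytegrader/esp32-idf:latest")
```

Each assignment can swap in a different image, which is how per-assignment build environments stay independent.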
2. Modular Client Architecture
ByteGrader separates the grading engine from the user interface entirely. The server provides a clean REST API that can integrate with:
- Learning Management Systems (LMS) like LearnDash
- Custom web portals
- Command-line tools
- Mobile applications
- Any system that can make HTTP requests
The ByteGrader Client for LearnDash demonstrates this flexibility by seamlessly integrating autograding into WordPress-based courses.
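Because the API is plain HTTP, any client can be sketched in a few lines. The endpoint path, query parameter, and header name below are assumptions for illustration, not the documented ByteGrader API:

```python
# Hypothetical sketch of building a submission request with the standard
# library. Endpoint and header names are assumed, not ByteGrader's actual API.
import urllib.request

def build_submit_request(server_url, api_key, zip_bytes, assignment):
    """Construct (but do not send) a POST request for a zipped submission."""
    return urllib.request.Request(
        url=f"{server_url}/submit?assignment={assignment}",
        data=zip_bytes,
        method="POST",
        headers={
            "X-API-Key": api_key,               # assumed auth header
            "Content-Type": "application/zip",
        },
    )

req = build_submit_request("https://grader.example.com", "secret", b"...", "hw1")
```

Sending the request (e.g., with `urllib.request.urlopen(req)`) would return a job ID that the client later polls for results.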
3. Assignment Registry System
Instead of hard-coding assignment configurations, ByteGrader uses a YAML-based registry that allows for easy grader configuration. You can see an example of such a registry here.
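As a rough illustration, a registry entry might look something like the following; the field names and values here are hypothetical, not the actual ByteGrader schema:

```yaml
# Hypothetical registry entry; field names are illustrative only
assignments:
  esp32-blink:
    image: bytegrader/esp32-idf:latest
    grader_script: graders/esp32_blink.py
    timeout_seconds: 300
    memory_limit: "512m"
```

The point is that adding a new assignment means adding a registry entry and a grader script, not modifying the engine itself.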
4. Security by Design
As with all autograders, running arbitrary code submitted by outsiders is a massive security risk. To mitigate potential risks, ByteGrader employs the following security strategies:
- IP whitelisting for network security
- API key authentication
- Rate limiting to prevent abuse
- Zip bomb protection and path traversal prevention
- Non-root container execution
- Time limits on both the Python grading subroutines (which run the submitted code) and the overall execution container
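The subroutine-level time limit is the kind of thing Python's standard library handles directly. A minimal sketch, assuming a per-step timeout (the value and command are placeholders, not ByteGrader defaults):

```python
# Sketch: enforce a per-step time limit on a grading subprocess.
# The 30-second default is an assumption for illustration.
import subprocess
import sys

def run_step(cmd, timeout_s=30):
    """Run one grading step; return (completed, output)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
        return True, proc.stdout
    except subprocess.TimeoutExpired:
        return False, "Step exceeded time limit"

ok, out = run_step([sys.executable, "-c", "print('hello')"], timeout_s=10)
```

The container-level limit is the backstop: even if a grading script forgets its own timeout, the container itself is killed after a fixed wall-clock budget.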
How ByteGrader Compares
The following table shows how ByteGrader compares to a few existing, popular solutions. The closest competitor is CMU’s Autolab. It is an open source, well-maintained autograder. However, it is a monolithic system that comes with its own frontend and user login systems. ByteGrader, in contrast, is designed to be a modular, API-first system where developers can create their own frontend systems (HTML/JS, WordPress plugin, Moodle plugin, etc.).
| Feature | ByteGrader | Autolab | Gradescope | CodeGrade |
| --- | --- | --- | --- | --- |
| Architecture | Microservices, API-first | Monolithic Rails app | Proprietary SaaS | Proprietary SaaS |
| Hosting Model | Self-hosted | Self-hosted | Cloud-only | Cloud-only |
| Container Support | Native Docker isolation | Limited sandboxing | Container-based | Limited containers |
| Data Control | Full ownership | Full ownership | Vendor-controlled | Vendor-controlled |
| License | BSD 3-Clause | Apache 2.0 | Proprietary | Proprietary |
| Setup Complexity | Single container deploy | Complex Ruby/Rails setup | No setup (SaaS) | No setup (SaaS) |
| Client Flexibility | Any HTTP client | Tightly coupled web UI | Web UI only | Web UI only |
| Customization | Unlimited modification | Requires core changes | Platform features only | Platform features only |
| Cost Model | Free (hosting only) | Free (hosting only) | Per-student pricing | Per-student pricing |
| Programming Languages | Any (via containers) | Instructor-configured | Supported subset | Supported subset |
| Integration | REST API | Limited API | Platform-specific | Platform-specific |
| Extensibility | Pluggable graders | Core modifications needed | Not extensible | Not extensible |
| Vendor Lock-in | None | None | High | High |
| Security Model | IP whitelist + API keys | Ruby/Rails security | Platform-managed | Platform-managed |
The Power of Open Source Modularity
When I started building ByteGrader, I had to make a fundamental choice: build on top of existing systems or start fresh with open source modularity.
I got tired of adapting my teaching to fit the limitations of whatever autograder my platform was using (e.g. Coursera). Sometimes you need to grade assembly code, sometimes embedded firmware, sometimes a machine learning model. Instead of being constrained by an institution’s autograder (which often targets limited high-level languages like C++, Python, JavaScript, etc.), ByteGrader lets you easily create a custom grader for nearly any language.
I also wanted to build something that other developers would actually want to work on, which meant clean APIs, modern technologies, and the ability to target embedded systems. My goal is that other educators and institutions can use ByteGrader as is, contribute features to the core backend, or build their own frontend systems.
Custom Graders
I abstracted away as much of the difficult backend work as possible so that individual graders could be created with minimal effort. The system assumes you are using Python (due to its popularity) to create grading scripts. ByteGrader automatically calls main() and performs the following steps:
- File extraction and security validation
- Container lifecycle management
- Result collection and formatting
- Error handling and logging
When writing a grader, you are expected to implement the grade_submission() function and return a custom GradingResult object. Everything between function start and returning results is up to you. That includes compiling, running unit tests, running an emulator (e.g., QEMU), and checking inputs/outputs. Here is a simple grader example (where you would implement run_tests()). The idea is that the educator can focus on writing grading scripts instead of infrastructure logic.
```python
from result import GradingResult

def grade_submission(extracted_path: str) -> GradingResult:
    # Your custom grading logic here
    score, feedback = run_tests(extracted_path)
    return GradingResult(
        score=score,
        max_score=100.0,
        feedback=feedback,
        error=""
    )
```
Other than Python, the educator (or course developer) would need to know enough Docker and YAML to configure the image to run their script.
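To make the example above concrete, here is one toy way run_tests() might be implemented. The rubric (checking for required files) is purely illustrative; a real grader would compile and execute the submission:

```python
# Toy run_tests() to pair with grade_submission(): award points per
# required file present. The file list is an assumption for illustration.
import os

def run_tests(extracted_path):
    """Score a submission by how many required files it contains."""
    required = ["main.c", "Makefile"]
    found = [f for f in required
             if os.path.exists(os.path.join(extracted_path, f))]
    score = 100.0 * len(found) / len(required)
    feedback = f"Found {len(found)}/{len(required)} required files"
    return score, feedback
```

A real grader would typically replace the file check with a compile step and a set of input/output tests, but the (score, feedback) contract stays the same.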
Real-World Use Cases
ByteGrader excels at grading assignments that traditional systems struggle with. Due to the containerization of the individual graders, you can have ByteGrader run a wide range of system-level applications (not just single programs!).
Embedded Systems
- STM32 firmware projects: Custom containers with ARM toolchains
- ESP32 IoT applications: Graders that can validate MQTT communication
- Arduino projects: Automated testing of hardware abstraction code
Edge AI and Machine Learning
- TensorFlow Lite models: Validate model accuracy and performance
- Computer vision pipelines: Test image processing algorithms
- Inference optimization: Measure latency and resource usage
Systems Programming
- C/C++ projects: Memory leak detection, performance testing
- Assembly language: Instruction counting, register usage validation
- Operating systems: Kernel module testing, syscall validation
Contributing to the Ecosystem
My hope is that ByteGrader is not just a standalone project for my recent IoT Firmware Development with ESP32 and ESP-IDF course. I would love to see other educators use and extend it in a variety of ways:
- Share graders: Contribute assignment templates for your domain
- Improve documentation: Help others learn and adopt the system
- Report issues: Make the system more robust for everyone
- Add features: Extend functionality through pull requests
- Join discussions: Share ideas and best practices
Limitations
Because ByteGrader is so new (and mostly a side-project to support my courses), it has a few limitations. My hope is to tackle these in the future as I continue to expand my course offerings.
No user sessions: assignments collected by the main API application are stored in a temporary storage area (Docker volume) and tagged with a unique job ID. A “cleanup” process looks for old (24-48 hours) submitted jobs and removes them. This puts the onus on the frontend clients to remember which job ID is associated with which student, and gives them a limited window to retrieve assignment feedback from the server.
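The cleanup process described above amounts to an age-based sweep over the job storage. A minimal sketch, assuming a flat directory of job files and a 24-hour cutoff (both assumptions, not ByteGrader's actual layout):

```python
# Sketch: age-based cleanup over a flat jobs directory. The directory
# layout and 24-hour default are assumptions based on the description above.
import os
import time

def cleanup_jobs(jobs_dir, max_age_hours=24):
    """Delete job files older than max_age_hours; return removed names."""
    cutoff = time.time() - max_age_hours * 3600
    removed = []
    for name in os.listdir(jobs_dir):
        path = os.path.join(jobs_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```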
Basic logging: Docker logs are stored to a file, and there are simple /health and /queue endpoints to get some server stats. But that’s it. I would love to add more robust server status, logging, and auditing features in the future.
Limited concurrency: ByteGrader was built for a single, low-volume course, so it relies on a single-server architecture. I plan to expand this to support multiple, concurrent servers with load balancing in the future.
No auto-scaling: as with the previous limitation, ByteGrader cannot automatically scale to support high-demand periods. This feature is on my wishlist for the future.
LMS integration: ByteGrader is a simple backend, designed to be as lightweight as possible (while still running everything in containers). That means the burden of supporting a particular front-end client or learning management system (LMS) is up to the educator or community.
Limited grading: ByteGrader is designed to be an autograder for code assignments first and foremost. There are no plans for manual grading or peer reviewing at the moment.
Conclusion
Traditional autograders serve a wide variety of computer science courses throughout the world, in both academia and on professional development platforms. However, they struggle to meet the demands of embedded systems work and the diverse, multi-language toolchains required in many modern courses. ByteGrader was designed to meet these needs in a modular fashion, allowing educators to extend and adapt the system as required.
If you’re curious about using ByteGrader for your own courses or simply want to tinker with it, check out the getting started guide in the README here: https://github.com/ShawnHymel/bytegrader. ByteGrader is a personal project I developed for my own online courses. If you run into any problems with it or have suggestions, please file an issue on the GitHub repository!