ByteGrader: An Open Source, Modular Autograder for Embedded Systems (and Beyond)

Autograders are systems that automatically evaluate and score student programming assignments by running submitted code against predefined tests, providing instant feedback without human intervention. My latest project, ByteGrader, is a modular, lightweight, open source autograding engine that exposes powerful automated grading capabilities through a simple API, letting educational institutions integrate autograding into any existing platform or custom application rather than forcing them to adopt an entirely new system.

The Problem with Traditional Autograders

If you’ve ever taught a programming course, you know the pain: hundreds of student submissions, each requiring individual compilation, testing, and grading. Traditional autograding systems like CMU’s Autolab have been workhorses in computer science education for years, but they come with significant limitations:

  • Monolithic architecture that’s difficult to customize or extend
  • Complex setup requiring deep systems administration knowledge
  • Limited flexibility in supporting diverse programming environments
  • Vendor lock-in with proprietary platforms that control your data and workflow

For educators teaching specialized courses in embedded systems, IoT development, and hardware programming, these limitations become even more pronounced. How do you grade an ESP32 firmware assignment? What about edge AI projects that require specific toolchains? Traditional autograders often fall short.

Enter ByteGrader: Modular, Open, Extensible

ByteGrader is a new approach to automated code assessment, built from the ground up with modularity and extensibility in mind. Developed initially for embedded systems and IoT coursework, it’s designed to handle the diverse, complex requirements of modern technical education. It has been my personal project for the last few months as a way to validate student-submitted, hands-on projects. It was built for my recent course, but its modularity makes it easy to expand to other courses.

What Makes ByteGrader Different?

1. True Containerization Unlike systems that run student code directly on the server, ByteGrader uses Docker containers for every grading operation. Each submission runs in complete isolation with:

  • Configurable resource limits (memory, CPU, process count)
  • Custom build environments per assignment
  • Zero contamination between submissions
  • Support for any programming language or toolchain
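As a rough illustration of how such an isolated run could be launched, the sketch below assembles a `docker run` argument list with resource limits applied. The image name, mount path, and flag values are illustrative assumptions, not ByteGrader's actual invocation:

```python
# Sketch of launching an isolated grading container with resource limits.
# Image name, mount paths, and limit values are illustrative assumptions.

def build_docker_args(image, submission_dir, memory="512m", cpus="1.0", pids=64):
    """Assemble a `docker run` argument list with isolation flags applied."""
    return [
        "docker", "run", "--rm",
        "--network", "none",          # no network access for student code
        "--memory", memory,           # cap RAM usage
        "--cpus", cpus,               # cap CPU usage
        "--pids-limit", str(pids),    # cap process count (fork-bomb guard)
        "--user", "1000:1000",        # non-root execution
        "-v", f"{submission_dir}:/submission:ro",
        image,
    ]

args = build_docker_args("bytegrader/assignment01", "/tmp/job-abc123")
print(" ".join(args))
```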

2. Modular Client Architecture ByteGrader separates the grading engine from the user interface entirely. The server provides a clean REST API that can integrate with:

  • Learning Management Systems (LMS) like LearnDash
  • Custom web portals
  • Command-line tools
  • Mobile applications
  • Any system that can make HTTP requests

The ByteGrader Client for LearnDash demonstrates this flexibility by seamlessly integrating autograding into WordPress-based courses.
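Because the server only speaks HTTP, any client can be written in a few lines. The sketch below builds a multipart upload request using only the standard library; the `/submit` endpoint path and `X-API-Key` header name are assumptions for illustration, not ByteGrader's documented API:

```python
import uuid

# Sketch of a minimal client for a ByteGrader-style REST API.
# The /submit endpoint and X-API-Key header are illustrative assumptions.

def build_submission_request(base_url, api_key, zip_bytes, filename="submission.zip"):
    """Build the URL, headers, and multipart body for a submission upload."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/zip\r\n\r\n"
    ).encode() + zip_bytes + f"\r\n--{boundary}--\r\n".encode()
    headers = {
        "X-API-Key": api_key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return f"{base_url}/submit", headers, body

url, headers, body = build_submission_request(
    "https://grader.example.com", "secret-key", b"PK\x03\x04fake-zip")
# To send: urllib.request.urlopen(urllib.request.Request(url, data=body,
#                                                        headers=headers))
```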

3. Assignment Registry System Instead of hard-coding assignment configurations, ByteGrader uses a YAML-based registry that allows for easy grader configuration. You can see an example of such a registry here.
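To give a flavor of the approach, a registry entry for a single assignment might look roughly like the following. Every field name here is an illustrative guess, not ByteGrader's actual schema:

```yaml
# Hypothetical registry entry -- field names are illustrative guesses,
# not ByteGrader's real schema
assignments:
  esp32-blink:
    image: bytegrader/esp32-idf:latest
    grader_script: graders/esp32_blink.py
    timeout_seconds: 120
    memory_limit: 512m
    max_upload_size_mb: 5
```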

4. Security by Design

As with all autograders, running arbitrary code submitted by outsiders is a massive security risk. ByteGrader mitigates these risks with the following strategies:

  • IP whitelisting for network security
  • API key authentication
  • Rate limiting to prevent abuse
  • Zip bomb protection and path traversal prevention
  • Non-root container execution
  • Time limits on both the Python subprocess that runs the submitted code and the entire execution container
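To make the zip bomb and path traversal items above concrete, the sketch below shows the kind of checks involved when extracting an untrusted archive. The thresholds and function name are illustrative assumptions, not ByteGrader's actual implementation:

```python
import zipfile
from pathlib import Path

# Sketch of zip bomb and path traversal checks for untrusted archives.
# Thresholds and function names are illustrative assumptions.

MAX_TOTAL_UNCOMPRESSED = 100 * 1024 * 1024   # 100 MB cap on extracted size
MAX_RATIO = 100                              # max compression ratio per entry

def validate_zip(zip_path, dest):
    """Raise ValueError if the archive looks malicious; otherwise return."""
    dest = Path(dest).resolve()
    total = 0
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Path traversal: resolved target must stay inside dest
            target = (dest / info.filename).resolve()
            if not str(target).startswith(str(dest)):
                raise ValueError(f"path traversal attempt: {info.filename}")
            # Zip bomb: reject absurd expansion ratios and total sizes
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                raise ValueError(f"suspicious compression ratio: {info.filename}")
            total += info.file_size
            if total > MAX_TOTAL_UNCOMPRESSED:
                raise ValueError("archive expands beyond size limit")
```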

How ByteGrader Compares

The following table shows how ByteGrader compares to a few existing, popular solutions. The closest competitor is CMU’s Autolab. It is an open source, well-maintained autograder. However, it is a monolithic system that comes with its own frontend and user login systems. ByteGrader, in contrast, is designed to be a modular, API-first system where developers can create their own frontend systems (HTML/JS, WordPress plugin, Moodle plugin, etc.).

| Feature | ByteGrader | Autolab | Gradescope | CodeGrade |
| --- | --- | --- | --- | --- |
| Architecture | Microservices, API-first | Monolithic Rails app | Proprietary SaaS | Proprietary SaaS |
| Hosting Model | Self-hosted | Self-hosted | Cloud-only | Cloud-only |
| Container Support | Native Docker isolation | Limited sandboxing | Container-based | Limited containers |
| Data Control | Full ownership | Full ownership | Vendor-controlled | Vendor-controlled |
| License | BSD 3-Clause | Apache 2.0 | Proprietary | Proprietary |
| Setup Complexity | Single container deploy | Complex Ruby/Rails setup | No setup (SaaS) | No setup (SaaS) |
| Client Flexibility | Any HTTP client | Tightly coupled web UI | Web UI only | Web UI only |
| Customization | Unlimited modification | Requires core changes | Platform features only | Platform features only |
| Cost Model | Free (hosting only) | Free (hosting only) | Per-student pricing | Per-student pricing |
| Programming Languages | Any (via containers) | Instructor-configured | Supported subset | Supported subset |
| Integration | REST API | Limited API | Platform-specific | Platform-specific |
| Extensibility | Pluggable graders | Core modifications needed | Not extensible | Not extensible |
| Vendor Lock-in | None | None | High | High |
| Security Model | IP whitelist + API keys | Ruby/Rails security | Platform-managed | Platform-managed |

The Power of Open Source Modularity

When I started building ByteGrader, I had to make a fundamental choice: build on top of existing systems or start fresh with open source modularity.

I got tired of adapting my teaching to fit the limitations of whatever autograder my platform was using (e.g. Coursera). Sometimes you need to grade assembly code, sometimes embedded firmware, sometimes a machine learning model. Instead of being constrained by an institution’s autograder (which often targets limited high-level languages like C++, Python, JavaScript, etc.), ByteGrader lets you easily create a custom grader for nearly any language.

I also wanted to build something that other developers would actually want to work on, which meant clean APIs, modern technologies, and the ability to target embedded systems. My goal is that other educators and institutions can use ByteGrader as is, contribute features to the core backend, or build their own frontend systems.

Custom Graders

I abstracted away as much of the difficult backend work as possible so that individual graders can be created with minimal effort. The system assumes you are using Python (chosen for its popularity) to create grading scripts. ByteGrader automatically calls main() and handles the following steps for you:

  • File extraction and security validation
  • Container lifecycle management
  • Result collection and formatting
  • Error handling and logging

When writing a grader, you are expected to implement the grade_submission() function and return a custom GradingResult object. Everything between the function start and returning results is up to you. That includes compiling, running unit tests, running an emulator (e.g. QEMU), and checking inputs/outputs. Here is a simple grader example (where you would implement run_tests()). The idea is that the educator can focus on writing grading scripts instead of infrastructure logic.

from result import GradingResult

def grade_submission(extracted_path: str) -> GradingResult:

    # Your custom grading logic here
    score, feedback = run_tests(extracted_path)

    return GradingResult(
        score=score,
        max_score=100.0,
        feedback=feedback,
        error=""
    )
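For reference, the GradingResult object used above could be as simple as a small dataclass. This sketch infers the fields from the example; the class that actually ships with ByteGrader may differ:

```python
from dataclasses import dataclass, asdict

# Minimal sketch of a GradingResult container, with fields inferred from
# the grader example -- the actual ByteGrader class may differ.

@dataclass
class GradingResult:
    score: float
    max_score: float
    feedback: str
    error: str = ""

    def to_dict(self):
        """Serialize for a JSON grading response."""
        return asdict(self)

result = GradingResult(score=87.5, max_score=100.0, feedback="3/4 tests passed")
print(result.to_dict())
```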

Other than Python, the educator (or course developer) would need to know enough Docker and YAML to configure the image to run their script.
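As a sketch of that Docker side, a per-assignment image might start from something like the Dockerfile below. The base image, packages, and file names are assumptions for illustration, not ByteGrader's official setup:

```dockerfile
# Hypothetical per-assignment grader image -- base image and packages
# are illustrative, not ByteGrader's official setup
FROM python:3.12-slim

# Toolchain needed by this assignment's build step
RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc make \
    && rm -rf /var/lib/apt/lists/*

# Run the grader as a non-root user
RUN useradd --create-home grader
USER grader

WORKDIR /home/grader
COPY grader.py result.py ./
CMD ["python", "grader.py"]
```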

Real-World Use Cases

ByteGrader excels at grading assignments that traditional systems struggle with. Due to the containerization of the individual graders, you can have ByteGrader run a wide range of system-level applications (not just single programs!).

Embedded Systems

  • STM32 firmware projects: Custom containers with ARM toolchains
  • ESP32 IoT applications: Graders that can validate MQTT communication
  • Arduino projects: Automated testing of hardware abstraction code

Edge AI and Machine Learning

  • TensorFlow Lite models: Validate model accuracy and performance
  • Computer vision pipelines: Test image processing algorithms
  • Inference optimization: Measure latency and resource usage

Systems Programming

  • C/C++ projects: Memory leak detection, performance testing
  • Assembly language: Instruction counting, register usage validation
  • Operating systems: Kernel module testing, syscall validation

Contributing to the Ecosystem

My hope is that ByteGrader is not just a standalone project for my recent IoT Firmware Development with ESP32 and ESP-IDF course. I would love to see other educators use and extend it in a variety of ways:

  • Share graders: Contribute assignment templates for your domain
  • Improve documentation: Help others learn and adopt the system
  • Report issues: Make the system more robust for everyone
  • Add features: Extend functionality through pull requests
  • Join discussions: Share ideas and best practices

Limitations

Because ByteGrader is so new (and mostly a side-project to support my courses), it has a few limitations. My hope is to tackle these in the future as I continue to expand my course offerings.

No user sessions: assignments collected by the main API application are stored in a temporary storage area (a Docker volume) and tagged with a unique job ID. A “cleanup” process looks for old (24-48 hours) submitted jobs and removes them. This puts the onus on frontend clients to remember which job ID is associated with which student, and they have a limited window in which to retrieve assignment feedback from the server.
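The cleanup pass described above can be sketched as a simple age-based sweep over per-job directories. The directory layout and the 24-hour threshold are illustrative assumptions:

```python
import shutil
import time
from pathlib import Path

# Sketch of an age-based cleanup sweep over per-job submission directories.
# The directory layout and 24-hour threshold are illustrative assumptions.

MAX_AGE_SECONDS = 24 * 60 * 60  # jobs older than this are removed

def cleanup_old_jobs(storage_root, now=None):
    """Delete job directories older than MAX_AGE_SECONDS; return removed IDs."""
    now = time.time() if now is None else now
    removed = []
    for job_dir in Path(storage_root).iterdir():
        if job_dir.is_dir() and now - job_dir.stat().st_mtime > MAX_AGE_SECONDS:
            shutil.rmtree(job_dir)
            removed.append(job_dir.name)
    return removed
```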

Basic logging: Docker logs are stored to a file, and there are simple /health and /queue endpoints to get some server stats. But that’s it. I would love to add more robust server status, logging, and auditing features in the future.

Limited concurrency: ByteGrader was built for a single, low-volume course, so it currently relies on a single-server architecture. I plan to expand this to support multiple, concurrent servers with load balancing in the future.

No auto-scaling: as with the previous limitation, ByteGrader cannot automatically scale to support high-demand periods. This feature is on my wishlist for the future.

LMS integration: ByteGrader is a simple backend, designed to be as lightweight as possible (while still running everything in containers). That means the burden of supporting a particular front-end client or learning management system (LMS) is up to the educator or community.

Limited grading: ByteGrader is designed to be an autograder for code assignments first and foremost. There are no plans for manual grading or peer reviewing at the moment.

Conclusion

Traditional autograders serve us well in a variety of computer science courses throughout the world, in both academia and professional development platforms. However, they struggle to meet the demands of embedded systems coursework and of the diverse, multi-language, multi-architecture projects required in many courses. ByteGrader was designed to meet these needs in a modular fashion, allowing educators to extend and adapt the system as required.

If you’re curious about using ByteGrader for your own courses or simply want to tinker with it, check out the getting started guide in the README here: https://github.com/ShawnHymel/bytegrader. ByteGrader is a personal project I developed for my own online courses. If you run into any problems with it or have suggestions, please file an issue on the GitHub repository!
