The Wrong Way to Use AI (and How to Actually Write Better Code with LLMs)

[Image: Claude 4 refactor codebase meme]

This viral post captured a feeling every programmer knows all too well: the thrill of a cleaner architecture, the joy of perfect abstraction, slick implementations that are fast, efficient, and easy to read…

…and the heartbreak when it doesn’t run.

There’s no denying the appeal of letting a large language model refactor a gnarly embedded application. For engineers managing legacy code, dense hardware abstractions, or monolithic projects, having an AI swoop in to modularize the structure feels like salvation. However, we’re not there yet.

The Dream: AI as the Perfect Programmer

I dream of a day when I simply talk to my computer using a natural, human language, and it spits out code that is clean, safe, efficient, follows best practices, and actually works. Every time. I believe we will get there in the next 5 years.

LLMs (or their future evolutions) will eventually become the next layer of abstraction in the human-to-machine translation stack. We started programming computers with switches and punch cards, talking directly to the machine in its native language, before assemblers came along. Writing in assembly was much easier than having to think in bits!

Then we had compilers, and wow, that was a game changer. People could write high-level languages (yes, FORTRAN and C are still considered high-level languages) that could be compiled for a variety of architectures. Writing in FORTRAN, C, etc. was, once again, a paradigm shift in how we worked with these complex electrical beasts. 

Side note: I consider interpreted languages to be on a similar level as compiled languages; they just differ in how they run on the machine. Interpreted languages often have more runtime overhead but higher flexibility. “High-level” here refers to the abstraction rather than runtime performance.

We are witnessing the birth of the next level of abstraction with the rise of LLMs. ChatGPT and Claude can write some amazing programs (that sometimes work) based on human-provided, natural language descriptions. Truly a marvel. But it’s nowhere near perfect (yet).

We will eventually get there: AI will be able to write code better than 99% of programmers, just like compilers can outperform most programmers attempting to write assembly. There will still be some experts out there (just like we still have experts in assembly for various architectures). We will develop ways to efficiently train these algorithms on best practices, safety, etc. Your job, as a programmer in this brave new world, will become one of orchestrator, architect, and debugger. No longer will you be fussing with syntax errors, dangling pointers, or uncaught exceptions.

Understanding LLMs

Understanding how LLMs work is key to understanding why their code looks great but does not always work. Large language models are, first and foremost, BS machines. They are trained to output the most likely string of words that will appeal to you (the user), based on mountains of reading material (mostly from the Internet).

Open textbooks, free online tutorials, GitHub code and issues, and Stack Overflow (among others) supply much of this training data. For the most part, these are expert programmers freely offering advice and helping other people solve programming issues in a variety of languages. LLMs ingest this material and build a model of what they think people want to hear; in this case, helpful solutions to programming-related problems.

See this video for a good explanation of the internal workings of LLMs: https://www.youtube.com/watch?v=5sLYAQS9sWQ

Most of the popular models do not compile, run, or test the code they give you. The worst part is that most LLMs are trained to confidently give you answers. The model does not know whether the code it presents will work or even whether it’s “right” (i.e., following best practices). It just predicts what should come next in a string of words based on what it previously read.

Note that there are some projects that are actively addressing this issue, such as OpenHands and AutoGPT. They use LLM actions to compile, run, and test, which closes the feedback loop and helps the LLM automatically debug your programs.

Again, we’re getting there.

The current state of “vibe coding” gets a deservedly bad rap because the programmer does not spend any time understanding the code. They fail to act as an appropriate orchestrator and wonder why the LLM failed.

Your Role as Architect and Orchestrator

The current state of AI still requires programmers who understand the language, know core programming principles, and have strong debugging skills. AI is simply one tool in that toolkit.

Don’t trust anything an LLM gives you. Always verify. Read the code: ensure it makes sense and follows accepted best practices.

With that in mind, you can treat AI as a junior programmer working directly under you. You’ll get much better results if you give it clear descriptions within tightly defined bounds. The smaller the chunk of code you ask it to write, the more likely you’ll get something that works.

This is where your role as architect comes in: it’s your job to define the requirements and interfaces. I’ve found that LLMs often hand back leaky abstractions, so it’s up to you to decide which interfaces need tightening and which leaks are an acceptable trade-off for a better experience.
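
One way to stay in the architect’s seat is to write the interface yourself and only ask the model for implementations. Below is a minimal sketch of a header I might define first; the module, names, and contract are hypothetical examples, not from any particular project:

```c
/* ring_buffer.h - a hypothetical interface defined up front, before asking
 * an LLM to fill in the implementation. Pinning down the contract keeps the
 * model inside tightly defined bounds.
 */
#ifndef RING_BUFFER_H
#define RING_BUFFER_H

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *data;      /* caller-owned storage */
    size_t   capacity;  /* size of data[] in bytes */
    size_t   head;      /* next write index */
    size_t   tail;      /* next read index */
    bool     full;      /* distinguishes full from empty when head == tail */
} ring_buffer_t;

/* Initialize rb to use the caller-provided storage. No dynamic allocation. */
void rb_init(ring_buffer_t *rb, uint8_t *storage, size_t capacity);

/* Push one byte. Returns false (and writes nothing) if the buffer is full. */
bool rb_push(ring_buffer_t *rb, uint8_t byte);

/* Pop one byte into *out. Returns false if the buffer is empty. */
bool rb_pop(ring_buffer_t *rb, uint8_t *out);

#endif /* RING_BUFFER_H */
```

Each function is small enough to request, review, and test on its own, which is exactly the scope today’s models handle best.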

As an orchestrator, you will want to work with multiple LLMs to construct the best possible code. You should always verify their output, and don’t be afraid to get your hands dirty writing some code yourself.

LLMs can act as a force multiplier if used correctly, easily doubling your code output while still maintaining best practices.

My Current LLM Stack

I do not recommend relying on a single LLM or service. In my experience, using multiple LLMs for different tasks yields the best results. As always, you should be verifying what any of these models give you. Here is what I’m currently using:

  • ChatGPT is great for general, overarching questions. For example, “how should I organize a folder structure for a Zephyr application.” It’s also great for brainstorming, e.g., “Help me come up with names for my project.”
  • Claude shines at generating code. You can give it descriptions like “write me a fast Fourier transform function in C optimized for an Arm Cortex-M4F.” Again, the narrower the scope, the better (see the sketch after this list).
  • GitHub Copilot saves me tons of time filling out boilerplate code and documentation. It indexes your codebase to search for relevant snippets and attempts to make predictions based on what you are actively typing. This makes it great at both documenting existing functions and writing function setups based on comments. It’s scary how well it knows what I’m trying to do most of the time.
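
To make “narrow scope” concrete, here is a sketch of how I might frame that FFT request: I supply the prototype and the contract, and the model only writes the body. The function name and constraints below are illustrative assumptions, not from any particular library:

```c
#include <stddef.h>

/* Prompt to pair with this prototype (illustrative):
 * "Implement fft_magnitude() in portable C99 for an Arm Cortex-M4F.
 *  in[] holds n real samples, where n is a power of two. Write n/2
 *  magnitude bins to out[]. No dynamic allocation, no globals.
 *  Return 0 on success or a negative value on bad arguments."
 */
int fft_magnitude(const float *in, float *out, size_t n);
```

Handing over the signature, the constraints, and the success criteria up front makes the result far easier to verify.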

If you have other LLM suggestions, let me know in the comments!

Tips for Using LLMs

Here are some techniques I recommend trying out the next time you use an LLM as your buddy programmer:

  1. Use AI as a strategic assistant, not a magic wand. AI shines at suggesting modular structures, refactoring boilerplate, or summarizing code. But it’s not a replacement for understanding the entire firmware stack. Treat its output like unreviewed pull requests: review, validate, and test.
  2. Break AI input into chunks. Never let AI rewrite 3,000 lines in one go. Ask it to refactor one function or piece at a time. Give it clearly defined goals within the narrowest scope possible.
  3. Break AI output into chunks. Take everything the LLM gives you with a grain of salt and always test it. Modular changes reduce risk, improve traceability, and prevent multi-layered errors from cascading through the codebase.
  4. Make error messages easier to read. Compile-time and runtime error messages are often lengthy, obfuscated, and tricky to understand without deep knowledge of the codebase’s internals. Copy and paste the entire error message into ChatGPT or Claude and ask it to explain the message. It’s not always right, but it often points you in a good direction. This trick has saved me hours of hunting down errors hidden in confusing output.
  5. Leverage AI’s patterns even if they’re wrong. The AI’s restructuring often follows solid software engineering principles: separation of concerns, modularization, and DRY practices. Even if the execution fails, the design insight can be valuable and something you can learn from (see the sketch after this list). It’s your job as the orchestrator to rewrite what the AI gives you into usable code.
  6. Document everything. When experimenting with AI-generated code, maintain tight version control. Snapshot working states and annotate commits. This makes it easier to roll back and learn from mistakes. 
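
As an example of the kind of pattern worth salvaging, here is a minimal sketch of the separation-of-concerns structure LLMs tend to suggest for embedded code: hardware access isolated behind a tiny HAL, with the logic above it kept pure and host-testable. All names and the host-side stub are hypothetical, purely for illustration:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* --- HAL layer: the only code that would touch hardware. --- */
void hal_led_write(bool on)
{
    /* Host-side stub for illustration; on target this would hit a GPIO register. */
    printf("LED %s\n", on ? "ON" : "OFF");
}

/* --- Application logic: no registers, no globals, easy to unit test. --- */
typedef struct {
    bool     led_state;   /* current LED state */
    uint32_t period_ms;   /* toggle period */
    uint32_t elapsed_ms;  /* time accumulated since last toggle */
} blink_ctx_t;

void blink_tick(blink_ctx_t *ctx, uint32_t dt_ms)
{
    ctx->elapsed_ms += dt_ms;
    if (ctx->elapsed_ms >= ctx->period_ms) {
        ctx->elapsed_ms = 0;
        ctx->led_state = !ctx->led_state;
        hal_led_write(ctx->led_state);
    }
}

int main(void)
{
    blink_ctx_t ctx = { .led_state = false, .period_ms = 500, .elapsed_ms = 0 };
    for (int i = 0; i < 4; i++) {
        blink_tick(&ctx, 250);  /* simulate a 250 ms tick on the host */
    }
    return 0;
}
```

Even when the full refactor doesn’t compile, structure like this is often the part worth keeping and re-implementing by hand.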

AI Is a Mirror

Claude 4’s beautifully broken refactor is a perfect metaphor for engineering in 2025.

Tools are improving rapidly. But they’re only as useful as the engineer wielding them.

When AI outputs a beautifully structured but non-compiling refactor, it’s not a failure; it’s a mirror, revealing where human judgment still matters most.

For developers serious about continuous learning, that mirror is invaluable.
