I recently gave a talk at work about realistic and practical use of AI as a software engineer. There is an incredible push from leaders in tech companies to drive up AI usage in an attempt to increase productivity from their engineers (at least, that's the stated reason). But reality often doesn't match expectations. In this post, I'm going to discuss the pitfalls we fall into when comparing how easy it is to build software in a vibe-coded codebase vs. a large one.

Why vibe coding skews expectations

The vast majority of the hype around AI comes from the growing trend of quickly vibe-coded projects that showcase how a complex idea can be implemented quickly by small teams (often just individuals). From sexy landing pages to Slack clones, it is easy to believe that AI enables us to build large, complex pieces of software quickly and correctly. CEOs regularly stating that a large percentage of the code in their companies is now written by AI adds to this belief.

However, most experienced software engineers know that isn't actually the case. Our day-to-day work is not just about producing code, and even the code-writing part hasn't gotten that much better. Why is that?

You need to understand the code you write

When vibe coding a little app, you mainly care about the final product, not the implementation details. As long as the end-user experience looks right, that's all that matters. So it's easy to feel satisfied when, after a few rounds of back and forth with your favorite vibe coding app, it produces a beautiful-looking app you can show off.

But in production engineering, the engineers need to actually understand the code that's written.

This means understanding how the code behaves beyond just the golden paths. When writing code yourself, you make many micro-decisions that lead to confidence that the code behaves a certain way, not just on the golden paths but also in the edge cases. When an LLM is predicting tokens, it's simply doing what's common in its training set, which often works, but not always.
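To make that concrete, here's a hypothetical sketch (not from any real codebase) of the kind of golden-path code an agent might plausibly produce, next to a version written with the edge case in mind:

```ts
// Hypothetical example: parse a "key=value" config line.
// The naive version works on the golden path ("retries=3") but
// silently drops data when the value itself contains "=",
// e.g. "url=https://example.com/a?b=c".
function parseConfigLine(line: string): [string, string] {
  const [key, value] = line.split("=");
  return [key, value]; // value is "https://example.com/a?b"; the "=c" is gone
}

// A version written with the edge case in mind: split only on the
// first "=", and fail loudly on malformed input.
function parseConfigLineSafe(line: string): [string, string] {
  const idx = line.indexOf("=");
  if (idx === -1) throw new Error(`Malformed config line: ${line}`);
  return [line.slice(0, idx), line.slice(idx + 1)];
}

console.log(parseConfigLine("url=https://example.com/a?b=c"));     // ["url", "https://example.com/a?b"]
console.log(parseConfigLineSafe("url=https://example.com/a?b=c")); // ["url", "https://example.com/a?b=c"]
```

If you made the split-on-first-"=" decision yourself, you know why the safe version looks the way it does. If an agent made it for you, you only find out which version you got when it breaks.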

Speaking of predicting tokens, an LLM's training data can also be out of date by the time you exercise a particular path.

Mini anecdote: My team has been involved in building the v2 of the Teams SDK. I spent some time building an llms.txt for this next version to help folks use coding agents to quickly build apps with it. But testing it was fairly difficult, because the various agents I used insisted on using the well-established v1 paradigms. It's just really difficult to go against an LLM's training data.
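For context, an llms.txt is just a curated markdown index of links for agents to follow. A hypothetical sketch (these URLs and descriptions are placeholders, not the real Teams SDK docs):

```md
# Teams SDK v2

> Build Teams apps with the v2 SDK. v1 patterns are deprecated;
> prefer the v2 docs linked below.

## Docs

- [Getting started (v2)](https://example.com/teams-sdk/v2/start): project setup and first app
- [Migrating from v1](https://example.com/teams-sdk/v2/migration): v1-to-v2 API mapping
```

Even with the deprecation spelled out like that, the agents kept reaching for v1 APIs, because that's what the training data overwhelmingly contains.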

Anyway, the problem is that if you are not involved in those decisions, it's as if someone else wrote the code. And during a production issue, you are on the hook to know what your code did, and why. It is your responsibility.

You need to write good code

Software engineers get paid to build software that is well architected, extensible, and scalable. This skill takes years to learn and master. Writing code isn't hard. But writing good code is hard.

Vibe-coded projects do not focus on any of this. They focus on a good demo, and on the end functionality for that demo. Without thinking through the written code at a somewhat intimate level, we leave the "architecting" to the coding agent. And if we rely on it too much, we risk falling into the trap of feeling like we have written good code, when really we have just written code that happens to work.

Additionally, LLMs were trained on open-source code that was freely available on the internet. And anyone can push code to GitHub. Heck, I post at least a dozen repositories every year, and I would be extremely reluctant to have any of that code run in production. Extrapolating, the vast majority of the code written by an LLM is probably mediocre. Without us being deeply involved in the decisions it's making, our applications end up built on similarly mediocre code.

Large codebases are a different beast

Every software project supports some number of features. Vibe-coded projects are generally well scoped, built with a specific end goal in mind. They contain a small number of features.

Large codebases are generally a different beast. They are built over a longer period of time, with hundreds (and often thousands) of features. As software engineers, it is our job to demystify these codebases and build mental models around them, so we know the best way to approach adding or improving features. The complexity of the mental model is generally correlated with the size of the project.

LLMs are constrained by the size of their context windows, and their "mental" models are limited to whatever fits in them. You could argue that context windows keep expanding, so it's getting easier to fit more and more concepts in. But the scales are rough: a million-line codebase at roughly ten tokens per line is on the order of ten million tokens, well beyond today's typical context windows.

The problem is that taking a complex piece of software and articulating its exact design structure is not a trivial matter. Agentic tools these days suggest things like .md files that get injected into the agent's working memory to help it regain this context whenever it's working on anything. But now you need to articulate the design in this file (using an LLM, sure, but you still need to verify that the design is accurate). And you need to continuously maintain this high-level document, too.
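As an illustration, here is a hypothetical sketch of such a file (the name, layout, and every claim in it are made up; conventions like CLAUDE.md or AGENTS.md vary by tool):

```md
<!-- AGENTS.md: hypothetical sketch of an agent context file.
     A human has to verify these claims and keep them in sync with the code. -->
# Architecture notes for coding agents

- Services: `api/` (REST gateway), `worker/` (async jobs), `shared/` (common types).
- All database access goes through `shared/db`; never query tables directly.
- Feature flags live in `config/flags.ts`; new user-facing behavior must be flag-gated.
- Tests: `npm test` runs unit tests; integration tests require a local database.
```

Every bullet in there is a claim about the codebase, and it starts drifting out of date the moment someone merges a change that contradicts it.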

So what's next?

This post mainly attempts to highlight that the AI hype our industry is facing right now does not translate fluidly to the large-scale engineering projects most engineers typically work on. But I am not delusional. I know that AI tools are here to stay and can definitely help us be more efficient at our jobs. In the next post, I'm going to document the various strategies I actually use day-to-day to be more efficient at building software.