Over the past year, the software industry has been undergoing the largest change I've seen since the dot-com bubble of the 1990s. Companies large and small are adopting so-called "agentic" software design. Sometimes called "vibecoding", this is when someone describes a desired piece of software to an AI large language model, such as Anthropic's Claude or OpenAI's ChatGPT, and then asks the language model to build the application for them. A large number of "coding agents", which are tools designed to facilitate this process and make it easier, have become mainstream over the past year. The one I have been using the most is Claude Code by Anthropic.
In my previous post I showed how to design a simple database system. My primary reasons for doing so were to explore the choices that database designers make when they implement a database system and to provide a database system that I could use in retrocomputing environments. However, a third goal was to leverage Claude Code to implement the database library and in doing so provide an example of how such a library can be implemented using current best practices in agentic development.
There has been a lot of discussion around "vibecoding" and the fall of software engineering as a profession, but I hope to show in this post the importance that software engineering as a discipline still plays in agentic software development. It is easy for an AI agent to create a proof of concept that more or less works properly. It's a very different matter to help the AI build something that is robust, well-tested, extensible, and easy to maintain. In a very real sense, the AI agent is like a junior programmer who needs guidance from an experienced engineer or architect to help them produce high quality software systems.
Claude Code and the Superpowers Plugin
My AI coding agent of choice is Claude Code. If you've never used it before and have experience writing your own software, it can take a bit of getting used to. Although Claude Code can run inside your development environment (IDE), it is more common to simply run Claude Code from the command line directly inside your project directory. You are then given a prompt and you simply converse with Claude using natural language. Claude Code has access to your filesystem and shell, and can read files, write code, run builds, and fix errors on its own (though you have to give it permission to do so first). It is like pair programming with someone who can type very quickly.
Claude Code, like most agentic coding frameworks, supports additional plugins to enhance its functionality. These generally take the form of skills or tools. Skills are text files that explain how the AI should perform certain tasks; tools are external capabilities, either servers accessed remotely or programs executed locally, that give the agent the ability to do things. For example, you might install the GitHub command line tool on your system and then write a skill file that tells Claude how to use it to find repositories, check them out, and create pull requests. To make it easier to distribute these skills and tools, Anthropic has added a "plugin marketplace" to Claude Code where people can offer their plugins for others to use.
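To make the skill concept concrete, here is what a skill file for the GitHub CLI example might look like. This is a hypothetical sketch, not a file from this project; the frontmatter fields follow Claude Code's skill format, and the `gh` commands shown are real:

```markdown
---
name: github-cli
description: Find repositories, clone them, and open pull requests using the gh command line tool
---

# Using the GitHub CLI

- Search for repositories with `gh search repos <query>`.
- Clone a repository with `gh repo clone <owner>/<repo>`.
- Create a pull request from the current branch with `gh pr create --fill`.
```

The markdown body is simply natural-language instructions; the agent reads it when the skill is relevant and then runs the commands itself through its shell tool.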
When building vDB I made use of one such plugin called "superpowers". This plugin adds structured workflows and encourages the AI to do more planning before it codes. It encourages the AI to brainstorm approaches, write implementation plans, and create task lists before touching source files. This makes the whole process more predictable and easier to steer, which matters a lot when you are handing off a nontrivial implementation task.
Another feature of agentic development that I made use of is so-called "sub-agents". A sub-agent is a separate instance of the AI that runs with its own context and instructions. Each agent is given instructions that make it behave in a specific way. Agents are used so that the main AI conversation doesn't get overloaded with context it shouldn't care about. For example, if I have several libraries to write, I might spin off a separate agent to write each library. The full context of how those libraries are written stays within those agents so that my main AI doesn't get overwhelmed with all those details. Instead, the agents simply report back when they're done. It can take a while to wrap your head around how this works, but there are certain use cases where agents make a lot of sense.
The Input Documents
I prepared five specification documents before starting. The first of these was db.md, which was the outcome of my previous post and describes the detailed design of the database. I also created btree.md, which describes the design of the B-Tree library in more detail. To these I added a file called rules.md that lists the rules to be followed during creation of the libraries. These are things like naming conventions, filename formats, and C language compliance details. To verify that the library works properly after it is built, I created a specification for a simple user management application, useradm.md. Finally, I created a list of the overall goals of the project and put that list in goals.md. These specifications are the type of thing one would expect to come out of a design review process before beginning to code.
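To give a flavor of rules.md, here is an illustrative excerpt. The exact wording of my file differs, but the rules below reflect the constraints described later in this post (C89 compliance, declaration placement, comment style, and 8.3 filenames):

```markdown
# Coding Rules (excerpt)

- All code must conform to C89/ANSI C. Declare all variables at the
  top of functions and use only /* ... */ style comments.
- All filenames must be 8.3 compatible so the code can build on DOS
  and other retrocomputing targets.
- Every library must compile with zero warnings.
```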
Creating the Sub-Agents
To create the sub-agents I first wrote basic descriptions of the two agents I wanted to use: a software engineer and a test (QA) engineer. Although these files were relatively short, I included details about how each of the two agents was supposed to behave. I then had Claude take those two descriptions and create fully fleshed out agent configuration files from them. Each file specifies a name, a description, a list of tools the agent is allowed to use, and a system prompt that defines the agent's persona and instructions. When you dispatch work to one of these agents, Claude Code spawns a separate process that operates within that persona's constraints. The agent does its work and then returns its results to the main conversation.
The first agent was called sw-eng (software engineer). Its system prompt described it as an experienced ANSI C developer who had worked across UNIX, DOS, Classic Mac OS, Windows 3.1, and other platforms. I gave it access to Read, Write, Edit, Bash, Grep, and Glob tools so it could explore the codebase, write code, and run builds. The project's coding rules were embedded directly in the prompt so it wouldn't need to re-read them on every dispatch. It was also told to follow a test-first workflow: write failing tests, then implement until they pass.
The second agent was called qa-eng (QA engineer). Its prompt described it as a code reviewer whose job was to evaluate implementations against the project's rules and look for problems. I gave it a checklist covering test coverage, security (buffer overruns, unchecked array accesses, integer overflows), memory safety (leaks, use-after-free, uninitialized reads), and rules compliance (C89 conformance, naming conventions, 8.3 filenames). The important thing about qa-eng was that it only had read-only tools: Read, Grep, Glob, and Bash. It could not edit files. If it found problems, it had to report them and wait for the sw-eng agent to make the fixes.
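Claude Code agent definitions are markdown files with YAML frontmatter. A simplified sketch of what the qa-eng definition looked like follows; the field names match Claude Code's agent format, but the prompt text here is abridged and paraphrased:

```markdown
---
name: qa-eng
description: Reviews C implementations against the project rules and reports issues without editing files
tools: Read, Grep, Glob, Bash
---

You are an experienced QA engineer reviewing ANSI C code. For every
change, check: test coverage; security (buffer overruns, unchecked
array accesses, integer overflows); memory safety (leaks,
use-after-free, uninitialized reads); and rules compliance (C89
conformance, naming conventions, 8.3 filenames). Report each issue
with file, line, and severity, then conclude with PASS or FAIL.
```

Note the `tools` line: because Write and Edit are absent, the agent physically cannot modify the codebase, which is what enforces the review-only role described above.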
This tool restriction was deliberate. When a single agent writes code and then reviews its own output, it tends to be lenient. It remembers what it intended and is less likely to catch the assumptions baked into the implementation. A separate reviewer that starts with a fresh context and evaluates the code as it exists on disk is much more likely to find real issues, for the same reason that human code review works: fresh eyes catch what familiar eyes miss. Limiting the QA agent to read-only tools also meant that every fix had to go through the sw-eng agent, creating an explicit review-and-fix cycle with a clear paper trail.
The Workflow
Once the agents were created, I asked Claude to read all the specs, create a master implementation plan, and then execute it by delegating the actual work to the two agents. The plan Claude came up with was straightforward: implement each of the six goals in dependency order, and for each goal, follow this cycle:
- Dispatch sw-eng with the goal description and relevant specs.
- Run `make clean && make test` to verify the build.
- Dispatch qa-eng to review the new code.
- If QA found issues, send the issue list back to sw-eng for fixes.
- After fixes, rebuild and send back to qa-eng for re-review.
- Once QA reported PASS, move on to the next goal.
Claude was acting like a tech lead who delegates work to specialists and verifies their results. The actual coding, testing, and reviewing were all done by the agents.
The first goal was the test framework. Claude dispatched sw-eng with the requirements (TestInit, TestAdd, TestRun, assertion macros, no dynamic allocation, self-tests) and it came back in 58 seconds with four files. Claude ran the build and 8 tests passed with zero warnings. Then Claude dispatched qa-eng, which found five issues including missing edge case tests and a potential buffer concern. Claude sent the list back to sw-eng, which fixed everything in about a minute. After the fix, 15 tests passed and qa-eng reported PASS. The whole cycle took about three minutes.
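For a sense of scale, a framework meeting those requirements fits in a few dozen lines of C89. This is my own minimal sketch, not the code the agent produced; the function names (TestInit, TestAdd, TestRun) come from the spec, but the table size, macro name, and everything else are illustrative:

```c
#include <stdio.h>

#define MAX_TESTS 64 /* fixed-size table: no dynamic allocation, per the spec */

typedef int (*TestFn)(void);

static const char *test_names[MAX_TESTS];
static TestFn test_fns[MAX_TESTS];
static int test_count = 0;

/* Assertion macro: report the failing expression and bail out of the test. */
#define TEST_ASSERT(cond) \
    do { \
        if (!(cond)) { \
            printf("FAIL %s:%d: %s\n", __FILE__, __LINE__, #cond); \
            return 1; \
        } \
    } while (0)

void TestInit(void)
{
    test_count = 0;
}

/* Register a test; returns -1 when the table is full. */
int TestAdd(const char *name, TestFn fn)
{
    if (test_count >= MAX_TESTS) {
        return -1;
    }
    test_names[test_count] = name;
    test_fns[test_count] = fn;
    test_count++;
    return 0;
}

/* Run every registered test; returns the number of failures. */
int TestRun(void)
{
    int i;
    int failed = 0;
    for (i = 0; i < test_count; i++) {
        if (test_fns[i]() != 0) {
            failed++;
        } else {
            printf("PASS %s\n", test_names[i]);
        }
    }
    return failed;
}

/* A sample test written against the framework. */
int SampleTest(void)
{
    TEST_ASSERT(2 + 2 == 4);
    return 0;
}
```

Even a toy version like this makes the spec's constraints visible: the static table honors the no-dynamic-allocation rule, and the `do { ... } while (0)` macro idiom keeps `TEST_ASSERT` safe inside `if`/`else` blocks.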
Goals 2 and 3 (the utility library and CRC/hashing library) followed the same pattern and went smoothly. QA passed on both without requiring fixes, though it did note some minor non-blocking recommendations on the hashing library.
Goal 4, the B-Tree library, was where things got interesting. The sw-eng agent implemented the B-Tree in about three minutes: 512-byte pages, little-endian serialization, insert/find/delete operations, overflow pages, and string key hashing via CRC-16. All 79 tests passed. But when Claude dispatched qa-eng, it found nine issues, two of which it flagged as security bugs. The first was insufficient bounds checking on key counts read from disk. A malformed index file could cause an out-of-bounds array access. The second was missing validation on overflow page chains, which could lead to infinite loops if the data was corrupted. These are exactly the kind of bugs that a human code reviewer would flag, and they are the kind that tend to ship to production when there is no dedicated reviewer. The sw-eng agent fixed all nine issues, added bounds checks and validation, wrote three new test cases, and qa-eng gave it a PASS on re-review.
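Both fixes follow standard defensive patterns for on-disk data: validate counts read from disk against the page's physical capacity, and bound any walk over disk-supplied links. Here is a hedged sketch of what such checks might look like; the constants, struct layout, and function names are illustrative, not vDB's actual code:

```c
#define MAX_KEYS 31            /* illustrative capacity for a 512-byte page */
#define MAX_OVERFLOW_HOPS 4096 /* illustrative upper bound on chain length */

typedef struct {
    unsigned short key_count;     /* number of keys stored on this page */
    unsigned short next_overflow; /* page number of next overflow page; 0 = none */
} PageHeader;

/* Reject key counts read from disk that exceed what a page can hold,
   so a malformed index file cannot drive an out-of-bounds array access. */
int ValidatePage(const PageHeader *hdr)
{
    if (hdr->key_count > MAX_KEYS) {
        return -1;
    }
    return 0;
}

/* Walk an overflow chain with a hop limit so a corrupted link (for
   example, a page that points back to itself) cannot cause an infinite
   loop. fetch is a stand-in for reading a page's next-overflow field
   from disk. Returns the chain length, or -1 on corruption. */
int WalkOverflowChain(unsigned short first,
                      unsigned short (*fetch)(unsigned short))
{
    unsigned short page;
    int hops;

    page = first;
    hops = 0;
    while (page != 0) {
        hops++;
        if (hops > MAX_OVERFLOW_HOPS) {
            return -1; /* cycle or corruption */
        }
        page = fetch(page);
    }
    return hops;
}

/* Demo fetch functions: a healthy two-page chain and a corrupt self-loop. */
unsigned short FetchTwoPageChain(unsigned short page)
{
    return (page == 1) ? 2 : 0;
}

unsigned short FetchSelfLoop(unsigned short page)
{
    return page;
}
```

The hop limit is a blunt but effective guard: a legitimate chain can never exceed the number of pages in the file, so any walk longer than that must be a cycle.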
Goal 5, the database library, was the largest component. It required 11 source files and covered everything from header page management to record CRUD to journaling and crash recovery. The sw-eng dispatch took over 10 minutes (compared to 1-3 minutes for earlier goals), partly because it had to read through the existing codebase to understand the libraries it was building on top of. QA found five issues, sw-eng fixed them, and all 141 tests passed.
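Journaling for crash recovery commonly follows the classic rollback-journal technique: save the original image of each page before modifying it, and restore the saved images on startup if a journal survives a crash. The in-memory sketch below illustrates that idea only; vDB's actual on-disk journal format is defined in db.md and may differ:

```c
#include <string.h>

#define PAGE_SIZE 512 /* matches vDB's page size; other numbers are illustrative */
#define NUM_PAGES 8

typedef struct {
    unsigned char pages[NUM_PAGES][PAGE_SIZE];
} Database;

typedef struct {
    int page_no[NUM_PAGES];                    /* which page each saved image belongs to */
    unsigned char image[NUM_PAGES][PAGE_SIZE]; /* original page contents */
    int count;                                 /* 0 means the journal is empty */
} Journal;

/* Save the original image of a page before modifying it. */
void JournalPage(Journal *j, const Database *db, int page_no)
{
    j->page_no[j->count] = page_no;
    memcpy(j->image[j->count], db->pages[page_no], PAGE_SIZE);
    j->count++;
}

/* On startup, a non-empty journal means a crash interrupted an update:
   copy the saved images back so the database returns to its last
   consistent state, then discard the journal. */
void JournalRecover(Journal *j, Database *db)
{
    int i;
    for (i = 0; i < j->count; i++) {
        memcpy(db->pages[j->page_no[i]], j->image[i], PAGE_SIZE);
    }
    j->count = 0;
}

/* A successful update ends by discarding the journal. */
void JournalCommit(Journal *j)
{
    j->count = 0;
}
```

The invariant is simple: at any instant, either the database pages or the journaled images contain a complete consistent state, so a crash at any point leaves something recoverable.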
Goal 6, the example user administration program, brought the total to 35 files and 159 tests across 12 test suites. QA found four more issues (mostly around input validation and edge cases), sw-eng fixed them, and the final re-review passed.
The whole session, from first prompt to final QA pass, took 51 minutes. In less than an hour and with no human intervention I had a working database library that matched the design I had created.
Conclusion
The thing that surprised me most was how effective the two-agent pattern was at catching real bugs. It was one thing for the QA agent to flag style issues and missing tests, but the security bugs it found in the B-Tree implementation were genuine problems with real consequences. If this code were handling untrusted data (and a database that stores user records very well might be), those missing bounds checks could be exploitable. Having a separate agent that evaluates the code from scratch, without the context of having written it, creates a kind of adversarial pressure that I think is genuinely valuable.
The other thing that stood out was the importance of good specs. The agents were effective because the specifications were detailed and unambiguous. The rules were embedded in the agent prompts, which meant C89 compliance was the default rather than something to remember. Variables were declared at the top of functions, only C-style comments were used, all filenames were 8.3 compatible, and the naming conventions were correct throughout. I suspect that if I had given the agents vague requirements, the result would have been proportionally vague code. The time I spent writing specs before interacting with the agent was probably the most valuable part of the whole process.
That said, there were some rough edges. After every sw-eng dispatch, the IDE's clang analyzer would report dozens of false positive errors because it didn't know about the -Iinclude flag. The actual gcc build was clean every time, but the noise was annoying and Claude had to repeatedly explain to me that the diagnostics could be ignored. Each agent dispatch also starts completely fresh, with no memory of previous dispatches. This meant that when the sw-eng agent was working on Goal 5 (the database library), it had to spend time re-reading code that it had written in earlier dispatches. For a larger project this could become a real bottleneck.
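If the analyzer in question is clangd, one common fix is a project-level `.clangd` file that hands the language server the same include path the Makefile uses. I'm assuming clangd here; other analyzers configure include paths differently:

```yaml
# .clangd -- give the language server the include path from the Makefile
CompileFlags:
  Add: [-Iinclude]
```

With this in place the in-editor diagnostics should match the actual gcc build instead of producing false positives.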
I also want to be honest about the limits of what the agents tested. The useradm program is a menu-driven interactive CLI, and while the agents wrote thorough unit tests for the underlying functions (serialization, search, record creation, etc.), the AI didn't actually run the program and step through the menus. Automated testing of interactive programs is hard, and this was a gap in the verification process that a human tester would need to fill.
Finally, Claude chose the goal ordering, verified builds manually, decided which QA findings were worth sending back for fixes, and made judgment calls about when to move on to the next goal. The agents were powerful tools, but they needed someone directing the work. I think this approach works best for projects that have clear specs, a well-defined dependency order, and components that can be built and tested incrementally. A project with heavy integration concerns or requirements that evolve during development would be harder to drive this way.
The full conversation transcript is available if you want to read through it, and the code Claude generated can be found in the GitHub repo for libvdb.