Building a "software factory", act 1
I don't know about you, but I've got the feeling that AI trends are currently heading in an interesting direction: building "software factories". What is a "software factory" in terms of AI?
It's like applying a waterfall approach (or "wagile - waterfall agile", if you are "agile" :D) to software development using AI. Jokes aside, this idea is... surprisingly nice!
But first, let's talk about what is wrong with just using Claude Code/Gemini/Kiro/Whatever to do the job.
Feel my vibes
The problem with "vibe coding", however you define it, is that it usually lacks structure, and in my experience, if you put AI in an environment without structure, it tends to invent things and hallucinate.
Also, there is the "context rot" issue: when you reach the limits of your context window, you need to either compact the context somehow or just throw out some parts of it to fit new information.
These two problems are enough of a reason to think about improving our toolbox.
Subagents
Before I tell you about my first approach to the "software factory" concept, we need to spend at least two paragraphs on subagents, as they're essential for our SF (software factory) to work.
The idea is simple: let one LLM, with a predefined prompt and tools, call other LLMs that have their own prompts and tools.
Why, you may ask? There are several good reasons for doing that:
- saving the context window - the main agent, let's call it the "orchestrator", can focus on high-level concepts without knowing the low-level code. If it needs to change the code, it can call an agent that specializes in that and has the tools to do so.
- specialization - instead of making one jack-of-all-trades agent, you can build a multitude of specific agents, each with its own prompt, tools, guardrails and permissions.
- parallelization - if you work on a big system and implement a big feature that spans multiple layers, you may benefit from having multiple agents, each doing a part of the work in parallel.
- fresh "memory" - agents that, e.g., both implement the code and write the tests show a kind of bias: from my observations, they often take shortcuts just to make the tests pass instead of fixing the underlying problem. When you involve multiple agents, each with a "fresh view" on the topic, chances are that the quality will improve.
- security - giving a single agent access to everything can be risky... On the other hand, you can create, e.g., dedicated agents just for accessing a specific part of the system, each with its own guardrails and boundaries that prevent risky actions.
- distribution - an agent working on your machine is limited to whatever your machine can do. On the other hand, with A2A you can call remote agents. Let's say you have a dev agent and an ops agent: one can develop a new feature and then call the remote agent, which handles the deployment process whenever it is needed.
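To make the idea a bit more concrete, here is a minimal sketch of an orchestrator delegating to subagents. Everything in it is an assumption made for illustration: the Agent class, the delegate method, the tool names, and echo_llm, which is a fake model so the sketch runs without any API. It is not how any specific framework does it.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a model call: (system_prompt, task) -> reply.
LLM = Callable[[str, str], str]


@dataclass
class Agent:
    name: str
    system_prompt: str
    llm: LLM
    tools: set[str] = field(default_factory=set)
    subagents: dict[str, "Agent"] = field(default_factory=dict)

    def run(self, task: str) -> str:
        # Each run sees only this agent's own prompt and the task: a fresh context.
        return self.llm(self.system_prompt, task)

    def delegate(self, name: str, task: str) -> str:
        # The parent gets back only the final result, not the subagent's working context.
        return self.subagents[name].run(task)


def echo_llm(system_prompt: str, task: str) -> str:
    # Fake model used only to keep the sketch runnable.
    return f"[{system_prompt}] done: {task}"


coder = Agent("coder", "You edit code.", echo_llm, tools={"read_file", "write_file"})
tester = Agent("tester", "You write tests.", echo_llm, tools={"read_file", "run_tests"})
orchestrator = Agent(
    "orchestrator",
    "You plan work.",
    echo_llm,
    subagents={"coder": coder, "tester": tester},
)

result = orchestrator.delegate("coder", "add a /health endpoint")
print(result)
```

Note that the orchestrator here has no file tools at all, so the only way for it to touch the code is to go through a subagent.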
Is that all?
Not really. Although subagents are quite powerful, I found that subagents alone are not enough to get a highly autonomous software factory. In "act 1" I introduced these additional mechanisms:
Task list
Yes, the agent can just use the context as a source of truth for what it needs to do next, but... a software factory, like a typical factory, works better if you have transparency and insight. Giving an agent a tool for managing its own task list gives you both: transparency, as you can clearly see what the agent is planning to do, and a way to "nudge" it if it doesn't do that... and that leads us to...
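A task list tool can be very small. The sketch below is only an illustration under my own assumptions: the TaskList class, its method names and the "done" status are hypothetical, and a real tool would be exposed to the agent through whatever tool-calling mechanism your platform uses.

```python
from dataclasses import dataclass, field

STATUSES = ("to do", "in progress", "done")  # assumed status names


@dataclass
class TaskList:
    tasks: dict[int, dict] = field(default_factory=dict)

    def add(self, title: str) -> int:
        # New tasks always start as "to do".
        task_id = len(self.tasks) + 1
        self.tasks[task_id] = {"title": title, "status": "to do"}
        return task_id

    def set_status(self, task_id: int, status: str) -> None:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        self.tasks[task_id]["status"] = status

    def unfinished(self) -> list[int]:
        # Anything not "done" is something a human (or a guardian) can nudge about.
        return [i for i, t in self.tasks.items() if t["status"] != "done"]

    def render(self) -> str:
        # Human-readable view: this is the transparency part.
        return "\n".join(
            f"{i}. [{t['status']}] {t['title']}" for i, t in self.tasks.items()
        )


plan = TaskList()
first = plan.add("design the API")
second = plan.add("implement the endpoint")
plan.set_status(first, "done")
plan.set_status(second, "in progress")
print(plan.render())
```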
Task guardian
Sometimes 💩 hits the fan, and that happens quite often with agents. An error, a wrong prompt, inconclusive results from a subagent, and of course a need for user input - all these situations make the agent stop, and a stop is not something you expect when you want to run an autonomous software factory (ok, maybe a stop to escalate a business change is something you want, and I agree that it should be an exception). And of course stops also happen due to technical issues.
So the agent stops. Then my "task guardian" works like this: it checks the task list (see above) for any "to do" or "in progress" items, checks whether there are subagents running (if there are, we just wait for them to finish), and then... nudges each agent with a predefined prompt asking it to resume working on the items that are not completed.
The results? No agent stays stopped: even if one stops, the task guardian resumes its work, so you can be sure the work is not stuck.
One important remark here: there are situations when you may want the agent to stop - for example, if the result of the agent's work would change the business assumptions of the system. For that specific purpose, it is necessary to give the agent a way to signal that. In my case, I just have a "blocked" status for each task in the task list, with a prompt saying that whenever the agent hits a wall and needs human intervention, it should mark the task as blocked. This is not the final form of how it should work, but for "act 1" it is enough.
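The guardian's decision logic can be sketched roughly like this. The function name, the return values, the shape of the task records and the wording of the nudge prompt are all my assumptions for the sake of the example; the only parts taken from the description above are the order of the checks and the "to do", "in progress" and "blocked" statuses.

```python
from typing import Callable

OPEN_STATUSES = {"to do", "in progress"}

# Hypothetical wording of the predefined nudge prompt.
NUDGE_PROMPT = (
    "You stopped with unfinished tasks. Resume work on the items that are not "
    "completed. If you need a human, mark the task as blocked."
)


def on_agent_stopped(
    tasks: list[dict],
    running_subagents: int,
    nudge: Callable[[str], None],
) -> str:
    """Decide what to do when an agent's turn ends."""
    if running_subagents > 0:
        # Subagents are still working, so just wait for them to finish.
        return "wait"
    if any(t["status"] in OPEN_STATUSES for t in tasks):
        # Unfinished work and nothing running: push the agent to continue.
        nudge(NUDGE_PROMPT)
        return "nudged"
    if any(t["status"] == "blocked" for t in tasks):
        # Only blocked items remain: this stop is intentional, a human is needed.
        return "needs_human"
    return "idle"


sent: list[str] = []
print(on_agent_stopped([{"status": "in progress"}], 0, sent.append))
```

In this sketch, a blocked task never triggers a nudge on its own, which is what lets the agent stop on purpose.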
Subagent notifications
The next mechanism I suggest implementing is more of an optimization than something you must have in your software factory.
But first, a short introduction to the possible ways you may invoke subagents:
- handoff - you switch the conversation and give total control to the agent you invoke. In some scenarios that includes sharing the context and message history. I won't discuss it in detail, as this is not something I consider the "way to go".
- agents as tools - in this scenario you expose the invocation of a subagent as if it were an ordinary tool. It's simple and efficient; the agent needs to provide explicit context and a task for the subagent, but it works, until you need to wait for the subagent. Then you have two options: a "sync" call, where you wait for the subagent to finish and then return to the main thread, or an "async" way, where you provide both a tool to invoke the subagent in a "fire and forget" manner and tools for listing and getting results from the subagents.
The second option works fine, but... by default, LLMs don't seem eager to give up control and just finish the turn. They tend to invoke the get-results tools in a loop, which is not fine, as every loop iteration costs $$$.
My take on that was to implement a platform-level mechanism: whenever a subagent finishes, the platform sends the result to the parent conversation. This, together with an async invocation tool and a proper prompt to "just invoke all agents you need and end turn as the platform will inform you when needed", seems to work quite well and also saves a lot of money.
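Here is one way such a notification mechanism could look, sketched with Python's asyncio. This is not my actual platform code: the Platform class, invoke_async, parent_inbox and the message format are hypothetical names, and fake_subagent stands in for a real subagent run.

```python
import asyncio
from typing import Awaitable, Callable


class Platform:
    def __init__(self) -> None:
        self.parent_inbox: list[str] = []  # messages pushed to the parent conversation
        self._jobs: dict[str, asyncio.Task] = {}

    def invoke_async(
        self, job_id: str, subagent: Callable[[str], Awaitable[str]], task: str
    ) -> str:
        # Fire and forget: schedule the subagent and return to the parent at once.
        job = asyncio.create_task(subagent(task))
        job.add_done_callback(lambda done: self._notify(job_id, done))
        self._jobs[job_id] = job
        return f"{job_id} started; you will be notified when it finishes"

    def _notify(self, job_id: str, done: asyncio.Task) -> None:
        # Called when the subagent finishes, so the parent never has to poll.
        if done.cancelled():
            outcome = "cancelled"
        elif done.exception() is not None:
            outcome = f"failed: {done.exception()}"
        else:
            outcome = f"finished: {done.result()}"
        self.parent_inbox.append(f"[{job_id}] {outcome}")

    async def wait_all(self) -> None:
        # Only for the demo; a real platform would wake the parent per message.
        await asyncio.gather(*self._jobs.values(), return_exceptions=True)


async def fake_subagent(task: str) -> str:
    await asyncio.sleep(0.01)  # pretend to do some work
    return f"done: {task}"


async def demo() -> list[str]:
    platform = Platform()
    platform.invoke_async("job-1", fake_subagent, "write tests")
    platform.invoke_async("job-2", fake_subagent, "update docs")
    # The parent's turn would end here, with no get-results loop.
    await platform.wait_all()
    return sorted(platform.parent_inbox)


inbox = asyncio.run(demo())
print(inbox)
```

The point of the design is that the invocation tool returns immediately with a "you will be notified" message, which gives the model a reason to end its turn.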
Subagent roles
And this is the final thing I implemented in my software factory. Why does it matter?
The idea is that in my setup some agents have access to the code and some don't, so the ones without access are not tempted to implement everything on their own.
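As a rough illustration, role separation can be enforced with a per-role tool allowlist. The role names, tool names and the call_tool function below are hypothetical and only show the shape of the check.

```python
# Hypothetical roles and tools: the orchestrator gets no code access on purpose.
ROLE_TOOLS = {
    "orchestrator": {"task_list", "invoke_subagent"},
    "developer": {"read_file", "write_file", "run_tests"},
}


def call_tool(role: str, tool: str) -> str:
    # Reject any tool that is not on the role's allowlist.
    if tool not in ROLE_TOOLS.get(role, set()):
        raise PermissionError(f"{role} may not use {tool}")
    # A real platform would dispatch to the actual tool here.
    return f"{role} used {tool}"


print(call_tool("developer", "write_file"))
```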
This is something I'm still evaluating, and it's obvious material for "act 2", but the division into separate roles seems to be quite important for the quality of the results.
But that, as I said, is material for "act 2" :)