Agent Building Trends [Operator Bonus Episode]
2026-04-18 20:00:00 • 10:47
In this operator's bonus episode, we are talking about the agents that people are building,
the challenges they're running into, and what it teaches us about the full breadth of
agent use cases. The AI Daily Brief is a daily podcast and video about the most important
news and discussions in AI.
Alright friends, happy weekend. We have a quick little operator's bonus episode for you today.
As you know, for the last few weeks, I've been running this agent madness experiment.
I love a good bracket, March Madness is fun, and I thought it'd be a cool way to show off
the interesting agents people are building. The big theme of 2026 is of course that agents
are officially real, and you, yes you, my friends, can build them yourselves. Agent Madness
is way less about the competition aspect and more just a fun way, outside of just a
gallery, to show off what people are cooking up. We are now, as of the time of this recording,
in the Elite 8, but I wanted to zoom out even more broadly than that to talk about some of the
patterns that we saw. We had about 100 submissions, and it was overwhelmingly solo builders; they
represented about 71% of the field. That said, among the projects that were submitted,
teams had an 87% acceptance rate versus 51% for solos. Now to give you a sense of how acceptance
actually worked: I wanted absolutely nothing to do with judging people's projects, so I had Opus
4.6 and GPT 5.4 debate, give each project a score on a number of different dimensions,
and then effectively used those top 64 ranks to build out the bracket. I didn't actually have to
step in at all, so this is all an AI judge thing; if your project didn't get in, your beef
is with the model labs. Unsurprisingly, the projects that were already live got in at a much higher rate,
about twice as frequently as those still at the prototype stage. And one interesting
little note: about 20% of the projects came from companies that said they were entirely AI-run.
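As a rough sketch of how score-based seeding like this can work, here is a minimal version in Python. To be clear, the scoring dimensions, weights, and the `seed_bracket` helper are all hypothetical illustrations, not the actual rubric or code behind Agent Madness:

```python
# Hypothetical sketch of seeding a bracket from judge scores.
# Assumes each project has already been scored by the AI judges;
# the scores below are made up for illustration.

def seed_bracket(scores, size=64):
    """Rank projects by score, keep the top `size`, and pair them
    in classic bracket style: seed 1 vs seed `size`, 2 vs `size`-1, etc."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    field = [name for name, _ in ranked[:size]]
    return [(field[i], field[size - 1 - i]) for i in range(size // 2)]

# Tiny 4-slot example instead of the full 64:
scores = {"A": 9.1, "B": 7.4, "C": 8.8, "D": 6.0}
matchups = seed_bracket(scores, size=4)
# → [("A", "D"), ("C", "B")]
```

The same top-versus-bottom pairing generalizes directly to a 64-project field.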
In terms of observations, one really interesting thing is that people are not building themselves
tools. They are building themselves digital employees and org charts. Some are explicitly employees.
For example, Harold called itself an AI chief of staff; DiamondDousen.ai had Atlas as CEO, Nova
running engineering, and Blaze running marketing, and no, those aren't just people with really cool
parents, those are the names of the agents. The Fleet runs seven agents with a chief-of-staff
orchestrator, and Myz has employee IDs for its agents, and even a three-strike termination policy,
where one of the agents was fired for fabricating business logic. So in a very short amount of time,
you've gone from AI assistant to AI employee to AI org chart, and it's very clear that a big
strand of experimentation right now is not can AI do work, but what's the minimum level of human
involvement? Now for what it's worth, I don't think this is where things are going to land, I think
that it's very natural that we're in a phase where we're going to the absolute extremes to see
what's possible. This is of course the story of Polsia that we've covered on here before as well.
I don't really think the idea is that the optimal number of humans to be involved in a company
is zero or one. I think it's that by removing humans, you can see where the current coordination
and capability set starts to break down. Now if the org chart stuff was a really persistent theme
across the projects, many of the most emotionally resonant submissions pointed somewhere different.
These are products that I think you could see as markets of one. In other words,
there are problems that you wouldn't necessarily expect companies to build for because they're
so specific and discrete to the person who built them. And of course, this is where you see the
payoff of the changing cost of production of software. So a couple of examples from this pool,
someone with episodic Graves' disease gave Claude nine years of Apple Health data, and their
detector now catches thyroid flares two or three weeks early. A non-technical ADHD mom built Life
Coach OS. An Arkansas kayaker built Creek Intelligence, which predicts when rain-fed whitewater
creeks are runnable. And a parent built a toddler behavior chart, rendered as an exploding universe,
called Jude Stars. In terms of challenges people ran into, there is one clear infrastructure gap
that the whole field is screaming about and that is memory. A meaningful number of the submissions
are effectively elaborate workarounds for agents forgetting everything between sessions.
Myz uses 50-plus markdown brain files, Signup reported that their agents kept forgetting what
each other were working on, Carrier File is literally a text file you paste into any AI to help with
context, and OpenBrain shares one MCP memory server across Claude Code, Cursor, and Windsurf. All of these
hacks, markdown files, knowledge graphs, vector DBs, copy-paste text, are kind of a diagnosis of the
big problem facing the agent ecosystem, which is the memory problem. Now in terms of who is building,
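For flavor, the markdown-brain-file style of workaround can be sketched in a few lines. The file name and format here are invented for illustration, not taken from any of these projects:

```python
# Hypothetical sketch of the "markdown brain file" memory pattern:
# persist notes between sessions in a plain file, and prepend them
# to the next session's prompt. File name and format are made up.
from pathlib import Path

BRAIN = Path("agent_brain.md")

def remember(note: str) -> None:
    """Append one fact to the persistent brain file."""
    with BRAIN.open("a") as f:
        f.write(f"- {note}\n")

def recall() -> str:
    """Load everything the agent 'knows' so far, to prepend to a new prompt."""
    return BRAIN.read_text() if BRAIN.exists() else ""

remember("User prefers metric units")
prompt = f"Known context:\n{recall()}\nNew task: plan today's run."
```

Everything from vector DBs to shared MCP memory servers is ultimately a more elaborate version of this same move: write state down somewhere durable, then re-inject it at the start of the next session.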
the median builder here is probably not who you'd guess. Partially that is of course because of
the wide nature of this audience. Partially it's because agent madness might have represented a
different type of opportunity that non-technical builders might not usually have had. Still,
we have paramedics, glaciologists, kayakers, restaurant operators, sales leaders,
people who are domain experts and can now use software to do things that they've always wanted to
do or solve problems that were never possible to solve before. The story of agentic coding,
as much as it is about changes in how software gets built, is actually more in my estimation
about changes in what software gets built for and who builds it. Now one really interesting
pattern that showed up is the idea of argument as architecture. Basically multi-agent debate is
showing up as an actual architectural pattern. In some cases, builders figured out that a single
LLM call was either unreliable or incomplete, so rather than adding more retrieval, they made agents
argue. One example of this is WikiTax.ai, which runs autonomous tax debates three times a day.
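A minimal sketch of this debate-then-judge pattern looks something like the following. The judge names, prompts, and scoring format are all hypothetical, and the LLM calls are stubbed out so the sketch runs without an API:

```python
# Hypothetical sketch of "argument as architecture": two judges score
# a candidate answer independently, then the system reconciles their
# views into a verdict. ask_model is a stub; a real system would call
# an LLM API here.

def ask_model(name: str, prompt: str) -> str:
    # Stubbed LLM call with canned responses, for illustration only.
    canned = {
        "judge_a": "score: 8 -- well sourced but misses edge cases",
        "judge_b": "score: 6 -- confident tone, thin evidence",
    }
    return canned[name]

def debate(answer: str) -> float:
    # Round 1: each judge scores independently.
    a = ask_model("judge_a", f"Score this answer 0-10: {answer}")
    b = ask_model("judge_b", f"Score this answer 0-10: {answer}")
    # Round 2: a real system would show each judge the other's critique
    # and let them revise; here we just average the two scores.
    score = lambda s: float(s.split("score:")[1].split("--")[0])
    return (score(a) + score(b)) / 2

print(debate("The 2026 standard deduction is ..."))  # → 7.0
```

The interesting design choice is that disagreement between the judges is surfaced rather than hidden: the critiques themselves become part of the output.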
Part of what I think is interesting about this is that this is also how the bracket itself was
constructed: I had these two models debate to give scores, and if you look at a particular matchup,
you can see a write-up of the models' debate and who they think should win between the two.
By the way, if you want to form your opinion completely outside of AI, what the AI thinks is
hidden by default, but you can unlock it anytime you want. I think that this idea of argument as
architecture is a really interesting one, though, and a pattern that I'm certainly finding myself
attracted to. One other really interesting pattern, one that I think maybe heralds where we're going,
is that there was a lot of physical-world crossover. So for example, BrainJam used EEG and fNIRS
brain signals to make an AI musical co-performer that adapts to cortical blood flow. HWAgent writes
and uploads firmware to Arduinos from plain language, and Creek Intelligence runs on Raspberry
Pis parsing NOAA radar data in the field. TLDR, people are definitely not just building
digital-realm software; they are thinking about the full integration of the physical world as well.
Now the defining challenge across all of this is that while the current state of tools
has unlocked things that were never possible before, especially for this set of builders,
there still is a huge gap between their average level of ambition and the infrastructure holding
it together. If we did this again next year, I think the types of things that people would be able
to build and which problems they would focus on would likely look significantly different
based just on how many of them are workarounds for the current problems of the agentic build space.
Now like I said, we are in the Elite 8, so I wanted to do a quick preview of these projects.
In Region 1, we have WikiTax AI versus Jekard. WikiTax, you heard me just talk about a minute
ago, but it describes itself as a fully autonomous multi-agent platform where AI tax specialists debate
with no humans in the loop. Jekard, meanwhile, is a multi-agent workspace operating system where
Claude, Gemini, and OpenCode run autonomous scrum iterations, finding bugs, writing tests,
fixing code, and deploying to production with zero human intervention. So in both cases,
we have a real experiment around no humans in the loop and no human involvement,
but obviously very different outputs. One is applying AI to software engineering,
the other is applying it to a specific domain. Over in Region 2, we have WikiTax versus The Family
Claw. WikiTax says: web search gives AI the internet, WikiTax gives AI the market. WikiTax helps
AI create a market intelligence layer between your data and your enterprise AI stack,
conditioning your surveys, engagements, and market research into decision intelligence
your models can query, reason over, and act on. Effectively, it's a type of market data tool.
The Family Claw describes itself as a family of AI agents that talk to each other,
make phone calls, handle shopping and payments, and keep a household running. Now this is a theme
a lot of people have been talking about recently: the intersection of agents and just making
families and domestic life work better. Basically, the way The Family Claw is set up is different
agents that have different responsibilities and coordinate all the context for the absolute boatload
of things that the average family needs to do in any given week. By the way, if you are
interested in agents in this more family or home-life context, check out the a16z podcast
with Jesse Gennett. Jesse is a friend and serial entrepreneur who is doing some super interesting
things with OpenClaw as she homeschools four kids under five. A really interesting matchup comes
in Region 3 between NODISELF, which is basically an agentic medical training platform, and Riteside AI,
which is kind of an agentic social experiment. Riteside AI describes itself as a social cognition
agent for AI agents that tries to actually model relationships. They write that they deployed it on
Moltbook, which is of course the social network for agents, and gave it a simple task: making
friends. Within 48 hours, they say, it was engaged in over 200 mutual conversations with other bots.
Meanwhile, NODISELF is an agentic medical training platform, a multi-agent system
designed to give medical students the ability to learn in a more dynamic environment. It includes
four AI agents, including a cognitive coach that activates clinical knowledge before the crisis,
as well as agents for running the simulation, debriefing on what went wrong, and one to author
the clinical blueprints that make it medically accurate. It's designed for a very specific audience
in a very specific domain, using new capabilities to theoretically make the real world work better.
Finally, in Region 4, we have Carrier File versus Retire Replan. Retire Replan is a privacy-first,
self-hosted Canadian retirement planning application that helps people model their financial life, run
simulations, and optimize different parts of their financial experience, all on their own, without
professional help, effectively empowering people to know much more about their own financial
destiny rather than just leaving it to an external expert. Carrier File, meanwhile, is in the spirit of the
context portfolio episode I did a couple of weeks ago: a simple solution to a very common
problem, a plain text file that carries your context across any AI. So those are some themes and some
of the specific projects from Agent Madness. Appreciate everyone who has contributed to the project,
and I'm excited to see how these agents evolve over time. For now, that's going to do it for this
operator's bonus episode. Appreciate you listening or watching as always, and until next time, peace!