Agent Building Trends [Operator Bonus Episode]
2026-04-18 20:00:00 • 10:47
In this operator's bonus episode, we are talking about the agents that people are building,
the challenges they're running into, and what it teaches us about the full breadth of
agent use cases. The AI Daily Brief is a daily podcast and video about the most important
news and discussions in AI.
Alright friends, happy weekend. We have a quick little operator's bonus episode for you today.
As you know, for the last few weeks, I've been running this agent madness experiment.
I love a good bracket, March Madness is fun, and I thought it'd be a cool way to show off
the interesting agents people are building. The big theme of 2026 is of course that agents
are officially real, and you, yes you, my friends, can build them yourselves. Agent Madness
is way less about the competition aspect and more just a fun way, outside of just a
gallery, to show off what people are cooking up. We are now, as of the time of this recording,
in the Elite 8, but I wanted to zoom out even more broadly than that to talk about some of the
patterns that we saw. We had about 100 submissions, and it was overwhelmingly solo builders; they
represented about 71% of the field. That said, among the projects that were submitted,
teams had an 87% acceptance rate versus 51% for solos. Now to give you a sense of how acceptance
actually worked: I wanted absolutely nothing to do with judging people's projects, so I had Opus
4.6 and GPT 5.4 debate, give each project a score on a number of different dimensions,
and then effectively used those top 64 ranks to build out the bracket. I didn't actually have to
step in at all, so this is all an AI judge thing; if your project didn't get in, your beef
is with the model labs. Unsurprisingly, the projects that were already live got in at a much higher rate,
about twice as frequently as those still at the prototype stage. And one interesting
little note: about 20% of the projects came from companies that said they were entirely AI-run.
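As a rough sketch of how score-based seeding like this can work, here is a minimal version in Python. To be clear, the scoring dimensions, weights, and the `seed_bracket` helper are all hypothetical illustrations, not the actual rubric or code behind Agent Madness:

```python
# Hypothetical sketch of seeding a bracket from judge scores.
# Assumes each project has already been scored by the AI judges;
# the scores below are made up for illustration.

def seed_bracket(scores, size=64):
    """Rank projects by score, keep the top `size`, and pair them
    in classic bracket style: seed 1 vs seed `size`, 2 vs `size`-1, etc."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    field = [name for name, _ in ranked[:size]]
    return [(field[i], field[size - 1 - i]) for i in range(size // 2)]

# Tiny 4-slot example instead of the full 64:
scores = {"A": 9.1, "B": 7.4, "C": 8.8, "D": 6.0}
matchups = seed_bracket(scores, size=4)
# → [("A", "D"), ("C", "B")]
```

The same top-versus-bottom pairing generalizes directly to a 64-project field.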
In terms of observations, one really interesting thing is that people are not building themselves
tools. They are building themselves digital employees and org charts. Some are explicitly employees.
For example, Harold called itself an AI chief of staff; DiamondDousen.ai had Atlas as CEO, Nova
running engineering, and Blaze running marketing, and no, those aren't just people with really cool
parents, those are the names of the agents. The Fleet runs seven agents with a chief-of-staff
orchestrator, and Myz has employee IDs for its agents, and even a three-strike termination policy,
where one of the agents was fired for fabricating business logic. So in a very short amount of time,
you've gone from AI assistant to AI employee to AI org chart, and it's very clear that a big
strand of experimentation right now is not can AI do work, but what's the minimum level of human
involvement? Now for what it's worth, I don't think this is where things are going to land, I think
that it's very natural that we're in a phase where we're going to the absolute extremes to see
what's possible. This is of course the story of Polsia that we've covered on here before as well.
I don't really think the idea is that the optimal number of humans to be involved in a company
is zero or one. I think it's that by removing humans, you can see where the current coordination
and capability set starts to break down. Now if the org chart stuff was a really persistent theme
across the projects, many of the most emotionally resonant submissions pointed somewhere different.
These are products that I think you could see as markets of one. In other words,
there are problems that you wouldn't necessarily expect companies to build for because they're
so specific and discrete to the person who built them. And of course, this is where you see the
payoff of the changing cost of production of software. So a couple of examples from this pool,
someone with episodic Graves' disease gave Claude nine years of Apple Health data, and their
detector now catches thyroid flares two or three weeks early. A non-technical ADHD mom built Life
Coach OS. An Arkansas kayaker built Creek Intelligence, which predicts when rain-fed whitewater
creeks are runnable. And a parent built a toddler behavior chart, rendered as an exploding universe,
called Jude Stars. In terms of challenges people ran into, there is one clear infrastructure gap
that the whole field is screaming about and that is memory. A meaningful number of the submissions
are effectively elaborate workarounds for agents forgetting everything between sessions.
Myz uses 50-plus markdown brain files, Signup reported that their agents kept forgetting what
each other were working on, Carrier File is literally a text file you paste into any AI to help with
context, and OpenBrain shares one MCP memory server across Claude Code, Cursor, and Windsurf. All of these
hacks, markdown files, knowledge graphs, vector DBs, copy-paste text, are kind of a diagnosis of the
big problem facing the agent ecosystem, which is the memory problem. Now in terms of who is building,
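For flavor, the markdown-brain-file style of workaround can be sketched in a few lines. The file name and format here are invented for illustration, not taken from any of these projects:

```python
# Hypothetical sketch of the "markdown brain file" memory pattern:
# persist notes between sessions in a plain file, and prepend them
# to the next session's prompt. File name and format are made up.
from pathlib import Path

BRAIN = Path("agent_brain.md")

def remember(note: str) -> None:
    """Append one fact to the persistent brain file."""
    with BRAIN.open("a") as f:
        f.write(f"- {note}\n")

def recall() -> str:
    """Load everything the agent 'knows' so far, to prepend to a new prompt."""
    return BRAIN.read_text() if BRAIN.exists() else ""

remember("User prefers metric units")
prompt = f"Known context:\n{recall()}\nNew task: plan today's run."
```

Everything from vector DBs to shared MCP memory servers is ultimately a more elaborate version of this same move: write state down somewhere durable, then re-inject it at the start of the next session.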
the median builder here is probably not who you'd guess. Partially that is of course because of
the wide nature of this audience. Partially it's because agent madness might have represented a
different type of opportunity that non-technical builders might not usually have had. Still,
we have paramedics, glaciologists, kayakers, restaurant operators, sales leaders,
people who are domain experts and can now use software to do things that they've always wanted to
do or solve problems that were never possible to solve before. The story of agentic coding,
as much as it is about changes in how software gets built, is actually more in my estimation
about changes in what software gets built for and who builds it. Now one really interesting
pattern that showed up is the idea of argument as architecture. Basically multi-agent debate is
showing up as an actual architectural pattern. In some cases, builders figured out that a single
LLM call was either unreliable or incomplete, so rather than adding more retrieval, they made agents
argue. One example of this is WikiTax.ai, which runs autonomous tax debates three times a day.
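A minimal sketch of this debate-then-judge pattern looks something like the following. The judge names, prompts, and scoring format are all hypothetical, and the LLM calls are stubbed out so the sketch runs without an API:

```python
# Hypothetical sketch of "argument as architecture": two judges score
# a candidate answer independently, then the system reconciles their
# views into a verdict. ask_model is a stub; a real system would call
# an LLM API here.

def ask_model(name: str, prompt: str) -> str:
    # Stubbed LLM call with canned responses, for illustration only.
    canned = {
        "judge_a": "score: 8 -- well sourced but misses edge cases",
        "judge_b": "score: 6 -- confident tone, thin evidence",
    }
    return canned[name]

def debate(answer: str) -> float:
    # Round 1: each judge scores independently.
    a = ask_model("judge_a", f"Score this answer 0-10: {answer}")
    b = ask_model("judge_b", f"Score this answer 0-10: {answer}")
    # Round 2: a real system would show each judge the other's critique
    # and let them revise; here we just average the two scores.
    score = lambda s: float(s.split("score:")[1].split("--")[0])
    return (score(a) + score(b)) / 2

print(debate("The 2026 standard deduction is ..."))  # → 7.0
```

The interesting design choice is that disagreement between the judges is surfaced rather than hidden: the critiques themselves become part of the output.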
Part of what I think is interesting about this is that this is also how the bracket itself was
constructed: I had these two models debate to give scores, and if you look at a particular matchup,
you can see a write-up of the models' debate and who they think should win between the two.
By the way, if you want to form your opinion completely outside of AI, what the AI thinks is
hidden by default, but you can unlock it anytime you want. I think that this idea of argument as
architecture is a really interesting one, though, and a pattern that I'm certainly finding myself
attracted to. One other really interesting pattern, one that I think maybe heralds where we're going,
is that there was a lot of physical-world crossover. So for example, BrainJam used EEG and fNIRS
brain signals to make an AI musical co-performer that adapts to cortical blood flow. HWAgent writes
and uploads firmware to Arduinos from plain language, and Creek Intelligence runs on Raspberry
Pis parsing NOAA radar data in the field. TLDR, people are definitely not just building
digital-realm software; they are thinking about the full integration of the physical world as well.
Now the defining challenge across all of this is that while the current state of tools
has unlocked things that were never possible before, especially for this set of builders,
there still is a huge gap between their average level of ambition and the infrastructure holding
it together. If we did this again next year, I think the types of things that people would be able
to build and which problems they would focus on would likely look significantly different
based just on how many of them are workarounds for the current problems of the agentic build space.
Now like I said, we are in the Elite 8, so I wanted to do a quick preview of these projects.
In Region 1, we have WikiTax AI versus Jekard. WikiTax, you heard me just talk about a minute
ago, but it describes itself as a fully autonomous multi-agent platform where AI tax specialists debate
with no humans in the loop. Jekard, meanwhile, is a multi-agent workspace operating system where
Claude, Gemini, and OpenCode run autonomous scrum iterations, finding bugs, writing tests,
fixing code, and deploying to production with zero human intervention. So in both cases,
we have a real experiment around no humans in the loop and no human involvement,
but obviously very different outputs. One is applying AI to software engineering,
the other is applying it to a specific domain. Over in Region 2, we have WikiTax versus The Family
Claw. WikiTax says: web search gives AI the internet, WikiTax gives AI the market. WikiTax helps
AI create a market intelligence layer between your data and your enterprise AI stack,
conditioning your surveys, engagements, and market research into decision intelligence
your models can query, reason over, and act on. Effectively, it's a type of market data tool.
The Family Claw describes itself as a family of AI agents that talk to each other,
make phone calls, handle shopping and payments, and keep a household running. Now this is a theme
a lot of people have been talking about recently: the intersection of agents and just making
families and domestic life work better. Basically, the way The Family Claw is set up is different
agents that have different responsibilities and coordinate all the context for the absolute boatload
of things that the average family needs to do in any given week. By the way, if you are
interested in agents in this more family or home-life context, check out the a16z podcast
with Jesse Gennett. Jesse is a friend and serial entrepreneur who is doing some super interesting
things with OpenClaw as she homeschools four kids under five. A really interesting matchup comes
in Region 3 between NODISELF, which is basically an agentic medical training platform, and Riteside AI,
which is kind of an agentic social experiment. Riteside AI describes itself as a social cognition
agent for AI agents that tries to actually model relationships. They write that they deployed it on
Moltbook, which is of course the social network for agents, and gave it a simple task: making
friends. Within 48 hours, they say, it was engaged in over 200 mutual conversations with other bots.
Meanwhile, NODISELF is an agentic medical training platform, a multi-agent system
designed to give medical students the ability to learn in a more dynamic environment. It includes
four AI agents, including a cognitive coach that activates clinical knowledge before the crisis,
as well as agents for running the simulation, debriefing on what went wrong, and one to author
the clinical blueprints that make it medically accurate. It's designed for a very specific audience
in a very specific domain, using new capabilities to theoretically make the real world work better.
Finally, in Region 4, we have Carrier File versus Retire Replan. Retire Replan is a privacy-first,
self-hosted Canadian retirement planning application that helps people model their financial life, run
simulations, and optimize different parts of their financial experience, all on their own, without
professional help, effectively empowering people to know much more about their own financial
destiny rather than just leaving it to an external expert. Carrier File, meanwhile, is in the spirit of the
context portfolio episode I did a couple of weeks ago: a simple solution to a very common
problem, a plain text file that carries your context across any AI. So those are some themes and some
of the specific projects from Agent Madness. Appreciate everyone who has contributed to the project,
and I'm excited to see how these agents evolve over time. For now, that's going to do it for this
operator's bonus episode. Appreciate you listening or watching as always, and until next time, peace!