How to Use Opus 4.7 and the New Codex
2026-04-17 20:21:10 • 24:25
Today, we are discussing how knowledge workers in general, but really everyone else too, should
be using Opus 4.7 and the new Codex app.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right friends, quick announcements before we dive in.
First of all, thank you to today's sponsors: KPMG,
Blitzy, Granola, and Section. To get an ad-free version of the show, go to patreon.com/AIDailyBrief,
or you can subscribe on Apple Podcasts. If you are interested in sponsoring the show,
send us a note at sponsors@aidailybrief.ai.
Now today is probably my favorite type of show. When we get a whole slew of new goodies
and get to dig in and see what they can do for us, how our capabilities have changed,
what new use cases become unlocked, and what the patterns are telling us about where the world
is going. Now yesterday we got not one but two big releases, one model and one harness.
The model, disappointing to some, was not Mythos Preview or anything related to it,
it was Opus 4.7. And you could feel from the communications that Anthropic knew that there
was going to be some amount of disappointment that this wasn't Mythos, and so it was going to have
to be fairly impressive in its own right. Now on the other side, from OpenAI we got a new iteration
of their Codex application. It adds a whole bunch of new capabilities and is making some very
different bets as compared to how, for example, Anthropic is looking at its Claude desktop app.
So what we're going to do today is discuss all the new things in both of these releases,
get some of the first reactions, and then specifically dive deep on what you as an engaged AI
user or knowledge worker or entrepreneur should try with these new releases. By the way,
if you want to follow along with this episode, you can go to play.aidailybrief.ai. It's where I keep
companion experiences and there is a whole website slash slide presentation that has all the
information that I'm going to share here, including some of the ideas for what you should do.
So let's talk first about what is new in Codex. Certainly one that people are talking about
quite a bit is that Codex now has computer use on Mac. Codex can see, click, and type across
any app on your computer with its own cursor. Multiple agents can work in parallel in the
background without interfering with what you're doing and Codex can now use apps that don't have
APIs. Now one of the big ideas that you're going to see is that Codex, which was
originally designed as an app for coding, is very quickly becoming not just for coding. Yesterday,
I tweeted that the problem with the term vibe coding ended up not actually being that all coding
became vibe coding, but that all knowledge work is becoming coding work. And you can see that very
much on display in terms of where the Codex app is going. Another new feature is the in-app browser
with comment mode. Basically, you can now load a page inside Codex and click directly on elements
to give the agent precise context. This is really useful for things like front end iteration,
bug reporting, and basically any workflow where pointing at the thing is faster than just describing
it. Native image generation now lives in Codex with GPT image 1.5, meaning that you can generate
mockups, edit images, and create variants all inside the same thread as everything else.
This pairs really well with the new rich file previews and artifacts beyond code:
PDFs, spreadsheets, slides, and documents now render inline in the sidebar. Codex produces these
as artifacts that can be downloaded and interacted with, not just as code. One thing that's
really clear from the new Codex is that they are definitely taking lessons from OpenClaw to heart.
Pass from OpenAI writes, biggest lesson from OpenClaw is that a good teammate doesn't start from scratch
every time you check in. They remember what was decided, what's still open, and proactively help you.
Today we launched heartbeats in Codex, automations that maintain context inside a single thread over time.
Instead of each run starting fresh, Codex wakes up in the same conversation, with the history and
context it needs already in place. You can also have it schedule its own next steps.
Think about the overhead that quietly accumulates every morning, scanning slack channels,
catching up on email, piecing together what moved overnight. With a heartbeat, you offload
that once and wake up to a brief already waiting in a pinned thread. Now, Pass suggests
turning Codex into a chief of staff, which is something we'll come back to in a little bit.
So to summarize, you've got here automations that resume existing threads, which establishes
this whole new monothread pattern, which we're going to talk about in just a minute. And Codex also
has project-less threads. Flavio Adama writes, the most underrated feature in the new Codex is chats
without a project. Before this, I was literally using a project called Trashcan as a home for
every random thought or personal tasks. Basically, this means you can just dive in without having
to pick a repo first. This is what led Jason Lutha to call it the new Notes app. There are also a
whole bunch of daily-use quality-of-life improvements in Codex, including a macOS menu bar and a
Windows system tray with pinned and recent threads, a global hotkey to bring up a mini Codex window from
anywhere on your Mac, tabbed terminals inside each thread so you can run builds, servers, and tests
in parallel, slash compact as a standalone command, and a theme picker for the command palette.
Now one note on the computer use thing that so many people are excited about,
that is Mac only right now, although they say Windows is coming. People's first impressions are good.
Riley Brown from the vibe code app writes, this is exactly what I was hoping for.
Full permissions, no Cowork-like feature, which limits agents' abilities, just Codex.
If you ask for a coding task, it writes code and gives you a preview. If you ask for a presentation
or doc, it gives you a presentation or doc, organized by project on the left sidebar, easy to create
skills, easy to @-mention skills and plugins. Now this pattern of not breaking things into
different UIs for different use cases is something we'll come back to as well and is a major
differentiation between the way that Codex is evolving and the way that the Claude desktop app is currently
set up. Commenting on computer use, Erie Weinstein writes, this is the first time I've ever seen
an LLM operate a GUI as fast as a person, and it's surreal. Aaron Levie from Box gets that this is
very clearly not just Codex as an update for developers, but is thinking about how knowledge workers
in general will work in the future. He writes, the new Codex is another jump in what agents will look
like for knowledge workers. Agents that can code, work with tools and use computers can begin to
execute long running tasks in the background for all areas of work. This can mean drafting reports,
setting up data rooms for a merger, reviewing contracts, helping onboard clients, generating marketing
assets, processing invoices and more. So a couple things that I wanted to double click on.
Nick Bauman on the Codex team wrote an interesting post called my Codex threads are alive
and the big statement from Nick is that he has become monothread-pilled.
Nick writes, the most useful Codex thread I have right now is the one I've been using for the last
three weeks. Every hour it checks my Slack, Gmail, and PRs I wrote or I'm watching. It turns the
noise into clean signal I can act on. My Codex usage has shifted from starting lots of short-lived
chats to keeping a small number of threads alive around recurring work streams. I still start
fresh threads constantly but some work should not reset every time I ask a question.
So the old mental model of AI assistants is that you either, A, start fresh for every task or, B,
maybe create a project folder where context can live around a set of tasks but where you're still
frequently starting fresh just hopefully relying on the context that's stored in the project
to have the new thread be up to speed. Now this paradigm of every question being a new chat
and every project being a new conversation was to some extent forced on us by technical limitations.
It was a byproduct of the fact that long threads used to degrade, context got muddy, the agent lost
the plot and you were better off starting over. One of the key pillars of my work when I'm working
on complex projects with Claude or ChatGPT is the handoff documents I have the AI create as I start
to see the signs of them running into the end of their context window. However, the Codex team has
now shipped compaction improvements that weaken that assumption. About a week ago engineer Anthony
Kroger wrote, I literally never worry about context windows using Codex. It can compact like three
times and the model still remembers the details somehow. Back even before this new release, Nick
Bauman again wrote, so much coding agent design is built on the assumption that breaching context
windows and compacting context yields progressively worse results. When you drop this assumption,
the product direction it opens up is very exciting. He continues in his new post, put simply,
with good context compaction, a thread's value increases over time. I've talked in the past about
how we need some sort of benchmark for new models or new product releases that isn't about
performance on standardized tests, but about the new use cases that get unlocked by any new release.
Nick is basically talking about exactly that. He writes, my own version of a monothread is a work
teammate thread. My work is noisy and spread across Slack, Gmail, GCal, GitHub, files in an Obsidian
vault and a bunch of other Codex threads. I need something that can filter the noise and tell me
which few things are worth caring about. I use one thread to check those places, remember the
current priorities, and tell me when something needs my attention before I would have found it myself.
I run this as one main teammate thread plus a few long lived sub agent threads. The main thread
handles orchestration and judgment. The sub agent threads keep depth in their specialties.
The main thread can also spawn new sub agents for new work streams as they appear.
The main thread wakes up, checks the current priority, reads the smallest useful live signal, uses a
specialist sub agent thread only if that lane matters and then decides whether to notify me or stay
quiet. Now what's super interesting to me about this is that this is basically an alternative
architecture for the project-manager and chief-of-staff OpenClaw agents that I built as part
of my first experiments with that system. This is of course a radically simpler implementation of that.
And speaking of OpenClaw, part of how Nick gets value out of these monothreads is thread
automations. He writes, a thread automation is an interval trigger on an existing Codex thread.
It is not just a scheduled prompt because the automation runs in the same thread with the context
and corrections already there. That makes the natural prompt very simple. Keep an eye on this for me.
If a thread checks Slack, Gmail, GitHub, Docs, and Calendar on a schedule, it accumulates examples
of what you care about. It sees which asks you act on, which drafts you edit, which updates you
ignore and which sources usually matter. Over time, the useful behavior is not a bigger summary.
It is a short interruption when something actually matters. Now Jason Liu from OpenAI takes
this a step farther, actually creating a recipe for a personal chief of staff. The Codex Chief of Staff
takes advantage of a local folder vault, which is the durable memory layer and the working folder
that Codex opens up and interacts with. The vault has a small agents.md file that tells Codex how the
vault works. The principles that Jason shares are a projects folder that gets one note per active
project or workstream, and a notes folder that gets scratch notes, drafts, and one-off captures.
The agents.md file creates a number of instructions around how to work, like preferring to update
existing notes over creating new ones and keeping facts separate from guesses and more. From there,
the Chief of Staff interviews you to get a sense of who you are, what you are responsible for,
who matters, what you are worried about missing, which Slack channels, email threads, docs,
repos, and meetings matter, and what you do not want to be interrupted about. Now if you've tried the
personal context portfolio I released a couple of weeks ago, you could of course just transport
that over there and not even have to do the interview step, although there is value of course
in having a follow-up interview even after you've given Codex all of your personal context.
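For concreteness, the vault Jason describes might look something like the sketch below. The folder names and file names are my own guesses at the shape of the recipe, not his actual files:

```text
vault/
├── agents.md          (tells Codex how the vault works)
├── projects/          (one note per active project or workstream)
│   └── q3-launch.md
└── notes/             (scratch notes, drafts, one-off captures)
    └── 2026-04-17-capture.md
```

Inside agents.md would go the working rules he mentions: prefer updating existing notes over creating new ones, keep facts separate from guesses, and so on.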
From there, Codex proposes the three to seven project notes to create, the smallest useful
agents.md improvements and which plugins or connectors to install. Those common plugins might be
things like Slack, Gmail, Drive, Calendar, GitHub or more. Now there's more in here, but the one last
piece that I wanted to point out, harkening back to the Claw-ification of everything, is the idea of
the core loop running on a 15-minute Chief of Staff heartbeat. Every 15 minutes, or at whatever
interval you want, the thread wakes up and, like Nick Bauman's monothread, checks whatever sources
you gave it access to like Slack or Gmail, looks for pending asks, blockers or decisions.
It notices how your priorities seem to be changing, and it keeps interviewing you over time.
As it does so, it uses your answers to improve the heartbeat prompt, agents.md and project notes.
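Put together, the core loop is simple enough to sketch. The class below is a toy, local stand-in for what actually runs as a Codex thread automation; the source names, the priority filter, and the feedback list are all invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ChiefOfStaff:
    """Toy model of the 15-minute heartbeat: wake, check sources, notify or stay quiet."""
    sources: dict                                  # source name -> callable returning pending items
    priorities: set = field(default_factory=set)   # lanes currently worth interrupting for
    learned: list = field(default_factory=list)    # interview answers folded into future beats

    def beat(self) -> list[str]:
        """One heartbeat: poll every source, surface only on-priority items."""
        surfaced = []
        for name, check in self.sources.items():
            for item in check():
                if name in self.priorities:
                    surfaced.append(f"{name}: {item}")
        return surfaced  # an empty list means the thread stays quiet

    def feedback(self, note: str) -> None:
        """The ongoing interview: answers refine what future heartbeats care about."""
        self.learned.append(note)
```

On each wake you would call `beat()`; a real thread automation would also reschedule itself, which is the "schedule its own next steps" piece.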
So I think if you're going to try just one thing with Codex, it would be this monothreads
slash Chief of Staff idea, but I've also put on this companion site a ton of other use cases
that I think are worth trying and that are enabled by this new set of features.
So one category of these is around recurring reporting and monitoring. Basically anything where
you have some sort of frequently repeated reporting need, where you have to look at a bunch of
sources, aggregate them, pull out the most important signal and do something with it,
is really well suited to the new features of the Codex app. That could be a morning brief that
pulls Slack DMs, unread emails, Notion updates, and calendar. It could be a weekly customer health
check that looks at channels like Intercom, and you could probably think about a half-dozen more of
these recurring monitoring type situations that you interact with. Some other ideas to take
advantage of the new computer use for those of you on Mac are things like legacy system data entry.
If you have some old vendor portal or ancient ERP or accounting software from a decade ago,
the computer use features could drive those systems now and make your life significantly easier.
You could also try moving data between systems that don't integrate. One example that some
people have given is moving from Granola to Obsidian vaults. There are about a dozen different
ideas there of other Codex use cases worth trying now, but let's move on to Opus 4.7.
All right folks, quick pause. Here's the uncomfortable truth. If your enterprise AI strategy is
we bought some tools, you don't actually have a strategy. KPMG took the harder route and became
their own client zero. They embedded AI and agents across the enterprise, how work gets done,
how teams collaborate, how decisions move, not as a tech initiative but as a total operating
model shift. And here's the real unlock: that shift raised the ceiling on what people could do,
humans stayed firmly at the center while AI reduced friction, surfaced insight, and accelerated
momentum. The outcome was a more capable, more empowered workforce. If you want to understand
what that actually looks like in the real world, go to www.kpmg.us/AI. That's www.kpmg.us/AI.
Blitzy is driving over 5X engineering velocity for large-scale enterprises. A publicly traded
insurance provider leveraged Blitzy to build a bespoke payments processing application,
an estimated 13-month project, and with Blitzy the application was completed and live in production
in six weeks. A publicly traded vertical SaaS provider used Blitzy to extract services from a
500,000-line monolith without disrupting production, 21 times faster than their pre-Blitzy estimates.
These aren't experiments. This is how the world's most innovative enterprises are shipping
software in 2026. You can hear directly about Blitzy from other Fortune 500 CTOs on the
Modern CTO or CIO Classified podcasts. To learn more about how Blitzy can impact your SDLC,
book a meeting with an AI solutions consultant at Blitzy.com. That's B-L-I-T-Z-Y dot com.
Today's episode is brought to you by Granola. Granola is the AI notepad for people in back-to-back meetings.
You've probably heard people raving about Granola. It's just one of those products that people
love to talk about. I myself have been using Granola for well over a year now and honestly,
it's one of the tools that changed the way I work. Granola takes meeting notes for you without any
intrusive bots joining your calls. During or after the call, you can chat with your notes,
ask Granola to pull out action items, help you negotiate, write a follow-up email,
or even coach you using recipes, which are pre-made prompts. Once you try it on a first meeting,
it's hard to go without. Head to granola.ai/ai-daily and use code AI-daily. New users get 100% off
for the first three months. Again, that's granola.ai/ai-daily.
Here's a harsh truth. Your company is probably spending thousands or millions of dollars on AI
tools that are being massively underutilized. Half of companies have AI tools, but only 12%
use them for business value. Most employees are still using AI to summarize meeting notes. If you're
the one responsible for AI adoption at your company, you need Section. Section is a platform that helps
you manage AI transformation across your entire organization. It coaches employees on real use
cases, tracks who's using AI for business impact, and shows you exactly where AI is and isn't creating
value. The result? You go from rolling out tools to driving measurable AI value. Your employees
move from meeting summaries to solving actual business problems, and you can prove the ROI.
Stop guessing if your AI investment is working. Check out Section at sectionai.com. That's
s-e-c-t-i-o-n-a-i dot com.
The biggest knock on Opus 4.7 is not about what it is, but about what it is not.
For the last couple of weeks, we've been hearing about just how powerful Anthropic's Mythos
preview model is, and this is not that. Still, it does seem to represent a pretty meaningful
capability jump, and if it weren't for knowing that Mythos preview was out there, my instinct is
that people would be pretty stoked about this. And of course, some people are. As they often do,
I think Latent Space nailed it, calling it literally one step better than 4.6 in every dimension.
If you look at just the agentic coding chart, you get a sense of what 4.7 is about. 4.7 low
is strictly better than 4.6 medium. 4.7 medium is strictly better than 4.6 high. 4.7 high is now
better than 4.6 max. Now that's reflected in the overall coding benchmarks, but you see the same
pattern in other benchmarks that matter for knowledge workers as well. The finance agent benchmark jumps
from 60.1 to 64.4 percent, Office QA Pro from 57.1 to 80.6 percent, and OSWorld computer use from 72.7 to 78 percent.
Basically, you can see that these are, in many cases, not just incremental changes, they're
pretty meaningful. And people's first experience with this seems to validate the benchmarks.
It made about 20 percent more money on the Vending-Bench 2 test, and many people's first tests
around visual and design tasks are really positive as well. Mike Taylor writes,
Opus 4.7 has the distinct honor of making the best PowerPoint I've ever seen from an LLM.
Adam.new writes, Opus 4.7 appears to be state-of-the-art at agentic CAD design.
This Week in AI argues that the leap in design sensibility between 4.6 and 4.7 is really significant
as well. Now I did dig into this, because front-end design and website design is one of my most
frequent use cases, and I wanted to test not only its design capabilities, but its reasoning
around design. So I gave both 4.6 and 4.7 the task of redesigning the kitschy and fun, but
ultimately kind of challenging AI Daily Brief website that's currently in its terminal theme
into something different. 4.6, which is a good designer, did a good job, although if you've used
Claude out of the box for design, it is going to feel very Claude to you. The font choices at this
point are getting extremely predictable, as are the color palettes. I was able to push it in
another direction, which was a little more in line with the terminal theme, and again, it did a
totally fine job. What I would say about my interaction with 4.7 on this is that one,
it certainly had more variety in terms of the visual approaches it was proposing,
and when I slowed it down, it could actually do some thoughtful reasoning on the ways to set up
the site. But it certainly wasn't a panacea. Based on my first experience, the band of what I'm
able to get out of 4.7 is a meaningful upgrade, but I almost have to slow it down and make sure that
it uses its full reasoning capabilities before it just rips out to design something that looks good,
but isn't all that well considered. Now, there are a few areas where there seem to be some
regressions as well. On one long context retrieval benchmark, the score between 4.6 and 4.7 dropped
from 78.3% to 32.2%, although Claude Code creator Boris Cherny said that that benchmark is being
phased out because they believe that it overweights distractor-stacking tricks and doesn't reflect
real applied reasoning. Now, with the new model, the team at Anthropic suggests that there are
some tweaks to how you want to interact with it to get the most out of it, and that might break
patterns from how you've used models like 4.6 in the past. Cat Wu, who is one of the leaders of
the Claude Code team at Anthropic and a co-creator of it, gave a few tips. One, she suggested to
delegate not micromanage. Basically, she said, treat the model like a capable engineer that you're
handing a task to, not a pair programmer that you're guiding line by line. Progressive clarification
across multiple turns can actually reduce quality on 4.7. Relatedly, she suggests putting the full
goal constraints and acceptance criteria right up front. With every user turn adding reasoning overhead,
it makes more sense to give the model everything it needs up front. She also said that Opus 4.7 is
better at self-verification than any previous clawed model, but that you have to tell it how to verify
and build a verification loop in. Claude Code's Boris Cherny also shared a few tips. For example,
he talks about a new way to configure the effort level. Boris writes, personally, I use extra high
effort for most tasks and max effort for the hardest tasks. Max applies to just the current session;
other effort levels are sticky and persist for your next session also.
So what are some things that you should try outside of just updates to your coding with Claude
Code? One thing to check out is that there seem to be fairly big vision improvements, which means
that for things like taking whiteboard photos from meetings and translating them, or trying to
interact with dense dashboard screenshots, this model should be much better. It should also be
able to better pull chart images from PDFs, 10-Ks, research reports, and things like that, and it should
be able to better reason over screenshots as well. Think about, for example, looking at the onboarding
flow from a competitor and comparing it to your company's and asking what the competitor is doing
better. Maybe even a bigger thing to try is longer harder tasks. Everyone from the Anthropic
team really emphasized that this model is all about less babysitting and more real delegation.
What does this open up? Well, you should try things like end-to-end research projects. Instead of
"summarize this article," get it to research the state of a topic using a bunch of URLs and
internal notes, outputting a significant product on the other side. You can also do extended
reasoning tasks like legal argument construction, investment thesis development, or strategic
option analysis that previously you might have had to break into pieces because the model would
lose the thread, but which now can be done in one pass. Full deliverable production, complex data
cleaning, cross-functional synthesis, multi-step analysis with verification, basically any harder
reasoning tasks that you might previously have tried to break into smaller pieces, you should at
least go try to see how 4.7 handles them natively right now without chunking them into
those smaller parts. Now, one more thing that I wanted to point out is a slight difference at least
right now in the UI design philosophy between the Codex app and the Claude desktop app.
And remember, we got an update for the Claude desktop app just this week, so this is about as good
a comparison as you can ask for right now. In Claude desktop, you toggle between different experiences
for Claude Chat, Claude Cowork, and Claude Code. On Codex, it's just all one thing. Again, I read this
before, but what Riley Brown said: this is exactly what I was hoping for. Full permissions, no
Cowork-like feature, which limits agent abilities, just Codex. If you ask for a coding task,
it writes code and gives you a preview. If you ask for a presentation or doc, it gives you a
presentation or doc, organized by project on the left sidebar. So the bet on the OpenAI Codex
side is that the agent is smart enough that the interface should basically disappear. The implied
thesis is that switching modes is friction. And frankly, it harkens back to the original chat
GPT interface, which is kind of like one text box infinite capabilities. On the other hand,
Claude, at least for now, is betting that these three different modes of working are different
enough that collapsing them into one interface creates compromise. It's closer to the way that
native apps are designed now, i.e., you don't write documents in your email client. The good news for
you as users is that if you have a strong preference towards one or the other, at least for the
moment, you have a choice for whichever is better for you. Overall, given that this was not the
release of Mythos or OpenAI's Spud, these things taken together still represent a pretty significant
set of upgrades and new features that are going to take us some time to really integrate into how
we work. For those of you who want to spend the weekend building and trying things, again, if you go
over to play.aidailybrief.ai, the last slide is going to be 11 things that you can try right now
using these tools to see how much you can get out of them. I know for me the one that I'm going to
experiment with is the monothread approach and the Codex chief of staff, which should be especially
interesting to compare to the version of that that I originally created in OpenClaw.
For now, that is going to be our AI Daily Brief for the day. I appreciate you listening or
watching. As always, have tons of fun this weekend. Until next time, peace!