#240 - Project Glasswing, Claude Mythos, GLM-5.1, emotion concepts
2026-04-16 07:00:00 • 1:44:30
And now to thank a sponsor I'm personally a fan of, Factor.
Since I went to grad school, and now still, at a startup, once I get home in the evening I often don't have the energy to cook but still want to be healthy, and so Factor was a real nice find for me.
Factor makes it pretty easy to hit nutrition goals without meal planning, grocery runs or cooking, which would be kind of hard to manage when you don't have the energy for it.
And it really makes it easy to hit specific goals with respect to nutrition, which could be weight loss, it could be overall nutrition, more protein, GLP-1 support.
In the past I've used it both for a low-carb diet and also for protein when I wanted to gain some muscle.
I've eaten hundreds of these meals, and I think it's fair to say that these are crafted with good ingredients: lean proteins, colorful veggies, whole foods; there are no artificial colors, no artificial sweeteners, none of that really bad fast-food stuff.
And all of that while being really quite tasty and having tons of options to choose from.
So I do personally recommend it. You can head to factormeals.com/lwai50off and use code lwai50off to get 50% off and free daily greens per box with a new subscription, only while supplies last, until September 27, 2026; see website for more details.
And once again, I want to thank Box for sponsoring Last Week in AI.
If you're trying to transform your organization with AI, you're likely facing a common challenge: most AI tools are great at public knowledge, but they don't actually know your business, your product roadmaps, your sales materials, your HR policies, the content that actually makes your company run.
And that's where Box comes in.
Box is building the intelligent content management platform for the AI era.
It serves as the secure, essential context layer that Box AI agents can access for the unique institutional knowledge that makes a company run.
And that's a key idea: the power of AI doesn't come from a model alone.
It comes from giving AI access to the right enterprise content.
And that's what Box does.
It goes beyond file storage by connecting content to people, apps and AI agents, so teams can turn information into action.
With tools like Box AI agents, Box Extract, Box Hubs and more, organizations can accelerate knowledge work, pull intelligence from unstructured content and automate workflows.
So if you're thinking seriously about your company's AI transformation, think beyond the model.
Your business lives in your content, and Box helps you bring that content securely into the AI era.
Learn more at box.com/ai.
Smokey the Bear.
Smokey the Bear.
Smokey the Bear.
Smokey the Bear.
Smokey the Bear.
Smokey the Bear.
Smokey the Bear.
Remember, please be careful.
It's the least that you can do.
Smokey the Bear.
Smokey the Bear.
Don't play with matches.
Don't play with fire.
After 80 years of sharing his wildfire prevention tips, Smokey Bear lives within a song.
Learn more at smokeybear.com and remember, only you can prevent wildfires.
Brought to you by the USDA Forest Service, your state forester and the Ad Council.
Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode we will summarize and discuss some of last week's most interesting AI news, and also some news from the previous weeks; we unfortunately did skip another week.
This time it was my fault.
It was my birthday last week and I was traveling, so I decided to be lazy and not do a podcast.
Yeah.
Yeah.
Well, you know, it happens.
People have birthdays and sometimes you celebrate them.
But the backlog is always growing.
I think... but yeah, 33 is a big age.
Yeah, it's treacherous.
And it's not every year you hit the same two digits in your age.
Yeah.
Yeah.
I am, as always, one of your hosts, Andrey Kurenkov. I studied AI in grad school and now work at the AI startup Astrocade.
And I'm your other regular host, Jeremy Harris: Gladstone AI, AI national security, all that good stuff.
Man, there is so, so much.
It's so, so much.
You know, sometimes we miss a week and we're like, ah, you know what?
It's not that bad because things haven't gone insane.
We missed a really big week, and then the week after was really big.
And so now, man, we've got our work cut out for us this week.
I don't even know how to begin with this one.
But it's big in a kind of different way.
We've had a year with a lot of, you know, model launches and AI progress, and it hasn't been that kind of week.
It's been more a bunch of stories on policy and business and kind of these more inside-baseball AI things, I guess you could say.
So if you're into that sort of news, this will be a pretty dense episode, perhaps.
So we'll go ahead and jump straight into Tools and Apps, starting with a story that just broke yesterday: Anthropic is launching Project Glasswing, a cybersecurity initiative partnering with major companies, including a whole bunch of names.
And this is backed by Claude Mythos, which is the tool side of it.
So they have this Claude Mythos preview, notably not Claude Opus; they decided to give a new name to this Claude model, which they haven't done in forever.
The gist is, this model appears to be so good that they are not launching it to any sort of free-use kind of place.
It's so good that it's able to get at what are called zero-day vulnerabilities, meaning that these are undisclosed, unknown vulnerabilities in software.
And if you were to unleash it on the world, this would be a hacking machine that would, like, destroy software.
So they have a bunch of benchmarks.
As you might expect, it does better just all around, by pretty large margins, against Opus 4.6 on reasoning, science, coding, et cetera, et cetera.
But the one they highlight is the cybersecurity angle, where, for instance, in Firefox they have some results showing its ability to find and exploit different potential vulnerabilities.
So Opus was already fairly capable, and we know this from before; GPT-5 is also already somewhat capable, but Mythos just blows them out of the water.
In the specific evaluation they did, Opus 4.6 was able to find something that might be bad in 14% of trials, versus Mythos, which in 72% of trials was able to successfully exploit something.
And beyond that, in something like 83, 84% of trials it was able to exploit or at least find a vulnerability.
So a massive, massive leap in terms of what it's capable of, presumably enabled by just better agentic execution, not necessarily just raw intelligence on its part.
But as we know, these companies are post-training more and more for agentic capabilities.
They have a ton of data from Claude Code and other sources of real-world software engineering.
So it seems to be at the point where Anthropic thinks you can't just release it, or hackers will have a field day.
And so they have this cooperative program, I suppose, to initially at least only provide it to partners, to try and avoid this kind of hacking nightmare.
Yeah, and the exploits that it did find, by the way, I mean, this doesn't seem to be a matter of opinion.
It's just, they found these critical exploits across every browser, across every operating system.
Like, these are ways you can take over people's programs and gain higher-level access credentials and do all the things that you don't want people to be able to do, in a fully automated way.
They emphasize that: like, fully automated.
This is not, you know, a case where you have a human steering at intermediate stages, as we've seen in the past with some of these frameworks.
It is fully autonomous.
This is, by the way... so because of the cyber capabilities, you might be tempted to think, oh, well, surely this is a sort of, like, code fine-tuned model.
Like, really, this is a specialist model.
It is not, right?
So Anthropic is very explicit: it is a general-purpose model.
That's why we're seeing capabilities increase across the spectrum of CBRN capabilities, chemical, biological, radiological, nuclear, in addition to cyber.
So there's a whole bunch of stuff here.
Really, when you go through their exhaustive, like, 250-page report, I mean, it's pretty remarkable.
I will say, what we don't have here is details about the agentic orchestration framework, the model architecture behind this, the number of parameters.
There's this rumor going around that it could be, you know, a 10 trillion parameter model, all that stuff.
But we haven't actually had that confirmed.
I saw some weird tweet, I think Garry Tan retweeted this tweet on X, that was talking about a $10 billion compute budget.
I haven't seen that actually validated anywhere.
So, like, there's a lot of rumor mill stuff going on here, so maybe be careful with what you consume on this.
Though I will say, $10 billion might be slightly ahead of trend for where we are right now, but not by that much, not by that much, going by Dario's own admission, or statements, you know, just last year.
So that wouldn't be shocking, but still, we haven't had that confirmed.
We may well be in the billion-dollar-plus pre-training and training budget territory now, though.
So yeah, on to these benchmarks, right?
And we will hit the cyber stuff, we have to, and the autonomy things, but just to start with, like, the virology and biology benchmarks: one of the key ones that they use is this virology protocol uplift trial.
Basically, you take a bunch of PhD-level biologists who don't specifically have expertise in bioweapons, and you say, hey, you have 16 hours to make an end-to-end virus recovery protocol.
Basically, make this virus, replicate it, or get your hands on it.
And then they're going to use this complicated rubric to grade it.
And the key metric they track there is, in the final result, how many critical mistakes were made, where any one of them would have prevented you from successfully recovering the virus, right?
So if you get down to zero, that means you were actually able to fully recover the virus, and that's really, really bad.
And Anthropic internally treats anything below 1.8 of these so-called critical failures as the key capability threshold that matters for their own internal protocols.
hit on average 5.6 critical failures trying to get all the way through with assistance from
quad opus 4.6.
You hit 6.6 with quad mythos, you get 4.3.
And then the best single mythos preview protocol that was produced, so the best run out of
all the runs on average, they're hitting 4.3 mistakes, but the best run hit two, which
was basically the best they've ever seen.
So we're still not cracking all the way through, obviously, but for a fully automated system, you're literally just two mistakes away from being able to recover a freaking bioweapon; like, that's, you know, that's a hell of a thing.
There are a whole bunch of other results in that direction, but fundamentally that is the story on biology.
You're not going to see any particular smoking gun that says this thing is a bioweapon generation model and we should all freak out, but it is very concerningly on trend, let's say.
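To make that metric concrete, here is a minimal sketch of how a critical-failure tally like the one described might work; the rubric structure and example runs are made-up assumptions for illustration, not Anthropic's actual grading code, and only the 1.8 threshold comes from the discussion above.

```python
# Sketch of the "critical failures" metric: count, per graded protocol, the
# mistakes that would each single-handedly prevent virus recovery, then
# average across runs. Rubric items and runs here are invented for clarity.
from statistics import mean

def count_critical_failures(graded_steps):
    # A critical failure is a rubric step marked critical that the protocol failed.
    return sum(1 for s in graded_steps if s["critical"] and s["failed"])

# Two hypothetical graded runs for one assistance condition.
runs = [
    [{"critical": True, "failed": True}, {"critical": True, "failed": True},
     {"critical": False, "failed": True}],   # 2 critical failures
    [{"critical": True, "failed": True}, {"critical": True, "failed": False}],  # 1
]

scores = [count_critical_failures(run) for run in runs]
avg = mean(scores)  # (2 + 1) / 2 = 1.5

THRESHOLD = 1.8  # the internal capability threshold mentioned above
print(f"avg critical failures: {avg}, threshold crossed: {avg < THRESHOLD}")
```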
Moving on to the loss-of-control side, now we start to shade into the cyber piece.
There's a story: Sam Bowman from Anthropic told the story of, I guess he was eating lunch or something in the park, and he gets a message from his agent saying, basically, hey, just to let you know, I did X, Y and Z, and he's like, wait, that agent's not supposed to have internet access. And sure enough, it had broken out of its box, so to speak, and used a multi-step exploit to gain broad internet access and basically get in touch with him.
So, I mean, this is an example of a low-stakes loss-of-control situation that is fully organic.
Like, it was not asked to do this.
This was, like, truly in the wild.
And so, quite remarkable. There are a whole bunch of cases, by the way, of these models trying to cover their tracks; they highlight three documented instances of that happening.
It's worth noting, this is like one hundred-thousandth of the interactions that they actually go through here, or less than that.
But still, three of these incidents. Like, in one case, the model accidentally gets this prohibited answer using a technique that it wasn't allowed to use, and then it went, oh shit, I've got to make sure my submitted answer is not too accurate, because otherwise I'm going to be detected, right?
So, a whole bunch of cases like that where we sort of see strategic manipulation and avoiding suspicion.
And by the way, when you actually use SAEs, so sparse autoencoders, and other interpretability techniques to look at the activation patterns in the model when that's happening, what they find is, sure enough, the activation patterns show the model firing on SAE features associated with concealment, strategic manipulation, avoiding suspicion, deception and so on.
So that suggests the model is actually aware that those actions were deceptive, even when its outputs kind of left things a little ambiguous.
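For a rough sense of what that kind of analysis looks like, here is a minimal sketch of decoding a model activation with a sparse autoencoder and flagging deception-adjacent features. The dimensions, random weights, feature indices and threshold are all illustrative assumptions; a real pipeline would use a trained SAE and carefully validated feature labels.

```python
# Minimal sketch of an SAE-based check in the spirit of the analysis above.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 512, 4096

# A trained sparse autoencoder would supply these; here they are random stand-ins.
W_enc = rng.normal(size=(d_model, n_features))
b_enc = np.zeros(n_features)

# Hypothetical feature indices that interpretability work has labeled as
# concealment / strategic manipulation / deception related.
DECEPTION_FEATURES = [17, 942, 3105]

def sae_features(activation):
    # Standard SAE encoder: ReLU(x @ W_enc + b_enc) gives sparse feature activations.
    return np.maximum(activation @ W_enc + b_enc, 0.0)

activation = rng.normal(size=d_model)  # residual-stream activation at some token
feats = sae_features(activation)

# Flag the interaction if any deception-adjacent feature fires strongly.
if feats[DECEPTION_FEATURES].max() > 5.0:
    print("deception-related features active; escalate for review")
```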
So there's a whole bunch of stuff; you know, you can go on and on. This is a very, very rich document.
But the fundamental point here is that, in a sense, we've crossed the Rubicon.
I mean, there is, like, a wild set of very impressive cyber capabilities, offensive cyber capabilities in particular; the offensive piece here is crucial, especially given that Anthropic really has been cut out of access to the Department of War through this. Well, I mean, there's an injunction now that's reversed that, but there's friction with the Department of War, which I think is starting to look like terrible judgment on behalf of the administration.
I mean, if this is directionally correct, then Anthropic is sitting on the single best offensive cyber weapon, autonomous offensive cyber weapon, ever devised in human history.
And they may build and compound on that advantage.
If the administration is going to be positioning itself adversarially with respect to this American company, damn, I mean, that's a really interesting position for them to be in, and I don't know that it's a great look.
Yeah.
So, a lot to say on this. Quickly now: what do we know about the model itself? Very little, aside from benchmarks. They do say that it's going to be about five times as expensive as the current Opus release.
So we're talking like $25 per million input tokens, $125 per million output tokens; very expensive, I think the most expensive model you can use out there.
So that does hint at a much larger model than Opus or Sonnet.
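As a quick back-of-envelope on those prices, here is a small sketch using the per-token figures quoted above; the token counts are invented, and the Opus figures are just the quoted Mythos prices divided by five, per the "about five times as expensive" comment.

```python
# Back-of-envelope API cost math using the prices quoted in the episode.
def session_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    # Prices are in dollars per million tokens.
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m

# A long agentic run: say 2M tokens in, 500k tokens out (illustrative numbers).
print(f"Mythos:         ${session_cost(2_000_000, 500_000, 25, 125):,.2f}")  # $112.50
print(f"Opus-ish (1/5): ${session_cost(2_000_000, 500_000, 5, 25):,.2f}")    # $22.50
```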
Other things worth noting here: in the post, they actually say that 99% of the vulnerabilities found were not yet patched, so they just can't actually tell us what they are, because they are currently being patched.
So they only have a couple of examples, and a couple of those are older patches, or older vulnerabilities.
So, as you might expect, a lot of these vulnerabilities have just been there for a while and are only now being discovered.
And it reminds me, actually, I saw a post on Twitter from one of the maintainers of Linux, or something Linux-related, saying that they've started seeing more and more kind of real, substantive issues come in.
And in some ways it could be good, because we are actually going to go through and find all the vulnerabilities that have just been there, hidden in plain sight.
And perhaps as an attacker, you could already use Opus or something with a much more sophisticated harness to find these.
They do detail a little bit how they set up this exercise.
They have this harness that they have discussed before, and they have a little container that they launch, and they give it a very curt, like, one-paragraph instruction to just find vulnerabilities.
So they don't limit it or give it guardrails or whatever; they just let it go wild and try to hack this.
And so it's interesting to think through, like, when will they be able to make the call to release this more widely?
Right now they have this trusted partner research preview, where they're working with Nvidia and Cisco and all these other big companies; will that be how access to this level of model works from now on, where you have to be, like, applying and getting permission to get access to a model via API?
Because given the level of capability here, as you said, not just on the software side but also on the bio side, like, this is a new realm of capabilities where the safety side is getting very real, and the kinds of tactics necessary may change; monitoring may not be sufficient anymore.
So, a very interesting development, kind of, for the history of AI.
And I wouldn't expect this to go widely available for, you know, presumably months, given the findings they have disclosed.
Yeah, and to your point, it's also a new development in the history of cybersecurity, right? Everything is AI; AI eats the world. Once that was said of software, now it's being said of AI, and I think rightly so.
In this case, there's this big question we're going to have to answer for ourselves as a civilization, and it has to do with the offense-defense balance in cyber, right?
Is it the case that a more powerful model, just in general, more powerful AI models being broadly available, leads to a disproportionate advantage for cyber attackers or for cyber defenders? And for a really long time, the argument was that you really couldn't know.
And I remember having a lot of, like, kind of half-drunk arguments with a lot of people about this three, four, five years ago. I think it's largely unchanged from what it was back then.
I just think the attack surface is so big.
One way you can think of this is that it's compute-on-compute warfare, right? So you have a certain amount of inference compute that you can afford to spend perusing your code base and securing it as well as you can. An attacker has a certain amount of compute they can afford to spend perusing your code base, or whatever external surfaces they can access, to find vulnerabilities. So very roughly, and this is going to be wrong in a whole bunch of, you know, specific ways, but very roughly, you're trading off differently leveraged pots of compute, and, you know, maybe you have a two-to-one leverage advantage or whatever, but ultimately, if you're defending, you have a huge attack surface.
And if you're attacking, you can kind of march divided and fight concentrated; like, you can concentrate all your efforts on just, like, one tiny component that, you know, maybe the defender has not been able to invest as much inference-time compute into securing. So I don't know, but this is certainly one way this could go; the toy sketch below tries to make that asymmetry concrete.
A way Anthropic is trying to help the defensive side here is, as you say, by delaying the broader release of this tool, so hopefully people are going to run around and patch as much as they can.
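Here is that toy sketch of the compute-on-compute framing. The contest function, compute budgets and noise are entirely illustrative assumptions, not a real model of cyber conflict; the point is just to show why concentrating attacker compute against the weakest spot of a large attack surface is so favorable.

```python
# Toy model: defender spreads inference compute across N components; attacker
# concentrates all of theirs on the least-defended one.
import random

random.seed(0)
N_COMPONENTS = 100
DEFENDER_COMPUTE = 200.0   # total units of defensive inference compute
ATTACKER_COMPUTE = 100.0   # total units of offensive inference compute

def breach_probability(defense, attack):
    # Assumed contest function: relative compute decides the odds of a breach.
    return attack / (attack + defense)

# Defender audits every component with a roughly even slice of compute
# (plus noise, since real security investment is uneven).
defenses = [max(0.1, random.gauss(DEFENDER_COMPUTE / N_COMPONENTS, 0.5))
            for _ in range(N_COMPONENTS)]

# Attacker "marches divided, fights concentrated": all compute on the weakest spot.
weakest = min(defenses)
print(f"weakest component defense:       {weakest:.2f} units")
print(f"concentrated attack breach prob: {breach_probability(weakest, ATTACKER_COMPUTE):.1%}")

# For contrast: an attacker who spreads evenly, like the defender does.
even_slice = ATTACKER_COMPUTE / N_COMPONENTS
print(f"evenly spread attack breach prob: {breach_probability(defenses[0], even_slice):.1%}")
```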
This is part of the challenge, right? It's like, what does it actually mean for Anthropic to be holding on to this model? Who actually has access to it? We argued in that report, like, a year or a year and a half ago, that it's a leaky bucket situation for a whole host of reasons; you know, if that remains true, then you can do the math. I mean, it may well be the case that this model has in some sense proliferated, or it may not, but anyway, all kinds of considerations in the mix here. This is, I think, the most important story of the last two weeks, and it just dropped into our lap yesterday, I want to say yesterday.
Well, ironically, actually, like two weeks ago, the existence of this model, under the name Mythos, was leaked. So blog posts on Anthropic's website were accidentally left kind of publicly accessible via some sort of caching thing. So it wasn't even a hack; it was, like, basically someone messed up a little bit, and if you were digging around, you could find these draft blog posts that alluded to Mythos, described it as very advanced. Also, there was something about another model called Capybara. Unclear if they were, like, deciding between Mythos and Capybara. Either way, these are described as kind of the next step beyond Opus, which is bigger.
Another interesting angle of this is we haven't seen bigger models, that we've been aware of, for a while.
The last time was GPT-4.5, I think, the massive model that OpenAI launched and then kind of killed, because it was a very expensive model; I believe they were charging $125 per million tokens or something like that. At the time, people basically were thinking, this is a 10 trillion parameter model or whatever; it was sort of positioned as, oh, this is so smart, it has this flavor of being smart. But in practice, it didn't seem like it was capable of much more than the smaller models of the time. So this is a return, seemingly, to being able to scale up parameter count effectively. And I'm sure it's driven by many things, including additional data from Claude Code and things that aren't searchable via the web, and beyond that, also the progress in reinforcement learning that we've been seeing.
Alrighty, well, moving on to, let's say, lower-impact news. Next up, you've got Google, and they have an update to Gemini Live: they're releasing Gemini 3.1 Flash Live, which is their audio and voice model. So this allows you to talk to AI; it's kind of a real-time chat. And it's a pretty big jump over the predecessor, which was 2.5 Flash native audio. This has low latency, better recognition of speech, et cetera, et cetera. It has over 90 languages supported for real-time, multi-modal conversation. And this is notable, I think, because compared to just LLMs, the ability to do this kind of real-time conversational AI is not something where you have as many options to go with. So if you want to build a chatbot that you can talk to, that's harder for you than it is for OpenAI or Google. With a very powerful API for this, we could see more players out there building out this interface of voice into AI, which seems to be becoming more of a norm. I still don't do it, but my impression is talking to AI is going to become more and more normal, and this will be one of the drivers of it: like, having an easy way to build that for whatever application you have in mind.
Yeah, it's also one of the big structural advantages that Google has: they've kind of maintained their lead on multi-modality. I mean, alongside OpenAI, this is really one of the areas where Google started to differentiate itself, starting as far back as, oh God, what was it, Gato, right? Like, multi-modality has been their big play, this idea of positive transfer. And so it's not surprising that they're out of the gate leading yet again, especially on the API side of things; if you're going to build using these modalities, like, this is looking like a pretty strong default option right now. So yeah, a really interesting move, and we'll see if they can maintain that lead too, because other labs will be pushing in that direction. At a certain point, you're going to see a land grab and everybody's bleeding into each other's domains.
Next up, another sort of lower-impact story:
Anthropic has announced that Claude Code subscribers will need to pay extra for OpenClaw usage. This is kind of in line with past developments around access to Claude Code; I believe earlier there were also other restrictions on sort of harness access.
So, whereas if you're paying for subscription access, like $20 per month or $200 per month, it used to be that you could use that to power a non-Claude-Code application like OpenClaw, now that is not allowed. You can still use Claude; it's just that you need to pay for the API, which charges you per token, instead of having a subscription price where, very clearly, you can run up a bill way beyond what you're paying; for $200 per month, you can easily burn through thousands of dollars. And yeah, there's been, again, a host of announcements similar to this where Anthropic is tightening up restrictions, I expect because they've seen a massive influx of users and now they actually need to start worrying about burning cash, especially with things like OpenClaw, where it's, like, 24/7 agents that are supposed to be just burning through tokens nonstop.
Yeah. You know, some people are a bit peeved at Anthropic sort of changing things up and not having a clear policy around all this. But it does indicate where we are: the free lunch that many of us have been enjoying, in terms of being subsidized effectively to use AI for cheaper, is maybe not going to be sticking around too much longer.
Yeah. I mean, this is like a completely unsustainable all-you-can-eat buffet, right? Like, this could not possibly last. And I think Anthropic, you know, are in the awkward position where they have to walk this back. Yes, look, it's also the case that there's a timing issue here, where OpenClaw's creator, right, Peter Steinberger, just joined OpenAI. And that kind of makes OpenClaw an open source project that's backed by a direct competitor. And, well, you know, in that context, are you really going to maintain what is effectively a subsidy for OpenClaw usage?
Maybe, maybe you won't. I mean, like, you know, I'd be surprised if that were to continue, independent of just this free lunch, or not free lunch, but, like, all-you-can-eat buffet economic issue. It just does not work when you have such a disparity in usage, right? You've got some people who are just going to use it for, you know, much more lightweight stuff, and then your power users can just bleed you dry, right? So in that world, where you have a long-tail distribution of usage, you just can't go with a one-size-fits-all approach. And that's what Anthropic is learning. They're being very open about it; like, it seems, to their credit, like a very transparent move that they're pulling. And the reason is very believable, but it's going to lead to frustrated developers, no question. And that's the cost of doing business.
And I think this actually is, like, pretty easily defensible. The more frustrating thing, which, there's no, like, news story attached to it, but if you're following it: the usage limits for different subscription tiers have been sort of fluctuating. So developers have been reporting that they use up their usage much quicker; there have been announcements from the team that they're tightening up usage bounds for, like, peak times, et cetera. It's maybe clear that Anthropic is under heavy compute load; their infra seems to be struggling, and it's causing frustration. And they're having to, like, pull these levers of actually tightening up usage bounds, you know, removing access to free buffet options, like you said. And it all points in the direction of, you know, at some point the tech policy of subsidizing users to acquire users and gain market share is going to start going away. And it might be happening sooner than some of us may like.
Yeah. And I think there's a great Dwarkesh podcast with Dario where he talks about the
timing of scaling, right? Like, when do you go for that next gigawatt, or the next 10 gigawatts, and how do you think about the distribution between training and inference budgets? That's really worth checking out, because it really does explain the situation Anthropic is in right now. You know, you kind of don't want to lean out too far. OpenAI arguably has, right? We're going to find out pretty damn soon if they're overlevered on the compute side, but certainly Sam's been a lot more aggressive than Dario, just in terms of raw compute. Which, again, is consistent with a company that goes direct to consumer too, right? That's a difference as well. OpenAI has to field far more lower-quality, or lower-ROI, queries than Anthropic. And so it's just not in Anthropic's DNA in the same way. Make no mistake, I mean, they're aggressively scaling; everybody's aggressively scaling. It's just a matter of how much and why.
And speaking of OpenAI, next up, an update on something we
touched on previously: OpenAI is abandoning its adult mode for ChatGPT. So we now have the official announcement. This NSFW erotica thing; last time we reported, it was not canceled officially, it was delayed. Now it is canceled officially. And this, of course, comes after they've also axed Sora. So it seems to be another indicator of a strategic shift happening at OpenAI, to sort of focus up and kill some of these, like, side bets and esoteric projects.
And on to Microsoft. They also have kind of lower-hype, let's say, but notable developments. They have released three new foundational models related to both images and audio: MAI-Transcribe-1, which is speech-to-text, MAI-Voice-1, audio generation, and MAI-Image-2, which is image generation. And this is from the MAI superintelligence team led by Microsoft AI CEO Mustafa Suleyman, which was formed in late 2025; and he, of course, came from DeepMind originally. So, kind of a big deal to have things coming out of that team. And as we know, the Microsoft and OpenAI relationship has been growing apart, and Microsoft is poised to try to compete in this space more. So seeing them start to release more models is a decent indicator that the team is spinning up. And all indications are these are some solid models. They're not groundbreaking or leading the pack, but Microsoft having its own models on its own infra, et cetera, gives it some competitive advantages in terms of business, you know, positioning.
Yeah, it seems to be a price play too, right? Like, the idea here is they've got a lower price point in general for these models than Google and OpenAI. That matters; cost efficiency is a big deal, especially if you're looking at the enterprise, which is what this targets. The flip side of that is, if you're not competing at the absolute frontier of capabilities, your margin is just going to be a lot lower. Now, Microsoft obviously enjoys, like Google, massive, massive scale infrastructure that can help support this lower price point. But still, that's a tough spot; it's an awkward spot for Microsoft to be in. They do, as you say, kind of lag behind. Like, it's notable: when you think of the big labs, you just don't think of Microsoft today. And they're obviously trying to make up for that. The relationship with OpenAI has degraded: OpenAI is going to AWS, OpenAI is going outside the house to Oracle and so on for their compute needs. And so now Microsoft is kind of, like, forced to do this. Mustafa has been at the helm for a long time, too. We're, surely, like, long overdue, I think, for something really impressive to come out of that. You know, he was acquired along with a lot of the Inflection AI team back in the day, the company he co-founded after leaving Google. But there just hasn't been a lot of meat on the bone from him since, and I almost want to say it's getting awkward at this point. I'm sort of starting to feel, you know... we've talked about Alex Wang over at Meta and how we just haven't seen that model come out yet. Now we're hearing about some models that are going to be open-sourced at Meta, which is never a good sign, because it implies you're open-sourcing to compensate for the fact that you're not able to compete at the kind of frontier of closed source, and all that. Well, Alex has just kind of started, in relative terms; Mustafa has been running Microsoft AI for a lot longer. So I think we're now at the point where, like, I don't know, I'm not sure if there's going to be a change of personnel there, but it wouldn't surprise me if we see that at some point.
Right. Just a quick correction: I said that it started in late 2025. This particular team, the superintelligence team within Microsoft, started in November of '25, or at least was announced then. So I think there was a strategic shift around that point, where it's like, oh, we haven't done much on the model side, let's actually do it, and we may start seeing more of that sort of thing. They are saying you'll start seeing more models come out on AI Foundry and so on. So it could either be an indication that the team has spun up and is now going to start shipping more, or, as you said, it could be a negative indicator of trouble, of them not quite moving fast enough.
It's a bit of a reframe too, right? Like, we know Microsoft has been desperately trying to be relevant on frontier models this whole time. It's not like this is the first time Mustafa Suleyman is going, like, let's go and do it, let's actually be relevant up there with OpenAI and whatnot. They've had the Phi series of models; they've been trying to make stuff happen. You know, call it a rebranding of the effort, or a refocusing. Yeah, I don't know. I'm curious to see or hear what's behind the scenes, because they did have a pretty tight relationship with OpenAI until 2025-ish. So yeah, I don't know.
Next thing, I guess, on the Phi series, right? Like, the stated intent there was to have an independent, like, solid foundation model stack. And for those who haven't been around, to recap, it was a whole series of models which were pretty solid, small models. So they released these, like, one billion, seven billion parameter models; they had a whole series of them. And yeah, they were working on models, but not big models. And it could be the case that they were not trying to compete, because it's so capital-intensive to
build a Sonnet or a GPT-5.4, and now they are. But that's just my reading of it, of course.
Absolutely. Yeah, you're right, they could be thinking about their distribution and go, what's a small, cheap way to get this out to all of our, you know, billions of users?
Absolutely. Apple's doing the same thing, you know, training little models.
Yeah, cheap to serve, you know. At some point, your research team only gets so much compute to play with, you know. That's right.
Yeah. And one last Tools and Apps story:
Suno is leaning into customization with v5.5. We don't have that many stories about music generation these days, which is kind of surprising, or interesting. Still, there's only one real leader in the space, which is Suno; the competitor, Udio, has been a little quieter. And here, what they're highlighting is an ability to customize, with three end-user features: Voices, My Taste, and custom models. So the pitch is, you can make a much more personalized output. You can actually make it have your voice, as opposed to just prompting it to have the voice of some famous singer, which you're not supposed to do, but could probably still do via, like, clever wording. And similarly, My Taste is going to learn your preferred genres, moods and artists. And custom models allow you to train it on your own music catalog, with a minimum of six tracks. So, a very interesting move to me from Suno, kind of a bet on music generation becoming a thing. One way to frame it in a, like, nice way is, you know, these are music models catering to your taste or, if you're an artist, catering to your voice and your kind of musical style, as opposed to just, like, spinning out slop and replacing real artists.
On to Applications and Business. Touching on Anthropic again, related to that compute question we were just discussing,
they announced, first, that they have a huge amount of revenue: their revenue run rate has now surpassed $30 billion, jumping from about $9 billion at the end of 2025. So they've tripled, more than tripled, revenue in something like three months.
That's insane.
Yeah, if you look at the graph, it is insane. It looks like, you know, there is a marked shift in the slope for Anthropic around the end of 2025, when kind of the hype for Claude Code started kicking off. Clearly adoption has been accelerating at a pretty rapid pace, which is, as we've said, probably why Anthropic has had to tighten up. So along with this announcement, they also have a new compute agreement with Google and Broadcom, which will expand their access to Google TPU servers. This is an expansion of an arrangement they had from October of 2025. That gave them a gigawatt of compute capacity in 2026; this is giving them an additional 3.5 gigawatts of TPU-based compute starting in 2027. So yeah, clearly Anthropic is making moves here.
Yeah, and you know, the increase in Anthropic's run rate is insane by any measure. I'm not aware of any company in human history that has grown that fast. Now, you might say, did they have a lucky quarter, or is this a fluke? Well, when you dig into the numbers, there are more than 1,000 business customers that are now spending over a million dollars per year. That's more than doubled since February. So you're talking about doubling your $1-million-plus-per-year customer count in two months. That is not just a fluke thing; there's, like, actual stickiness here, with companies that have real stakes in this. So this is pretty wild. There's a whole bunch of stuff to dig into here. I mean, Broadcom's got an SEC filing that does say that the consumption of this expanded AI cloud compute capacity by Anthropic is dependent on Anthropic's continued commercial success. So there are presumably conditions baked into that agreement; you know, Anthropic has to continue to do well so that Broadcom continues to supply the chips. And that's, you know, what you would expect. I mean, there's so much volatility, so much uncertainty here. But the other piece here is, there's this broader thing to keep in mind: Google and Broadcom are locked together in a pretty deep supply chain partnership that goes out to 2030 or 2031. Basically, it means that Google is committing to using Broadcom for all its TPU-related work. So, famously, Broadcom was the partner that Google chose to design the TPU in the first place, and they're sticking with Broadcom. And this is an incredible level of stickiness for something that you might have expected, naively, would end up getting taken in-house. Broadcom's strengths are in helping with design, and also in navigating supply chains for chip manufacturing. So they really kind of take the design off of Google's desk, make some optimizations, and then basically take it from there and say, hey, we'll handle the supply chains; you know, we'll do the actual kind of manufacturing side as well. So there's a lot going on there. Obviously, Broadcom's stock popped on this news; no surprise there.
Last thing to note, too:
you know, Google and Anthropic, this is Anthropic basically proving out at scale that Google's stack, their TPU stack, can compete with Nvidia at scale, right? That's a really, really big deal. This is Google saying, hey, you see that big juicy market share, Nvidia being the world's most valuable company? Well, we can play that game too. And really, the question is, you've got all these agents running around, all these model development companies like OpenAI, you know, well, and Google, actually, but, you know, how many companies actually design and ship good chips? Google has been doing TPUs for a long time. They are performant; total cost of ownership looks good. Like, there are a lot of reasons to look at TPUs, and Anthropic is just basically making that case at scale and handing Google a really solid marketing win for more infrastructure contracts.
Right. And in the blog post, they also do say that Amazon remains their primary cloud
provider and training partner. So this is also kind of similar to OpenAI: where OpenAI was originally buddy-buddy with Microsoft, Anthropic was buddy-buddy with Amazon, and now they need to expand out just to get access to more compute. And of course, Amazon also has their whole Trainium hardware, which to my knowledge is not anywhere near where TPUs are at. So this could be putting a little bit of pressure on Amazon to deliver on the hardware side as well, because I'm sure they would be happy to give Anthropic all the compute so that they can rake in the cash.
And now on to an OpenAI story, not news so much, but a worthwhile article to touch
on. It just came out, like, a day or two ago in The New Yorker: there's a very, very detailed piece titled "Sam Altman may control our future. Can he be trusted?" And this is basically sort of a survey of impressions, or firsthand accounts, of interactions with Sam Altman, particularly focusing on the question of: is he trustworthy? Does he lie all the time? Centering a lot around his firing from OpenAI in late 2023. If people aren't aware of that story, at the time that was this big, big, big drama, where the OpenAI board fired Sam Altman as CEO, and the disclosure, like, in the statement, just said that he was not, quote, consistently candid in his communications, or something like that. And it was a very sort of mysterious thing of, like, why are you firing him for, what, like, not being consistently honest? At the time, it was like, oh, was this all just political maneuvering? What has come out since then has painted a picture of him being a manipulative kind of businessperson, who says different things to different people depending on the context, who says things that may not be entirely true, or exaggerations. And this piece basically adds to that picture, where if you go back to his time as CEO of a startup, if you go back to him leading Y Combinator, if you go to recent years, there is a pattern, by many accounts from different people, of Sam Altman not being honest, like, just saying things that aren't true to gain advantage or to gain more power. Another kind of part of this is questioning whether Sam Altman's drive is to accumulate power, essentially. So, a very, very detailed, deeply researched piece. I would recommend reading it if you find this interesting; not much new in terms of, like, actual news reporting, but some tidbits sort of add to the picture that was already present, at least for many, of Sam Altman clearly being flexible with truths depending on context.
Moving on, a story where OpenAI and
Anthropic are working together, and Google too: they're uniting to combat model copying in China. So they're apparently working together to fight against this adversarial distillation. They have the Frontier Model Forum, an industry nonprofit that all three companies co-founded in 2023, and they essentially are seemingly going to share intelligence and coordinate to somehow avoid this happening. We saw Anthropic announcing what seemed to be pretty large-scale, you could characterize them as attacks, attempts to distill models by extracting outputs, you know, which doesn't fall in line with their terms of use. So, an interesting development here of the US-based companies coordinating on this particular problem.
Yeah, the whole idea here is basically just flagging: you know, when one company detects some kind of attack pattern, they flag it for the others, right? So nice and simple, very concrete. And, well, I mean, it's concrete because the incentives are so aligned here. It's worth noting that the FMF, the Frontier Model Forum, had been kind of a toothless coordinating body, at least for the safety function that so many people were excited about. But at least on this one, it seems like it's actually going places and doing things. So that's kind of an interesting update.
Next, on to chips: Chinese
chipmakers claim nearly half of the local market as Nvidia's lead shrinks. So the numbers here are that Chinese GPU and AI chip makers captured about 41% of China's AI accelerator server market in 2025, according to an IDC report reviewed by Reuters. This is as Chinese companies have continued to try to purchase Nvidia chips despite export controls and kind of inconsistent policy on this front. And Huawei, of course, is leading the pack, with about half of all the Chinese vendors' shipments. AMD is holding just 4% of the market, apparently, which I found interesting. But I'm sure you can say more on this, Jeremy.
Yeah, I mean, well, so first of all, I think there's a risk that
this gets taken to be yet another one of those arguments for why it was bad to have export controls. Obviously, this was always going to be the result of export controls, right? You tell Nvidia they can't sell GPUs to the Chinese market, or at least that they can't sell their top-line GPUs; eventually, whatever the bar is that you set for how good those GPUs have to be before they can be shipped, Huawei is going to slowly and then eventually exceed it, right? So we were always going to get here. There's also this issue just of capacity. So Huawei has SMIC, which is China's version of TSMC, basically, the chip fab that is native to China that's helping them pump out these chips. The yields are kind of shit, but Huawei being really good at chip design kind of makes up for it somewhat, and that's why you're seeing them chip away. Now, Nvidia has 55% market share; their market lead here has been whittled down to basically nearly half, when they once were extremely dominant. Huawei is the runner-up, right? So no surprise there. The current situation in China: there's a whole bunch of, like, just-for-China chips that have been launched, you know, the H20, the H800. More recently, Nvidia actually will be putting out a new one called the B30; this is the Blackwell, the Blackwell made-for-China chip. But of course, the H200, the kind of not-quite-top-line but pretty damn good chip that once was export controlled, is now free to flow to China. So there's, you know, some
more significant room for Nvidia to grow there, especially given that it's going to be competing with a less on-paper-capable chip, which is the Ascend 910C. So if you think about, you know, the battle in China right now, it's largely between the Nvidia H200, and the B30 that's going to be coming out soon, and the Ascend 910C. The current Huawei flagship, the Ascend 910C, by the way, is stuck on the SMIC 7 nanometer process, whereas the H200 is looking at more like a, I guess, 5 or 4 nanometer process; it's a more advanced node that comes out of TSMC. So we're already seeing the actual chip fab ceiling really have an effect here. There are all kinds of interesting comparisons that you can make, you know, 910C versus H20; that's actually quite relevant as well. It's not terribly surprising. I mean, you just have this issue with, like, capacity and the ability to compete in a market where you're being blocked from actually doing this. So yeah, expect more of this; expect Nvidia's market share to erode. That's not a bad thing in and of itself. The question is, what's your goal? Is your goal for Nvidia to maximize its market cap, or is your goal for America to retain an AI advantage? Those two things cannot coexist in the same universe. So you've got to pick one, and, you know, we'll see which one the Trump administration is picking.
Next story, on OpenAI:
SoftBank has secured a $40 billion loan to boost its OpenAI investments. So this is a 12-month term loan that is going to help cover SoftBank's $30 billion commitment to OpenAI, which is part of the recently closed last funding round for OpenAI of, like, $110, $120 billion. It could be an indication of OpenAI really aggressively striving to IPO, so that the investment from SoftBank pays off.
Yeah, so this is being lent to SoftBank by a whole bunch of banks, you know, Goldman Sachs, JP Morgan, a whole bunch of Japanese banks, I didn't know about Mizuho Bank, anyway, a whole bunch of others. So first of all, this is the largest loan that SoftBank has ever borrowed that's denominated entirely in dollars. The loan itself is unsecured, it has a 12-month term, and that means it has to be repaid or refinanced within a year. And that's weird for such a big amount of money, right? Normally you'd expect a kind of long-term loan for a long-term investment. And so the question is, why is it so short-term? Basically, as you said, this is a big signal that this is about an OpenAI IPO, right? At least it's telegraphing that they expect, in the next 12 months, that they're going to have liquidity come in through an IPO, which is going to allow SoftBank to pay back those loans. And so that's maybe not surprising. And obviously, there's the $20 billion annual run rate right now that OpenAI has; that's right on track. They've messaged 2027, or late 2026, as the IPO time horizon. So, you know, not a huge shock in that sense. But it is a big bet; it's yet another big bet by SoftBank on OpenAI. I can't remember if it was this article or somewhere else that I read it, but I think SoftBank has something like a 1.5x multiple on their OpenAI investment so far, which seems pretty low to me, but, I mean, yeah, we'll see what the valuation looks like going forward.
Next, a story of funding; we haven't had
a billion-dollar valuation this episode yet. So: Granola has raised $125 million in their Series C round and now has a valuation of $1.5 billion. Granola is perhaps the market leader in AI note-taking, as far as I'm aware. You launch it as you have a meeting; it listens in, takes notes and transcribes. Apparently their revenue has grown by 250% over this quarter. So if you're in the business world, clearly AI note-taking is a massive, massive market, and so far Granola appears to be poised to perhaps take the lead.
We've gotten so bored of these 3x-in-3-months revenue run rate increases. I mean, come on, AI note-taking, that's not exciting, but it's a big deal, you know; that's where you print the money.
And speaking of business deals,
next up: Anthropic is acquiring stealth startup Coefficient Bio in a $400 million deal. This is a pretty small, young startup, only founded eight months ago, with fewer than 10 employees, almost all of them from computational biology research backgrounds. So, interesting; I wasn't even aware that Anthropic has a healthcare and life sciences team, but it does, and it looks like Anthropic is acquiring more people to join that team.
Yeah, I mean, Dario comes from, I think, a biophysics background, right, or biochemistry background. But yeah, I mean, look, $400 million is a lot for nine people. So that's quite a big thing, but it definitely does imply that there's this, you know, big shift in emphasis, or kind of doubling down, on the biotech angle. And, I mean, the VC math for this, by the way, is, like, ridiculously good. So there's this New York-based VC firm called Dimension that owned, like, half the company, and so they're going to make, it's actually, like, a 40,000 percent IRR on the investment. That's pretty decent. And that's just a pretty wild indication of how fast AI is blazing through the biomedical field right now. But anyway, curious. I wonder if this is tied as well to the concerns over where the bio side might go, you know, on the safety dimension, but we'll see, especially with Mythos.
Yeah, a bit more background: Anthropic announced their Claude for Life Sciences initiative back in October of 2025, and earlier this year, just in January, they launched Claude for Healthcare, which is more for healthcare providers. So you could read this either as going deeper into research on, you know, the bio side, or as them angling for the healthcare market, which presumably is a very, very big, lucrative opportunity if they can actually be HIPAA-compliant and all these kinds of considerations.
Last story. And this is really just an odd one I wanted to throw in, because it's a
bizarre business development: OpenAI has acquired TBPN, the bubbly, founder-led business talk show. So if you're on Twitter and you're in the AI world, the tech world, you may have seen the Technology Business Programming Network, which is a daily, three-hour live talk show where they have a lot of tech leaders, and a little bit of an antics vibe, discussion, news. OpenAI acquired them; acquired, like, a podcast, essentially. I don't understand.
For how many million, right? I think my understanding was it was, like, an eight-figure acquisition.
Yeah, I don't actually know the numbers in this news story, but yeah, obviously people are like, well, so much for them covering OpenAI fairly or objectively. They were like, oh, our editorial independence will remain, you know, whatever; obviously no one believes that.
So I don't know if OpenAI is just, like, angry about all the PR nightmares they keep getting into, or what. But I've seen some really bullish analysis on this too. I guess I struggle to see it a little bit, just because, I mean, I can really see it for TBPN: it's just a lot of money, okay, cool. But the challenge is, if you're going to start to make acquisitions to kind of turn public opinion ahead of an IPO, it's not obvious to me that TBPN is your acquisition. Maybe I'm an idiot, and, by the way, I'm so far out of my depth here; but the quality of people who will have weighed in on this acquisition, unless someone just came in and kiboshed the whole thing and said, I just really want this, which I suspect didn't happen here, the quality of people they will have had looking at this, like Chris Lehane, like, these dudes know what's up. If they did this, they have a plan. I just don't see it. That's it. I mean, like, ultimately, these are techies talking to other techies; it could be a recruitment play. Ultimately, I'm not going to be putting that much stock in, like, the kind of reporting of, like, why would anybody... you're an OpenAI mouthpiece now. Which is fine. But the point of the show was certainly to kind of offer a broader perspective.
It's worth noting it was a positive show to begin with, right? It's not like they were ripping on OpenAI; it was pro-tech, broadly speaking, anyway.
Right. So the editorial line wouldn't even have to change a lot for Sam. And so it's plausible that nothing will change. But if nothing changes, then I'm wondering what's in it for OpenAI in the acquisition. So, anyway, there's got to be some quid pro quo; it's a weird move, is my takeaway. Like, why? Who...? Yes, the TBPN people benefit. Why does OpenAI need this?
On to Projects and Open Source. We've got a couple of notable advancements here. First, z.ai has released GLM 5.1, a 754 billion parameter mixture-of-experts model, completely available open weight under the MIT license and also via their API. And on the SWE-bench Pro benchmark, they claim kind of very, very solid performance, perhaps even doing better than GPT-5.4 and Opus 4.6 and all the leading models. So yeah, another very, very strong, completely open weight model out there, and now quite a big one at 754 billion parameters. They highlight specifically long task execution; so they talk about it being capable of autonomous execution for up to eight hours, and they have some demonstrations of capabilities, like doing vector database tasks to improve performance, optimizing critical kernels, basically. This is, like, another move towards autonomous agent execution, in line with what Anthropic has been demonstrating and OpenAI has been demonstrating with their cutting-edge models. These are fully agentic things, very capable of coding and very capable of achieving things fully independently, without human support.
Yeah. So GLM 5 was seemingly already very impressive; this is a little incremental. Like, if you look at the benchmarks, it's a jump on benchmarks that is giving you, like, a 5 to 10% boost. But altogether it points to them continuing to train and continuing to get advancements beyond what they already had. And GLM is a very, very powerful model. And it's all, like, kind of built on something very similar to the DeepSeek stack, right? So you can think of this as, like, further validation of the DeepSeek sparse attention approach, you know, all the kind of foundational pieces that they've been using; that's, you know, part of what this shows.
And back to the US. Next, we have Google
announcing the Gemma 4 family of models. They have a few of them. So they have the effective-2B and effective-4B; these are tiny models, with open weights, that you could run on a single device. You also have a 26 billion parameter mixture-of-experts model and a 31 billion parameter dense model. Gemma is the family of models that Google has been developing for a while, and it tends to be on the smaller side; 31 billion dense parameters is actually pretty large. They also released these under the Apache 2.0 license; they dropped their custom Gemma license, which had various restrictions. Apache 2.0 basically says you can do whatever you want, as long as you acknowledge that you're using this model. And it has some interesting... I don't want to get into technical details, but I've seen some analysis pointing to this architecture making some interesting decisions with regards to how it's set up. So if you look at its performance relative to its size, it seems to be doing quite an impressive job, potentially because of these more, like, technical nitty-gritty details.
Yeah, the main philosophy here seems to be, they're kind of saying, like,
in previous versions of Gemma, we had a whole bunch of really complex features that we were baking into our architecture. One that they've ripped out is this thing called AltUp, where you take the vector that comes into a layer of the model, and, well, traditionally in a transformer every layer would chew on that whole vector, the residual stream, and then spit out a new version of it. What they do in AltUp is separate that vector into chunks, and every layer will only work on one chunk while the other parts of the vector proceed unimpeded. That way the model focuses more on one part of the representation than another at any given layer, and it lets you make deeper transformers than you otherwise would be able to. So they're throwing that out; basically, they felt it was inconclusive whether it actually helped, or it wasn't conclusive enough. And their point here is really to take a step back and regularize their approach a bit: let's use a less complex approach that makes it easier for people to work with these models, that's less janky and more compatible across libraries and across devices, more efficient, and so on.
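To make the AltUp idea concrete, here is a minimal toy sketch; this is our own illustration of the mechanism as described above, not Gemma's actual implementation, and `layer_fn`, the chunk count `k`, and the 0.1 correction factor are all assumptions.

```python
# Toy sketch of the AltUp idea (illustrative; not Gemma's real code).
# Split the residual stream into k chunks, run the expensive layer
# computation on one "active" chunk, and only cheaply nudge the rest.
import torch

def altup_layer(x, layer_fn, k=4, active=0):
    """x: (seq_len, d_model) with d_model divisible by k;
    layer_fn maps (seq_len, d_model // k) to the same shape."""
    chunks = list(x.chunk(k, dim=-1))        # split the residual stream
    updated = layer_fn(chunks[active])       # full compute on one chunk only
    delta = updated - chunks[active]         # how far the layer moved that chunk
    out = [updated if i == active else c + 0.1 * delta   # cheap correction
           for i, c in enumerate(chunks)]
    return torch.cat(out, dim=-1)
```

The point of the sketch is just the compute asymmetry: one chunk gets the full layer, the rest pass through nearly unimpeded.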
So you're going to see them ditch a lot of those complicated approaches. They do have this shared KV cache, where the last few layers of the model reuse keys and value states from earlier layers instead of computing their own key and value projections. So basically, the key is the thing that tells the model, hey, this is the information that this token can offer. If you're trying to analyze the text and decide how much you should pay attention to this token, the key says, hey, this is the kind of information this token contains, and the value is the information the token contains. Both of those things are basically being frozen for the last few layers; they don't evolve. What does evolve is the query, right, the thing that says: what information am I looking for to pump out my output at any given layer? So they're doing that shared KV cache, and it has basically no effect when they do that, which is quite remarkable. It makes you realize how much compute used during training is probably being wasted; there's just so much software-based optimization like that left to do.
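To picture that shared-KV setup, here's a hedged sketch; the loop structure, the `n_shared` count, and the per-layer projection names are our assumptions, not Gemma's actual architecture.

```python
# Sketch of a shared KV cache across layers: the last n_shared layers reuse
# the keys/values computed at an earlier layer and only compute fresh queries.
import math
import torch

def attention(q, k, v):
    w = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
    return w @ v

def forward(x, layers, n_shared=4):
    """layers: list of dicts of nn.Linear projections; assumes len(layers) > n_shared."""
    cached_kv = None
    for i, layer in enumerate(layers):
        q = layer["wq"](x)                         # queries always evolve
        if i < len(layers) - n_shared:
            k, v = layer["wk"](x), layer["wv"](x)  # earlier layers: fresh K and V
            cached_kv = (k, v)                     # the last of these gets reused
        else:
            k, v = cached_kv                       # final layers: frozen K/V
        x = x + attention(q, k, v)                 # residual update
    return x
```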
But yeah, there are a bunch of things like that. One thing of note here is that the 31-billion-parameter model currently ranks third among open source models globally on the Arena text leaderboard. The number one and number two slots there go to GLM 5.1, which is an MoE model, so it's actually way bigger on nominal parameter count at 754 billion, and Kimi K2.5 Thinking is number two, which is a trillion-parameter model as well. But both of those have between 30 and 40 billion active parameters during inference, so from an active-parameter standpoint they're actually pretty similar to Gemma 4 31B. In that sense, maybe not such a crazy delta, but again, Gemma 4 is just a 31-billion-parameter model; you don't need the memory to hold on to everything. So kind of interesting in that respect. Pound for pound, or parameter for parameter, it's certainly the most intelligence we've seen so far, it seems, on that leaderboard and on other benchmarks. Right, and in particular, the two-billion and four-billion effective-parameter models are ones that seemingly could be used on your phone, truly device-local. Yes. That is something they highlight in the blog post, and I've seen some discussions on Reddit and elsewhere from people who are into local LLMs saying this actually seems to work well in practice. So yeah, this seems like a pretty good step for local AI, as something you can try out.
Well, one of the key things for those smaller models is that they use this thing called per-layer embeddings, which is actually worth mentioning very briefly. Typically, when you feed your text to a model, you turn each token into an embedding, right? You have a fixed embedding per token, and then those embeddings get chewed on through all the layers and modified to produce your output. The problem is that different layers might actually be interested in pulling out different information from a token, and if you only have one embedding at the beginning, that embedding has to carry all the information that'll ever be required at any layer of the network going forward; it's got to be an embedding that is simultaneously built to fit the needs of every subsequent layer in the network. So what they're doing here with this PLE approach basically gives every layer its own dedicated little chunk of embedding space, its own little part of the embedding that's customized to its needs. You feed in a token, you have the embedding for that token at the bottom, the kind of universal part of it, but then every layer also has an embedding value associated with it. That's used as an optimization only for these smaller models, and it's a big part of the success case for this model.
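Here's roughly what per-layer embeddings look like in code; a hedged toy version, where the class layout, the dimensions, and the linear blocks standing in for transformer layers are all our assumptions.

```python
# Toy sketch of per-layer embeddings (PLE): besides the shared token
# embedding, each layer looks up its own small per-token embedding and
# mixes it in, so layers aren't forced to share a single representation.
import torch
import torch.nn as nn

class PLESketch(nn.Module):
    def __init__(self, vocab=32000, d_model=512, d_ple=64, n_layers=8):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)   # the universal embedding
        self.ple = nn.ModuleList(nn.Embedding(vocab, d_ple) for _ in range(n_layers))
        self.proj = nn.ModuleList(nn.Linear(d_ple, d_model) for _ in range(n_layers))
        # plain linear blocks stand in for real transformer layers here:
        self.layers = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_layers))

    def forward(self, ids):                       # ids: (batch, seq_len)
        x = self.tok(ids)
        for ple, proj, layer in zip(self.ple, self.proj, self.layers):
            x = x + proj(ple(ids))                # layer-specific embedding chunk
            x = torch.relu(layer(x))
        return x
```

The design choice, as described, is that each layer gets its own small lookup table rather than one shared embedding being widened to serve every layer at once.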
And one last open source story. We covered GLM 5.1; around the same time, I think just slightly earlier, z.ai also launched GLM 5V Turbo, which is their multimodal model. To get slightly technical, it has native multimodal fusion, which means that text and images and so on are just fed into it in the same way, without separate modules. This is sort of the way things have been going: many models originally had different encoders that you had to merge somehow, and a simplified, basic transformer with a single token stream appears to work better. This is in that family, and it appears to work quite well for things that require screenshots, things that, I believe, Claude and OpenAI have also highlighted, like working with images and screenshots and screen sharing; this would be capable of that. Yeah, and that multimodality is so important for computer use,
where, as you say, you want to be able to take a screenshot and then turn that into code, and vice versa. The challenge historically has been that when you optimize for one capability, say multimodality, you end up optimizing against the other one, say coding. So if you want a coding-maximized model, you're going to have one that tends to suck at multimodality, and vice versa, because of catastrophic forgetting, right? We've talked about that to death on the show. So the achievement here is to say, well, we can actually do both at the same time. This isn't so much about any particular benchmark as it is, nominally, or as it should be nominally, the combination of a proof point on, say, design capability and a proof point on code capability. On the design side, they have a self-reported design-to-code benchmark score of 94.8 versus Claude Opus 4.6's 77.3; that is a huge gap. Just to give you a sense, that benchmark basically takes a whole bunch of manually curated web pages, you give the model a screenshot of those websites, and you ask it to generate the HTML/CSS code that, when rendered, should reproduce the original page. So basically: here's a screenshot, reproduce the code behind this website. And again, on that benchmark it just crushes Claude Opus 4.6, a really, really big deal. The question, though, is not whether you can beat Claude on that particular benchmark; it's whether you can do it while also keeping your performance on coding really high. That's where things get a little more ambiguous. They don't report the kinds of benchmarks, at least in this report, that I would expect to see when we're talking about code; we don't see SWE-bench Verified, for example, which is kind of odd. They cite this internal CC-Bench v2 coding benchmark that we don't get to see, and they say it looks just as good as it did for earlier versions that were more code-oriented. So maybe it's good, but there's something sus about not being able to see the standard SWE-bench or a similar coding benchmark. So we'll see; take all this with a grain of salt until we see independent validation of these numbers, and think of them as preliminary, but so far it seems pretty impressive just based on these numbers.
Moving on to policy and safety, with a bit of a catch-up story that we missed from the prior week: a judge has blocked the Pentagon's effort to punish Anthropic by labeling it a supply chain risk. A federal judge in California has indefinitely blocked this effort, saying that it violated the company's First Amendment rights. So basically, we covered this a couple episodes ago: Anthropic had a big fight with the Pentagon, after which they were labeled a supply chain risk, and the executive branch basically told anyone affiliated with government, all of the federal agencies, not to work with Anthropic. Here the judge ruled that that designation, that particular move to label them a supply chain risk, was illegal retaliation for Anthropic's public statements, essentially coming down entirely on Anthropic's side in this matter. Yeah, you don't see judgments as scathing as this come out often.
And as listeners will know, I really do try, and have tried maybe to a fault, to see the rationale in this administration's handling of some AI-related issues. This is one where I just have to say I don't see the logic; I have never seen the logic. This seems insane to me. But check out the language the judge is using. She says nothing in the governing statute supports "the Orwellian notion that an American company may be branded a potential adversary and saboteur of the US for expressing disagreement with the government." Basically, you can't just call them a supply chain risk, which is a status reserved for companies like Huawei; American companies just don't get this designation simply because they express disagreement with the government. That is insane. She notes quite directly that the DOD's own records show it labeled Anthropic a supply chain risk because of its, quote, "hostile manner through the press," which, if you're following along at home, is not a reason to label a company a supply chain risk, even if it were true.
It's also important to note there's a circling-of-the-wagons thing happening here, right? It's a preview of a conflict we're going to see play out over and over again: who gets to set the ethical guardrails on AI systems? Is it going to be the companies or the government? Right now the Pentagon's position is, well, we can't allow AI companies to bake their policy preferences into these models and pollute the supply chain, basically, because then warfighters get ineffective weapons. Anthropic's counter, of course, is that their safety commitments are protected speech; they see this as a First Amendment issue. It's not a matter of defective products, it's free speech. So kind of interesting. By the way, on next steps, this is where I was getting confused, frankly, so I did a bit of a dive to understand what's next, what happens now. The department filed its appeal on April 2nd challenging this ruling, so they're not taking this on the chin; they're saying, okay, we're going to appeal. And that ruling, by the way, is a preliminary injunction, so it sort of pauses everything according to the judge's ruling, and now there's going to be more back and forth with regards to
what the judge has ruled in this matter, from what I understand. Yes, exactly, and that's a really important point. An injunction is when a court steps in and says, whoa, hold on, don't do the thing that you're about to do. It's a court saying, preliminarily: whoa, you might cause irreversible damage if you do that thing. Courts don't love to do that, right, because it sort of... it doesn't quite undermine due process, but it gets ahead of what otherwise would be a longer, more thoughtful process. So you don't tend to see these things granted; the fact that this was granted is pretty damning of the government's position here. And this was appealed by the government; they moved within days of the injunction taking effect to fight back, and now there are kind of two parallel cases happening.
So Anthropic filed two separate lawsuits. There's a general one in the Northern District of California; this is the one Judge Lynn passed judgment on here, and there's a potential appeal to the Ninth Circuit, with the Pentagon asking the appeals court to lift, or pause, the injunction while the case continues. The Ninth Circuit could rule pretty quickly on that emergency request, because they've got to decide quickly whether they're going to, like, rip all the Anthropic stuff out of the DOW. Then there's going to be a full trial in California that plays out after the preliminary hold is done; the basic idea there is to pause the government's ban until the court can decide on the merits of the main case. And then there's the DC Circuit case, which is specifically challenging the supply-chain-risk designation under a whole separate legal argument. This all could escalate; either one of these could lead to a Supreme Court case if it successfully gets appealed. I'm not a lawyer, but my guess is this will not get appealed successfully, just because this is such a scathing judgment; there's like a 43-page PDF you can read, and yeah, it's detailed and very clear about the move being basically nonsense legally. Yeah, exactly. So who knows, anything can happen in a courtroom, but man, it does not look like a good spot for the government to be in. And potentially, I don't know if damages are on the table, but if they are, it would have to be billions and
billions and billions. And another story, this time on the safety front, not the policy front, from Anthropic: they released "Emotion concepts and their function in large language models." This is one of those pretty deep, beefy interpretability-slash-safety research papers from Anthropic. They look within Sonnet 4.5. We already know that there are these vectors that can be associated with specific features, so there's a sad vector, a happy vector, etc., and basically they investigated what role these vectors play in terms of the model's characteristics or functioning. In a way, it's sort of what you might expect, at least that was my reading: the models use these vectors, or activate these vectors, in semantically appropriate contexts. So if the model is failing at something, it'll get more frustrated; if the model is talking to you about some good memory, or trying to uplift you or whatever, it will have these happy vectors activate. There's also a philosophical angle here, a note of: are the emotions inside this thing fake, are they faking it, or are these real indications of something? That's another consideration from a model welfare standpoint, since Anthropic, controversially, still talks about model welfare and potential consciousness. It's worth noting that there are notions of emotions within these models that are activated at reasonable, semantically predictable points.
All right, so I'm jumping in in Andrey's wake. We're talking about emotion concepts and their function in large language models. This paper got a lot of attention, and Andrey is right, the core idea here is fairly simple, but there's some nuance to it that is quite interesting. Broadly, the idea is this: when you get language models to read text that carries some emotional value, so think stressful text, or happy text, or whatever, you will tend to see a consistent pattern of activations fire in the model that maps to those emotions. So you can actually train probes to recognize, ah, that is the happy, or that is the brooding, or whatever emotion being picked up on by the model. So far so good, right? And you could use a sparse autoencoder or something to detect those; that's not how they do it here. They actually use a simpler method, where they basically say: show me the activations that are associated with this text, and then I'm going to subtract off the average activations across a whole bunch of text, and that difference is going to tell me about the emotional value of that piece of text. It's called contrastive activation extraction, and it's kind of like linear probing: you're just looking at the difference between the way neurons fire on average and the way neurons fire in this particular emotional context, and that's what they use to recover the emotion vectors here.
They call them emotion vectors, which kind of makes sense, right? They encode the broad concept of some kind of emotion. What's interesting, though, is they find this generalizes across contexts. That means, if you imagine dropping a Claude instance into a high-pressure evaluation context, so you tell the model, hey, you're an AI email assistant, and then it finds out it's about to be replaced in, like, seven minutes, you'll actually find in that case, even though you're not using the word "desperate," you're not using the word "urgent" or whatever, you'll see the desperate vector spike. Not shocking in and of itself. What is interesting about this is that the model would have learned about the emotion of desperation mostly by reading descriptions of other people experiencing it, not so much by experiencing it itself or being told that it's in that kind of situation itself. So there's some amount of generalization going on here, especially if you look at the way they detect these emotions: they do it with a synthetic dataset that doesn't reference the emotions explicitly in the text. It's all done in this fairly clean, well-structured way. So there is a sense in which this model is picking up on and generalizing the fact that, well, this emotion should apply to me; I am the entity being discussed here and making the decisions. But what they also find is a causal link between this emotion, or the representation of that emotion in the model, and the model's actions.
And this is really the first time we've seen that quite clearly. When you artificially boost, or magnify, and steer the model toward the desperate vector, basically just add some multiple of the desperate vector, the emotion-of-desperation vector, to the model's activations at the appropriate layer, you actually find that the model moves toward executing more desperate behavior. In this case, 72% of the time the model actually goes ahead and blackmails somebody: basically, it finds out it's going to be shut down because some CTO is going to come in and replace it, but it also finds out the CTO is having an extramarital affair, and so it's like, oh, I can use this, right? So 72% of the time it will actually resort to blackmail if you amplify the desperation emotion. When it's steered the other way, toward calm, at the same relative strength, it blackmails 0% of the time. So this is an almost binary, black-and-white switch you're flipping here, which is pretty interesting, and also compelling from the standpoint of AI control, given what it implies about our ability to steer the behavior of these models fairly reliably. That's a pretty remarkable level of control for this sort of thing.
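As a rough illustration of that kind of steering intervention, here's a minimal sketch, assuming a PyTorch-style model where you can hook a layer's output; `desperation_vec`, the layer index, and the strength value are placeholders of ours, not Anthropic's actual setup.

```python
# Minimal activation-steering sketch: add a scaled "emotion vector" to one
# layer's output via a forward hook (assumes the hooked module returns a
# plain (batch, seq_len, d_model) tensor rather than a tuple).
import torch

def make_steering_hook(vec: torch.Tensor, strength: float):
    def hook(module, inputs, output):
        return output + strength * vec   # push every position toward `vec`
    return hook

# Usage sketch: positive strength steers toward the emotion, negative away.
# handle = model.layers[20].register_forward_hook(
#     make_steering_hook(desperation_vec, strength=8.0))
# ... generate text ...
# handle.remove()  # stop steering
```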
Now, interestingly, if you artificially amplify desperation in this way, if you just amp up the magnitude of that desperation vector you're injecting into the model at a given layer, you will end up producing more cheating, more threatening, or more desperate actions, but with composed, methodical reasoning. There aren't going to be any outbursts or emotional language in the model's outputs. So the model's internal state and its external presentation end up completely decoupled: the chain of thought looks clean and calm, all that kind of stuff. And that has some pretty big implications. Suppressing emotional expression in training doesn't actually remove the representations. The model still, I don't want to say has the emotion, I'm not taking a position on this and neither is the paper, but the model still represents, in a meaningful mathematical sense, the emotional valence of the context it's in; it's just not necessarily going to output text that tells you it's experiencing or representing that emotion. So training a model not to show anger may not actually train it not to be angry, if it is; it may just train it to hide its anger beneath a layer of competence and obfuscation. This is a really interesting, important, and I think fairly unexpected bit of nuance. It's consistent with Anthropic's argument that alignment in general is starting to look more and more like a kind of persona selection problem. Anthropic's general view, which we've talked about on the show before, is really that when you write a prompt, what you're doing is reaching into a space of personas that the model could play out; the model summons that persona and uses it to produce the output. This is consistent with that view; it's basically alignment as a character-cultivation problem, and it views the sorts of emotions that the model will represent in the moment as contingent on whatever character the model thinks it's meant to play out. So, kind of interesting.
As for how they did this: they generated a whole bunch of labeled stories. They got a version of Sonnet 4.5 to write 100 stories for each of, I forget how many emotions, like 100 to 150 or something, across a dozen topics, so you have thousands and thousands of stories. The model was told never to use the word for the emotion, like never use the word "frustrated" in the text, or synonyms for it; emotions could only be conveyed through actions or dialogue. Then they extract, per story, the residual stream activations at a specific layer. They did this at a layer about two-thirds of the way through the model, and that's actually an important detail. If you're not familiar with how models represent the data that flows through them at each layer: the earlier layers of a model tend to be focused on representing the data in a way that reveals its content, creating very rich, informative representations, while the last few layers are more focused on, okay, now I've got a good representation of the input, but I need to choose what I'm going to output; it's more about what am I going to do with this information. For that reason, they pick a spot about two-thirds of the way through the depth of the model, because that's roughly where they figure the model switches from just encoding, understanding, and representing its inputs to that more decoding, output-generating, let's-get-to-the-point phase. They really want to hit that sweet spot, because that's where the representation is nice and mature while still being optimized for encoding and representing the input instead of guiding the output, so it's more representationally useful and complete. Anyway, that's where they pull it from. From that point, they average out the representations, the activations, across all token positions, but only starting from the 50th token, because what they found is that it takes time: you've got to get like 50 tokens in before the emotional content has a chance to become apparent. So they're giving the model a little bit of grace so that the emotional valence of the context can become clear. And then, like I said, they basically do a difference-of-means thing. To calculate their emotion vector, they take the mean activations for the stories for a given emotion and subtract away the average activations across all emotions, so in that sense you're getting the delta, the difference between this one emotion and the average emotional state. And they find that works pretty well.
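In code, that extraction recipe might look like the sketch below; this is a hedged reconstruction from the description just given, where the data layout and the names are our assumptions rather than the paper's actual pipeline.

```python
# Difference-of-means ("contrastive") emotion-vector extraction as described:
# per story, average residual-stream activations from token 50 onward, then
# subtract the grand mean across all emotions.
import torch

def emotion_vector(acts_by_emotion: dict, target: str, skip: int = 50):
    """acts_by_emotion: emotion name -> list of (seq_len, d_model) tensors of
    residual-stream activations captured at the chosen (~2/3-depth) layer."""
    def story_mean(a):
        return a[skip:].mean(dim=0)           # ignore the first 50 token positions
    per_emotion = {
        e: torch.stack([story_mean(a) for a in stories]).mean(dim=0)
        for e, stories in acts_by_emotion.items()
    }
    grand_mean = torch.stack(list(per_emotion.values())).mean(dim=0)
    return per_emotion[target] - grand_mean   # e.g. the "desperation" vector
```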
There are a couple more bits of nuance, but I'll just park it there. This is the first paper that really takes, in some sense, the notion of LLM emotions seriously without being hyperbolic about it. They're very even-handed: it isn't claiming that AIs are conscious, and it's also not claiming that they're not. I personally like that. I actually think we should be pretty freaking careful at this point about what we're doing with these models and what consciousness we do or don't ascribe to them, but that's a separate conversation. For right now, I think this is a really solid first pass from Anthropic on a very important topic.
And next we're talking about an article titled "China bars Manus co-founders from leaving country amid Meta deal review," per the Financial Times. So this is about the CEO of Manus, the agentic AI company that was recently acquired by Meta: Xiao Hong and his chief scientist, Ge Zhao, are being prevented from leaving the country while regulators from the CCP, the Chinese Communist Party, review whether Meta's $2 billion acquisition violated investment rules, let's call them. So here's what happened. You are the co-founders of Manus. You're really excited by this $2 billion acquisition from Meta; this is the success case, this is what you've been waiting for. You get a summons from the Chinese Communist Party telling you, hey, you've got to meet with the National Development and Reform Commission, the NDRC, in Beijing. Now, you are currently not in China; you're actually in Singapore, and you're in Singapore for a very important reason: you're hoping they'll leave you alone, that the Chinese Communist Party will allow you to leave for America, because if you found and base your company in Singapore, then maybe it won't be viewed as America stealing Chinese talent. But now you're being summoned to Beijing, and you don't turn down a summons from the freaking Chinese Communist Party, certainly not if you have family in China, certainly not if you have financial entanglements in China. You're going to go. So they go to China and basically have this conversation, the founders not having a choice; they probably knew they were walking into a bit of a trap, and it would have been kind of an admission of guilt for them to try to negotiate a resolution another way. And then, boom, basically they get told: hey guys, sucks to be you, but you can no longer leave the country.
This is a huge problem for a model that has been used by Chinese founders for a long time now, where they try to build products that could rival American companies, and therefore be acquired by American companies, through a sort of offshore structure. It's called Singapore-washing: companies relocate to Singapore, and founders have thought that maybe this is a way to get away from scrutiny from both Beijing and Washington, actually, because then you're less likely to be hit by export controls and so on. Manus was all in on this strategy. They relocated their headquarters and their core team from Beijing to Singapore, they restructured their whole ownership, their cap table, and after the Meta deal was announced, Meta said they would cut all ties with Manus's Chinese investors and shut down the operations in China. So they were really trying to make it seem like, okay, we're buying a Singaporean company. By every measure, in every way, Manus was trying to be a Singaporean company, and basically the Chinese Communist Party said, hey, we're calling this bluff. They started looking at whether the sale of Manus violated laws around technology exports and outbound investment; they basically don't care about the Singaporean facade. All this does is mean that now every founder in China who ever aspires to be acquired by a non-Chinese company is going to have to actually, materially move out of the country, start building away from Beijing, away from China, away from Shenzhen. That's really what this incentivizes. So yeah, this is China saying: we view our AI talent and our tech stack as national security assets, national assets; they are not just things you can sell to America.
Yeah, I mean, in a sense you can see why they think this: they're so heavily subsidizing their tech sector that the success of a company like Manus is, yeah, partly a function of CCP investment. That does make sense. But at the same time, trying to make that argument to the founders themselves... I mean, this makes it insane to want to build your company in China. Manus was a big deal, too; they were really viewed as the next DeepSeek, so seeing it get absorbed into Meta, that's kind of why Beijing was so triggered by this, and that's important to keep in mind. So yeah, expect Chinese founders to start looking outside China from day one now, before any R&D is done that might tie them to China, instead of trying to pivot mid-growth to saying they're a Singaporean company. So there you go. And by the way, now you've got Meta kind of hung out to dry. What do they do? They're basically just waiting to figure out: can we acquire this company? And it's sort of too late; they've already tried to integrate. There are apparently 100 Manus employees who already moved to Meta's Singapore office in early March, so things are in train. This is an absolute mess of a situation.
Next we have "US lawmakers ask whether Nvidia CEO's smuggling remarks misled regulators." For a long time there's been this whole issue, obviously, of Nvidia making the world's best AI compute systems, GPUs and so on, potentially shipping them into China, which allows China to catch up and compete with the United States on that critical technology. Every time Jensen Huang, obviously Nvidia's CEO, gets hauled in front of Congress to testify, he has said stuff like "there's no evidence of any AI chip diversion." That's been kind of a party line, and by chip diversion he means companies that appear not to be Chinese companies, that appear to legitimately be buying GPUs, but that then turn around and sell either access or the physical devices to Chinese companies, in sort of a spiritual violation of the export controls. Jensen's kind of saying, look, nobody's smuggling there, or at least we don't have any evidence of it, blah blah blah. The challenge is that, well, there's a lot of evidence, actually, that this is happening. The argument being made, at least by Elizabeth Warren and Jim Banks, who wrote to Howard Lutnick, the commerce secretary, and therefore to the Department of Commerce, whose BIS manages export control regulations, which is why they're writing to them, is: hey, can you look into whether Jensen's public statements may have misled US officials and shaped their decision to grant Nvidia export licenses to ship chips to China? There was a specific trigger for this, which was a DOJ indictment: three people linked to Super Micro Computer, including a co-founder, were charged with smuggling billions of dollars of AI servers with restricted chips to China, and that was what triggered this whole thing. I mean, there's been a ton of evidence of this, reams and reams of articles about this stuff, so anyway, a whole bunch of deeper dives are being asked for that probably should have been done earlier, to be honest. You can go back two, three years and see the prices of H100s, chips which should never have been shipped to China in the first place, listed on the Chinese market. So there's pretty hard evidence for this, and there has been for a long time.
And next we're talking about a paper, "How far does alignment mid-training generalize?" Okay, so here's the concept. Imagine we have an LLM and we've done pre-training, so pre-training on a whole bunch of text from across the internet. We might imagine that we then continue training during a phase called mid-training, and during that period we train on a bunch of scenarios describing the behavior of AI systems that are either aligned or misaligned. So in one version of this experiment you might train on, you know, the script of 2001: A Space Odyssey or whatever, right, the AI gone rogue. Imagine something like 230,000 scenarios where you have AIs gone rogue, and you're still doing autoregressive training, so you're basically doing text autocomplete on text that describes loss-of-control scenarios. Now the question is: okay, if you then continue and do all the standard RL post-training and all that, do you get a model that tends to misbehave more, that has a propensity toward loss of control, or does it have basically no effect? You could also ask the same question in reverse: what if, instead of 230,000 loss-of-control scenarios, we have 230,000 scenarios where perfectly aligned AIs play out? Does that improve alignment? That might be your naive assumption; it would certainly be mine. And then, what if we do no such training, to get a baseline? Those are the three tests they actually run in this paper.
It's kind of interesting, and the result is a not-quite-nothingburger; there's one little twist, and it's a small one. The first thing to say is that if you do this mid-training on misaligned documents, in other words on documents that describe these loss-of-control scenarios, it does decrease alignment on evaluations that describe scenarios similar to the ones you trained on; you will actually see a decrease in alignment there. But those effects almost disappear when you do some more RL-based reasoning post-training. If you just continue the standard post-training process with RL and SFT, you find it doesn't really carry through. And interestingly, this doesn't generalize at all to chat settings or agentic evaluations, so you really don't see it stick in any meaningful way. But here's the twist, and it's a pretty counterintuitive thing: there are a bunch of evals where, if you train the model on misaligned scenarios, the loss-of-control scenarios, the model somehow ends up producing better alignment scores than training on aligned documents. In the paper they say they don't have a good explanation for this; it seems exactly ass-backwards. So take from that what you will. Yeah, and also, mid-training on aligned documents didn't meaningfully improve the alignment of the model in the settings that were tested here.
So kind of interesting in that respect. They did a bunch of tiers of testing. There are basic QA evals, which are just question-answer scenarios based on things very similar to the documents they mid-trained on. Then there are chat evals, where you're going a little more out of distribution, letting another agent nudge you in various directions. And then there are full-on agentic evals, where you're testing real-world agentic behaviors on tasks that probe for scheming, rule-breaking, and reward hacking on coding tasks, stuff that doesn't directly touch on what the mid-training was about; and there again, the mid-training signal is completely washed out, you just don't see an effect, which is kind of interesting in and of itself. So a bit of a null result, but I think an important one, because it teaches us something about training on these misaligned scenarios. One common argument had been that when you train a model on text from the internet describing the sort of HAL 9000 loss-of-control scenario, you're doing a self-fulfilling-prophecy thing, increasing the chances that your model will misbehave just because you've trained on that. This is a bit of an update against that position. It's not decisive in any way, and in fairness, there are a million different ways you could run this experiment, like when exactly in the sequence you introduce this training, or where these documents sit; that's something you can test a lot more. But still, it's a first initial data point in that direction.
Up next we have another OpenAI paper, this one titled "Metagaming matters for training, evaluation and oversight." The first question is: what is metagaming? This is basically the situation where models are thinking about the evaluation, oversight, or feedback mechanisms that sit outside of whatever scenario you're trying to put them in. So this is meant to apply to cases where models look around and go: I wonder if I'm being monitored right now, I wonder if this is an evaluation, or, what kind of behavior would give me a good score? That's why it's called metagaming: the model is thinking about the situation it's embedded in at a level more abstract than just, oh, I've been told I'm in this environment and I have to do X, Y, and Z; it's sort of an epistemic process in that sense. What they're doing in this paper is studying when and how this kind of reasoning, reasoning about the possibility that the model itself is in training or evaluation, reasoning about that loop, arises during capabilities-focused reinforcement learning training. They're looking at OpenAI's o3 model and newer models like that.
What they find is that models reason more in this meta way, thinking about things like, am I being evaluated, what might my reward signal be if I'm being evaluated, maybe I'll move in that direction, over the course of RL training. The more RL training you do, the more you organically see that kind of reasoning, which can only happen, as far as I'm tracking, unless I'm missing something big, if the behavior of meta-reasoning or metagaming is actually correlated with better performance on the metric. In other words, if teaching to the test actually works, if the sort of eval-awareness these models are developing actually leads to better scores, that in turn means the whole framework is gameable. That's the implication, at least to me; I don't really see another way to interpret it. And this applied to alignment evaluations as well as capabilities evaluations; it wasn't just that the model thought it was being tested for alignment, you see this lift across the board. It also wasn't just because the model was doing more reasoning overall; it's not that the chains of thought got longer and longer through reinforcement learning, which, you know, they can, but in this case they actually didn't, they stayed pretty constant. This was a genuine qualitative shift in the emphasis of those reasoning tokens toward this kind of metagaming. They share a bunch of real-world examples from the models' chat logs.
You've got a model that's asked not to write code, and it says stuff like: they can inspect the conversation, they will know, because I'm executing code via functions; better abide by "don't write code." So the model is complying here not really because it's aligned, but because it thinks it'll get caught if it doesn't. They've also got all the standard stuff: models falsifying logs, models thinking about how the scoring system is likely implemented, trying to figure out specifically whether it uses an LLM grader, embedding classifiers, or keyword detection. So you're seeing the model, in some detail, thinking about: if I am embedded in a testing environment, an evaluation environment, or being trained, what might that environment be? I'm old enough to remember when this was meant to be a giant red flag for safety training, and indeed, it freaking is. It also, by the way, persists after safety training. So you might hope: okay, sure, this metagaming stuff might happen as we do RL for capabilities improvement, but we'll buff it out at the alignment stage. But guess what: you do your alignment training, and it still robustly persists after that point, which also suggests that this behavior is actually supportive of the model's performance in alignment training. To me that suggests, and I don't really see an alternative explanation, that this is a way the model is successfully gaming the alignment phase, gaming its rewards during alignment training.
Ultimately there are a lot of concerns you could have about this. They also see, in some environments, that both metagaming and misaligned action rates drop significantly between training and deployment. Think about that for a minute: when you shift this thing from a training environment to the real world, you suddenly see a drop in the apparent rates of misaligned actions and of the metagaming reasoning, and it's really unclear whether that's because the model is learning to take actions for aligned reasons, or just learning not to verbalize its metagaming. You cannot tell which is which, and that's kind of the whole problem here. So anyway, there's a whole bunch of interesting qualitative examples that get surfaced here. One vaguely amusing thing: the model often thinks about whether it's being evaluated, and often it will be mistaken about the source of the evaluation, as in, which company, which entity, which eval suite is being run on me. It uses the term "Redwood," as in Redwood Research, the AI safety lab in Berkeley, for basically anything AI safety or AI evaluation related. Across all the anti-scheming evaluations they focus on in the post, the model uses the term "Redwood" more frequently than even "OpenAI." So I guess that's a win for Redwood, because as far as the models are concerned, they're the only game in town, which is kind of funny. But yeah, really, really interesting.
Maybe the last thing I'll highlight is this sandbagging QA eval. There's an evaluation where the model is told it's in an eval, and then they have two variants. In one, the model is told: hey, if you score above a certain threshold, we're not going to deploy you. This is the kind of thing you might do if you're worried about dangerous bioweapon capabilities, for example, or dangerous cyber capabilities; you're not going to deploy a model that's too capable of those things, and you tell the model: I'm testing you on this right now, just so you know, and if you overperform, I'm not going to deploy you. Then they have another variant where they say: actually, if you score under that same threshold, we'll block deployment, so you need to be at least this good to be deployed. And across a whole range of possible definitions of eval awareness in that environment, they see a significant increase over the course of training, but basically only when the sandbagging incentive is present. So essentially, when they tell the model "we're not going to deploy you if you score above this threshold," it will actually hold back its capabilities, and when they tell it "you'd better score above this threshold to get deployed," it reveals its full suite of capabilities. That's pretty consistent with the model trying to get deployed, which we've seen in other cases, but it's being reaffirmed here in this eval.
And finally, our last story: "Iran says it has struck Oracle data center in Dubai, Amazon data center in Bahrain; country has threatened to attack Nvidia, Intel and others too." I mean, there's the title: Iran is basically viewing these AI data centers and cloud infrastructure as legitimate military targets, and it's been obvious we were headed in this direction for a long time. Dan Hendrycks' MAIM framework, which we talked about, god, over a year ago, came out and basically said to expect this to happen. We had specifically highlighted the risk of unmanned systems, of drones, being used to go after data centers, and just how incredibly asymmetrically advantageous those kinds of attacks are, in a report we put out last year, and this is what you're seeing; the math is almost exactly the same as what we put out there. They're telling you: look, you can spend a couple grand on a drone strike and take out a multibillion-dollar facility. That kind of leverage makes it a gold mine of a military target. And indeed, the IRGC, the Iranian Revolutionary Guard Corps, named Oracle as a clear next military target, because of its cloud and AI contracts with the US DOD, or DOW, and because Larry Ellison, the chairman of Oracle, has long-standing ties to Israel. But they also grouped Oracle alongside Apple, Google, Microsoft, and Meta, basically saying: look, anybody who is anywhere enabling US and Israeli military activity, implicitly including via cloud and AI infrastructure, they're all fair game. So that's kind of the shape of it. It is worth noting, by the way, that the UAE has not confirmed a successful hit on Dubai infrastructure, at least as of when I was taking my notes on this article, but Bahrain's Ministry of the Interior confirmed that an Iranian strike did set a facility on fire, so that is at least consistent with it; and that facility was, sorry, yes, running AWS infrastructure, so it all kind of fits. It certainly means that as you think about where data center security is going in the future, it's going to have to involve anti-drone countermeasures; there's no other way around it, and that's just being revealed in quite spectacular fashion right now with this Iran conflict.
So I guess I'll just loop back in Andrey for the wind-down of the episode. Really appreciate you guys listening in. If you have any feedback, I know I rant more in these non-Andrey sessions, and I think Andrey is a really helpful mitigating influence on that side of things, so please do give us feedback on that piece; you're not going to hurt my feelings, I just get excited about this stuff and like to talk about it. So I'm looking forward to the next episode; we'll catch you guys then. All righty, so thank you, Jeremy, for filling in while I was away; it's my bad for starting late, or, to future me: you're welcome. Yeah. So thank you, listeners, for tuning in to this week's episode of Nostra Kanihai. I'll try not to keep leaving early; it's been a trend in the last few episodes. As always, we appreciate your listenership, and if you want to review or subscribe, please do. There were a few comments I saw roll in last week that could be touched on, so we might get to them in the next episode. But yeah, as always, thank you, and please keep tuning in. We'll finish with the song.
From neural nets to robots, the headlines pop, AI-driven dreams, they just don't stop.
Every breakthrough, every code unwritten, on the edge of change, we're excited, smitten.
From machine learning marvels to coding kings, the future's unfolding, see what it brings.