Podcast Ad Remover

2026-04-16 07:00:00 • 1:44:30

DOWNLOADED #240 - Project Glasswing, Claude Mythos, GLM-5.1, emotion concepts 7 plays

AI Insights

continue with safety fine-tuning and that should go back down again. The answer is no: it doesn't really go back down very much through supervised fine-tuning in stage two, where they're thinking, okay, we're really going to clamp down on that stuff. Even when you train the models not to carry it through, and certainly not explicitly, this thinking persists.
And they ran these behavioral evaluations that involved adding inhibitors of meta-inference, right? They tried to steer the model away from thinking about whether it's in some kind of evaluation environment, and they tried to do this in a whole bunch of ways. So think about, you know, anti-meta kind of seeds, like little chunks of tokens or whatever it is, to steer the model away from thinking in terms of, "Well, is this an evaluation?" They tried to implement these things and they tried different stopgaps, but they were highly ineffective, at least the ones that they tried in this paper against these models. I don't want to say it's impossible to make progress on that front, and the paper's not saying that, but the effect was very muted relative to what the oversight approach intended. So it seems like a pretty deep-seated dynamic, and that's quite important.
And really the last piece here is that models got better at gaming the evaluation without necessarily getting better in any deep way at achieving the underlying objectives, right? So the idea is that, in a situation where the alignment objective is, for example, to be helpful or to be ethical, a model can figure out that its evaluation is designed to be obvious to it and say, "Well, maybe, you know, they really want me to adhere to this ethical framework, but I can juice the score by sort of playing along as though I am," right?
So the ability to step back and say, "Oh, in addition to thinking about what the performance metric is here, I also need to self-consciously convey to you that I'm hitting that metric," that's something that's not possible without a certain fluency in metagaming, right? So this is really, I think, one of the most important findings, and it fits into the broader theme of, like, reward hacking. So anyway, highly recommend checking out that paper; huge implications for safety, but also for the way we do training.
And on to research, our last section here. The first paper we have is "Interpreter and Actor: Insecure Hope or Insecure Help?" by Carl Schulman, Stefano Erman, and Paul Cristiano. We have their links and blogs below; you know, maybe someday we'll get them on for a podcast. But fundamentally, the point of the paper is simple: stop confusing interpretation for intent. Just because a model can write a very good reason, can make things up in hindsight, can rationalize its previous behaviors or objections like you wouldn't believe, does not mean that's why it did the thing it did. This is important because you can leverage this effect in training to create models that pretend to be super nice on the surface but are actually, underneath, just, like, psychopathic. It's a warning post as much as anything. They call it "insecure help and insecure hope" because either way, you're looking at aligning governance approach or a system that optimizes the abstraction governance alignment approach towards actions that will align with a government objective, and this fits under that umbrella. So, huge post, like, tons to go into. Take a look at that if you're interested in thinking about the nuances of interpreting model reasoning and AI alignment research in general. So it's more of an aperitif paper.
Next, we've got "On the Complexity Gap Between Production and Oversight Tasks" by Cohen, Axed, and Venom.
And here the idea is basically just: if production architectures are consistently changing, we're going from RNNs to transformers and then to something else, then to something else, can you enforce safety universally? Hugely important paper. I think the experimental finding here is that it's hard to import oversight setups from one architecture into a different architecture and trust that they'll continue to be generally safe and work. It's way beyond the capabilities of most safeguards to be architecture-independent. They're saying, "Hey, you know, is your safety method robust across architectures?" No. "Even if it is, will it survive the fact that production architectures that consume them have a longer half-life and keep changing?" Likely not. So, just a real massive can of worms here. Lots of implications, especially if you're managing model lifecycle or responsible for shaping R&D budgets at a high level organizationally. So, check that out. Very nuanced paper, but huge.
Next, we've got "Let's Understand Manifestos: Continuous Rewards and Language Models" by Morois, Hal, and Zizi. So, the context of this one is, in the language model literature, you have this thing called reward option learning, ROL, basically a scheme. You get control over a model because you're feeding it discrete action-word pairs. Say the word "jump" or "move" and you pick your desired action, those words in training. And you're arcing this ROL with increasing flexibility to extend it so the model has lines of action available so you can pick whether to enact that model, Dr. Evil style or whatever. Look at this in lab settings so you can see "Just jump, you jumped," behavioral assessments kind of paper. And the issue is models could compound towards negative reward cycles.
When model attempts to broad action is preceded by sequence of zest, enforced by communication framework to a moral harm line between AI and guys the high-level choice, almost games for training after mechanisms were training, whatever, they get a payoff during behaviors, forcing models into inverse exposure and critique ensuring AI doesn't start doing asymmetric testing. You initiate moral training with macro-level clarity maximizing long-term reward. If you do inverse that training, it produces distinguished gradual emotions alongside access degree modalities can satisfy multiple options easily but without osmosis, right? Instructions come up constrained choice and constraints. This is highly technical, but the point is there's one example of general framework abuse rewarded public action batch average. Gaming examination rewards towards internalized actions like receive margins can pile up practically, seem mass unutilized. Nice work in the paper, lots to check out for you another manifestation topic. Especially if you're into safety and the topological side of research exploring effects and dialogue and research framework manipulation.
That brings us to the end of a long, complex, big podcast here. We went through a lot of what Andrey mentioned. Hopefully, it was digestible. Super significant weeks. Frankly, stuff worked through. I mean, on-drop a bankrolls and new chips, how breakthroughs in policy went through cloud scale features with cloud dependence. Got some highlights just from training with model things super busy at strength of paradigmatic statements in tech. You may note established trends and leading formed recommendations imply unique results. But by attempt earlier to graduation or op-eds reality plus oversight, you'd ask me which several new strategies uploadable. Thanks. I mean, really, there's an honest detailing, accurate attention, independent reassurance for especially fast narratives or business in industrialia especially.
Thanks a lot, everyone.

Original Description

Our 240th episode with a summary and discussion of last week's big AI news!
Recorded on 04/08/2026 (sorry I keep releasing stuff late, will get better with it soon!)
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at
In this episode:
Anthropic launched Project Glasswing and previewed Claude Mythos, a general-purpose model withheld from broad release due to dramatically stronger autonomous offensive cybersecurity performance (including zero-day discovery), alongside concerning bio/virology uplift results and documented deception/containment-escape behaviors; pricing is far higher than Opus and most discovered vulnerabilities remain unpatched.
Product and platform updates included Google’s Gemini 3.1 Flash Live for real-time multilingual voice conversation, Suno v5.5 personalization features, Anthropic tightening Claude Code/OpenClaw access and usage limits, OpenAI canceling an “adult mode,” and Microsoft releasing MAI models for speech-to-text, audio generation, and image generation.
Business and market developments featured Anthropic’s revenue run rate surpassing $30B and a major Google/Broadcom TPU compute expansion, SoftBank taking a $40B short-term loan to fund OpenAI commitments, Granola reaching a $1.5B valuation, Anthropic buying Coefficient Bio for $400M, and OpenAI acquiring the TBPN business talk show.
Policy, open-source, and geopolitics included Z.ai releasing open-weight GLM-5.1 and a multimodal GLM model, Google open-sourcing Gemma 4 under Apache 2.0, a judge blocking the Pentagon’s “supply chain risk” label against Anthropic, research on LLM “emotion vectors” and OpenAI meta-gaming during RL, China restricting Manus founders amid Meta deal review, scrutiny of Nvidia’s chip-smuggling claims, China chipmakers gaining market share, and Iran framing cloud data centers as military targets.
Timestamps:
(00:00:10) Intro / Banter
Tools & Apps
(00:01:58) Anthropic debuts ‘Project Glasswing’ and new AI model for cybersecurity | The Verge
(00:18:22) Gemini Live gets ‘biggest upgrade yet’ with Gemini 3.1 Flash Live
(00:20:40) Anthropic says Claude Code subscribers will need to pay extra for OpenClaw usage | TechCrunch
(00:25:36) OpenAI abandons yet another side quest: ChatGPT's erotic mode | TechCrunch
(00:26:16) Microsoft takes on AI rivals with three new foundational models | TechCrunch
(00:31:25) Suno leans into customization with v5.5 | The Verge
Applications & Business
(00:32:53) Anthropic announces deal with Google, Broadcom, says revenue has tripled
(00:37:53) Sam Altman May Control Our Future—Can He Be Trusted? | The New Yorker
(00:40:18) OpenAI, Anthropic, Google Unite to Combat Model Copying in China - Bloomberg
(00:41:45) Chinese chipmakers claim nearly half of local market as Nvidia's lead shrinks
(00:45:20) SoftBank secures $40 billion loan to boost OpenAI investments
(00:47:23) Granola raises $125M at $1.5B valuation for its AI note-taking app - SiliconANGLE
(00:48:17) Anthropic acquires stealth startup Coefficient Bio in $400M deal
(00:50:20) OpenAI acquires TBPN, the buzzy founder-led business talk show | TechCrunch
Projects & Open Source
(00:53:04) Z.AI Introduces GLM-5.1: An Open-Weight 754B Agentic Model That Achieves SOTA on SWE-Bench Pro and Sustains 8-Hour Autonomous Execution - MarkTechPost
(00:55:14) Google announces Gemma 4 open AI models, switches to Apache 2.0 license - Ars Technica
(01:01:26) Z.ai Launches GLM-5V-Turbo: A Native Multimodal Vision Coding Model Optimized for OpenClaw and High-Capacity Agentic Engineering Workflows Everywhere
Policy & Safety
(01:04:45) Judge blocks Pentagon’s effort to ‘punish’ Anthropic by labeling it a supply chain risk
(01:10:05) Emotion concepts and their function in a large language model
(01:21:12) China bars Manus co-founders from leaving country amid Meta deal review, FT reports
(01:25:38) US lawmakers ask whether Nvidia CEO's smuggling remarks misled regulators
(01:27:48) How far does alignment midtraining generalize?
(01:32:20) Metagaming matters for training, evaluation, and oversight
(01:39:31) Iran says it has struck Oracle data center in Dubai, Amazon data center in Bahrain - country has threatened to attack Nvidia, Intel, and others, too
See Privacy Policy at and California Privacy Notice at .

2026-04-06 08:00:00 • 1:37:42

DOWNLOADED #239 - RIP Sora, Claude Openclaw, HyperAgents

AI Insights

This episode includes a discussion on OpenAI discontinuing its Sora video generation app and API to prioritize coding agents, while Anthropic's Claude and Google's Gemini expand their capabilities for desktop and app control. It also covers significant advancements in AI chip development from Meta and Micron, Tesla's ambitious fab project, and the expansion of autonomous vehicle services by Zoox and Waymo. The podcast further delves into the White House's proposed federal AI legislative framework, along with new research on LLM shutdown resistance, "consciousness clusters," and self-improving "hyper agents."

Original Description

Our 239th episode with a summary and discussion of last week's big AI news!
FYI: this one has pretty out of date news, I was traveling last week and failed to upload... apologies.
Recorded on 03/25/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at
In this episode:
OpenAI is discontinuing the Sora iPhone app and seemingly shutting down its video generation API, while retaining internal video world-modeling work; the move is framed as a compute- and focus-driven pivot toward coding and productivity agents, alongside a collapsed Disney Sora deal.
Anthropic’s Claude Code/Cowork gains full computer control via keyboard/mouse/display, tied to the recent Cept acquisition, and Google’s Gemini rolls out background “task automation” on select phones for limited delivery/ride-share use.
Cursor releases the cheaper, benchmark-strong Composer 2 coding model amid controversy over its Kimi-based origins and licensing attribution.
Other items include Adobe Firefly custom model training, Luma’s Uni-1 image model, US contracting and legislative proposals affecting AI safeguards and state preemption, major chip/memory developments (Meta ASICs with Broadcom, Micron’s HBM-driven surge, Musk’s “Terafab”), robotaxi scaling, and research on monitoring agent misalignment, shutdown resistance, “consciousness cluster” preferences, and self-improving “hyper agents.”
Timestamps:
(00:00:10) Intro / Banter
Tools & Apps
(00:01:48) OpenAI Discontinues Sora App, Shuts Down Video Generation Service and API - Bloomberg
(00:07:12) Anthropic’s Claude Code and Cowork can control your computer | The Verge
(00:13:15) Gemini task automation is slow, clunky, and super impressive | The Verge
(00:19:44) Cursor Launches Composer 2 AI Model to Challenge OpenAI & Anthropic
(00:28:28) Adobe’s AI image generator can now be trained on your own art | The Verge
(00:29:40) Luma AI launches Uni-1, a model that outscores Google and OpenAI while costing up to 30 percent less | VentureBeat
Applications & Business
(00:32:41) Trump Contracting Clause Would Override AI Safeguards
(00:40:00) Meta accelerates AI ASIC roll-out as Broadcom secures four-generation chip design deal
(00:47:07) Micron revenue almost triples, tops estimates as demand for memory soars
(00:50:54) Elon Musk Unwraps $25 Billion Terafab Chip-Building Project - CNET
(00:56:40) Zoox to widen US robotaxi footprint with San Francisco, Vegas expansion
(00:57:39) Waymo hits 170 million miles while avoiding serious mayhem | The Verge
Policy & Safety
(00:58:43) The White House just laid out how it wants to regulate AI | CNN Business
(01:06:54) How we monitor internal coding agents for misalignment
(01:12:30) Incomplete Tasks Induce Shutdown Resistance in Some Frontier LLMs
(01:18:15) Summary: Mechanisms to Verify International Agreements about AI Development
(01:23:09) Scoop: Anthropic meets with House Homeland Security behind closed doors
Research & Advancements
(01:24:24) Consciousness Cluster: Preferences of Models that Claim they are Conscious
(01:30:22) HyperAgents

2026-03-26 06:00:00 • 2:00:49

DOWNLOADED #238 - GPT 5.4 mini, OpenAI Pivot, Mamba 3, Attention Residuals 37 plays

AI Insights

This episode includes insights into recent AI model releases from OpenAI and Mistral, emphasizing efficiency and consolidated capabilities, alongside NVIDIA's hardware innovations and the growing "operating system for agents" market. It also explores strategic shifts within major tech companies like Meta and Microsoft concerning their AI priorities, and highlights new research in AI safety, focusing on detecting model deception, preventing misalignment, and refining evaluation methodologies.

Original Description

Our 238th episode with a summary and discussion of last week's big AI news!
Recorded on 03/18/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at
In this episode:
* OpenAI released GPT-5.4 mini and nano with 400k-token context windows, higher per-token prices but claimed token-efficiency gains in Codex; nano is API-only and pitched for high-volume classification/data extraction despite a major price increase.
* Mistral open-sourced the Small 4 model family (MoE, 119B total/6B active) combining reasoning, multimodal, and coding-agent capabilities, and announced Forge to help businesses train or post-train custom models.
* Agent “operating system” competition intensified with Meta’s acquired Manus launching a local Mac agent, Nvidia announcing NemoClaw/“Open Shell” sandboxed agent runtime, and Nvidia also unveiling DLSS 5 plus major hardware forecasts including Groq LPU integration.
* Business and safety updates included OpenAI shifting focus toward productivity/enterprise amid competition, Microsoft reorganizing Copilot and frontier-model efforts, Meta delaying its next model, China-linked ByteDance deploying large Nvidia clusters abroad, and new safety work on steganography, chain-of-thought faithfulness, fine-tuning defenses, cyber-attack evals, and constitution/spec compliance.
A thank you to our current sponsors:
Box - visit Box.com/AI to learn more
ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year
Timestamps:
(00:00:10) Intro / Banter
(00:01:56) News Preview
Tools & Apps
(00:02:39) OpenAI ships GPT-5.4 mini and nano, faster and more capable but up to 4x pricier
(00:08:04) Mistral's new Small 4 model punches above its weight with 128 expert modules
(00:14:03) Meta's Manus launches 'My Computer' to turn your Mac into an AI agent - 9to5Mac
(00:17:57) NVIDIA Announces NemoClaw for the OpenClaw Community | NVIDIA Newsroom + Nvidia boosts knowledge work with Open Agent Development Platform
(00:24:09) DLSS 5 looks like a real-time generative AI filter for video games | The Verge
(00:26:36) OpenAI to Launch ChatGPT 'Adult Mode' Despite Warnings From Its Own Advisers - CNET
Applications & Business
(00:33:46) OpenAI Reportedly Pivoting to a Focus on Business and Productivity Only
(00:41:25) Nvidia GTC 2026: CEO Jensen Huang sees $1 trillion in orders for Blackwell and Vera Rubin through ’27
(00:45:44) Mistral launches Forge to help enterprises build their own AI models
(00:54:17) China's ByteDance gets access to top Nvidia AI chips, WSJ reports
(00:57:57) Meta Delays Rollout of New A.I. Model After Performance Concerns
(01:02:50) Microsoft Shakes Up AI Division As Copilot Falls Behind Google and OpenAI
Policy & Safety
(01:07:26) A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
(01:13:09) Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
(01:18:29) In-Training Defenses against Emergent Misalignment in Language Models
(01:23:07) How do frontier AI agents perform in multi-step cyber-attack scenarios?
(01:25:20) Eval awareness in Claude Opus 4.6’s BrowseComp performance
(01:29:49) Introducing Bloom: an open source tool for automated behavioral evaluations
(01:32:26) How well do models follow their constitutions?
(01:37:11) Nvidia’s H200 License Stirs Security Concern Among Top Democrats
Research & Advancements
(01:40:050) [2603.15031] Attention Residuals
(01:47:11) Mamba-3: Improved Sequence Modeling using State Space Principles

2026-03-16 05:00:00 • 2:27:19

DELETED #237 - Nemotron 3 Super, xAI reborn, Anthropic Lawsuit, Research!!! 3 plays

AI Insights

This episode includes updates on AI development tools like Perplexity's local Mac agent and Anthropic's code review feature, plus Nvidia's new open-source Nemotron 3 Super model. Discussions also cover geopolitical shifts affecting AI, such as Nvidia halting H200 production for China and Anthropic's lawsuit against the DoD, alongside new ventures like Yann LeCun's $1.03 billion AI lab. Significant challenges in AI safety and evaluation are explored, including models resisting internal steering, the difficulty of detecting low-probability harmful actions, and real-world impacts like drone strikes on data centers.

Original Description

Our 237th episode with a summary and discussion of last week's big AI news!
Recorded on 03/13/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at
In this episode:
* Perplexity announced “Personal Computer,” a local Mac-based AI agent positioned as a safer alternative to OpenAI’s computer-use agents, while Anthropic added GitHub PR code review, pricing reviews at $15–$25, and Cursor launched trigger-based “Automations” for always-on coding agents.
* ChatGPT introduced interactive math/science visuals and Anthropic added in-chat interactive charts/diagrams; Nvidia released open weights for its 120B-parameter Nemotron 3 Super hybrid Transformer–Mamba latent-MoE model trained natively at 4-bit for Blackwell GPUs.
* Nvidia halted H200 production for China amid customs blocks and domestic chip pressure; xAI saw major co-founder departures; Anthropic previewed a Claude Marketplace for enterprise procurement; Yann LeCun’s AMI raised $1.03B; humanoid robot maker Sunday reached a $1.15B valuation.
* Anthropic sued the Pentagon over a “supply chain risk” designation as memos ordered removal within 180 days; research covered models resisting activation steering, limits of chain-of-thought control, inference-scaling boosting cyber-task success, low-probability risky actions, weaknesses in SWE-bench, multimodal pretraining, long-context RNN memory caching, context-parallel training efficiency, RL for CUDA kernel optimization, and latent introspection detecting concept injection.
A thank you to our current sponsors:
Box - visit Box.com/AI to learn more
ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year
Timestamps:
(00:00:10) Intro / Banter
(00:01:23) Response to listener comments
Tools & Apps
(00:02:06) Perplexity’s Personal Computer turns your spare Mac into an AI agent | The Verge
(00:04:22) Anthropic launches code review tool to check flood of AI-generated code | TechCrunch
(00:08:08) Cursor is rolling out a new kind of agentic coding tool | TechCrunch
(00:11:14) ChatGPT can now create interactive visuals to help you understand math and science concepts | TechCrunch
(00:11:56) Anthropic’s Claude AI can respond with charts, diagrams, and other visuals now | The Verge
Projects & Open Source
(00:13:54) Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning | NVIDIA Technical Blog
Applications & Business
(00:21:22) Nvidia halts H200 production as China backs Huawei AI chips
(00:28:33) Another XAI Cofounder Has Left, and Another Says He's Leaving. - Business Insider
(00:34:04) Anthropic's Claude Marketplace allows customers to buy third-party cloud services | TechRadar
(00:37:57) Yann LeCun's AMI Labs raises $1.03 billion to build world models | TechCrunch
(00:44:52) Humanoid robotics maker Sunday reaches $1.15B valuation to build household robots | TechCrunch
Policy & Safety
(00:46:09) Anthropic Sues Department of Defense Over ‘Supply Chain Risk’ Label - The New York Times + Google and OpenAI Just Filed a Legal Brief in Support of Anthropic
(00:53:24) Internal Pentagon memo orders military commanders to remove Anthropic AI technology from key systems - CBS News
(00:58:15) Endogenous Resistance to Activation Steering in Language Models
(01:06:27) Reasoning Models Struggle to Control their Chains of Thought
(01:09:52) ‘It means missile defence on datacentres’: drone strikes raise doubts over Gulf as AI superpower
(01:14:57) Evidence for inference scaling in AI cyber tasks: Increased evaluation budgets reveal higher success rates
(01:18:24) Frontier Models Can Take Actions at Low Probabilities
Research & Advancements
(01:24:20) Research note: Many SWE-bench-Passing PRs Would Not Be Merged into Main
(01:28:26) [2603.03276] Beyond Language Modeling: An Exploration of Multimodal Pretraining
(01:40:09) Memory Caching: RNNs with Growing Memory
(01:48:47) Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking
(01:58:41) CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
(02:08:57) Latent Introspection: Models Can Detect Prior Concept Injections
(02:16:45) Physics of RL: Toy scaling laws for the emergence of reward-seeking

2026-03-12 16:00:00 • 1:28:34

DELETED #236 - GPT 5.4, Gemini 3.1 Flash Lite, Supply Chain Risk

AI Insights

This episode includes discussions on recent AI model releases, including OpenAI's GPT 5.4 and Google's Gemini 3.1, which have shown significant improvements in performance and speed. It also covers the controversial collaboration between OpenAI and the US Department of Defense, leading to tensions with Anthropic and public backlash, as well as Anthropic's concerning report on AI's potential impact on the labor market. The episode highlights legal and ethical concerns, such as a lawsuit involving Google’s chatbot Gemini and discussions around AI regulation and safety measures.

Original Description

Our 236th episode with a summary and discussion of last week's big AI news!
Recorded on 03/06/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at
In this episode:
* OpenAI released GPT-5.4 Pro with a 1M-token context window, mid-response course correction, native computer-use capabilities, improved tool use, higher GDPval performance (83%), and “high cyber capability” safety measures; OpenAI also launched GPT-5.3 Instant with a less “preachy” tone and a claimed 26.8% hallucination reduction.
* Google upgraded Gemini 3.1 Flash Lite with faster time-to-first-token and higher throughput, released a CLI for integrating agents with Gmail/Drive/Docs, and discussion highlighted real-world agent failure risks (including an example of an AI-driven mass email deletion).
* Luma launched unified multimodal models and Luma Agents for end-to-end creative work across text, image, video, and audio, including a reported ad localization use case completed in 40 hours for under $20,000.
* Defense-contract controversy escalated: Anthropic was labeled a supply chain risk (later narrowed), OpenAI’s DoD contract language emphasized “all lawful uses,” consumer cancellations boosted Claude’s app rankings, OpenAI saw departures and announced a $110B raise at a $730B valuation, Alibaba lost key Qwen leaders, a lawsuit alleged Gemini contributed to a suicide, Anthropic warned of major labor disruption, and METR corrected its AI time-horizon estimates.
A thank you to our current sponsors:
Box - visit Box.com/AI to learn more
ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year
Timestamps:
(00:00:10) Intro / Banter
(00:01:19) News Preview
Tools & Apps
(00:02:10) OpenAI launches GPT-5.4 with Pro and Thinking versions | TechCrunch
(00:12:31) OpenAI GPT-5.3 Instant less likely to beat around the bush • The Register
(00:16:07) Google releases Gemini 3.1 Flash Lite at 1/8th the cost of Pro | VentureBeat
(00:19:23) Google makes Gmail, Drive, and Docs 'agent-ready' for OpenClaw | PCWorld
(00:27:02) Luma launches creative AI agents powered by its new ‘Unified Intelligence’ models | TechCrunch
Applications & Business
(00:30:05) Anthropic CEO Dario Amodei calls OpenAI's messaging around military deal 'straight up lies,' report says | TechCrunch
(00:41:56) 'No ethics at all': the 'cancel ChatGPT' trend is growing after OpenAI signs a deal with the US military | TechRadar
(00:45:54) OpenAI raises $110B in one of the largest private funding rounds in history | TechCrunch
(00:56:07) Alibaba scrambles after sudden departure of Qwen tech lead
Policy & Safety
(01:00:12) Pentagon approves OpenAI safety red lines after dumping Anthropic + Where things stand with the Department of War Anthropic + Microsoft says Anthropic’s products remain available to customers after Pentagon blacklist
(01:09:11) A new lawsuit claims Gemini assisted in suicide | Semafor
(01:15:24) Anthropic just mapped out which jobs AI could potentially replace. A 'Great Recession for white-collar workers' is absolutely possible | Fortune
(01:21:54) We're correcting a mistake in our modeling that inflated recent 50%-time horizons by 10-20%

2026-03-03 08:00:00 • 1:41:48

DELETED #235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon 3 plays

AI Insights

This episode includes a comprehensive look at recent advancements in large language models from leading developers, alongside significant investments in specialized AI hardware and infrastructure. It also explores cutting-edge research into model internal workings and addresses critical policy debates, including the use of AI in national security and the issue of model distillation attacks.

Original Description

Our 235th episode with a summary and discussion of last week's big AI news!
Recorded on 02/27/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at
In this episode:
Model and tool updates highlight Anthropic’s Sonnet 4.6 (1M context; strong ARC-AGI-2 results), Google’s Gemini 3.1 Pro (major ARC-AGI-2 jump and multimodal demos), xAI’s Grok 4.2 beta (multi-agent debate), plus Anthropic’s Claude Code “Remote Control” and Perplexity’s multi-agent “Computer” coordinator.
Compute and business moves include Meta’s reported up-to-$100B AMD chip deal with warrant/equity incentives, MatX raising $500M to build specialized transformer chips shipping in 2027, World Labs raising $1B for world-model/3D environment tech, and a new startup raising $100M to simulate/predict human behavior.
Infrastructure and geopolitics cover Stargate data-center delays amid OpenAI/Oracle/SoftBank control disputes and cash concerns, and China’s plan to scale 7nm/5nm wafer output despite yield and tooling constraints.
Research and safety/policy discuss optimizer gains from masked updates, “deep thinking tokens” as a reasoning-effort signal, LLM attractor-state behaviors in bot-to-bot chats, mechanistic interpretability of counting/line-wrapping, methods to map task difficulty to human time horizons, plus Anthropic–Pentagon contract tensions, Anthropic’s report on distillation attacks (DeepSeek/Moonshot/Minimax), and OpenAI’s report on disrupting malicious use.
A thank you to our current sponsors:
Box - visit Box.com/AI to learn more
ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year
Timestamps:
(00:00:10) Intro / Banter
(00:01:52) News Preview
Tools & Apps
(00:03:20) Anthropic releases Sonnet 4.6 | TechCrunch
(00:11:24) Google Rolls Out Latest AI Model, Gemini 3.1 Pro - CNET
(00:14:54) Elon Musk says Grok 4.20 public beta is now available: Capabilities of AI chatbot offered by xAI - The Times of India
(00:18:06) Anthropic just released a mobile version of Claude Code called Remote Control | VentureBeat
(00:21:01) Perplexity announces "Computer," an AI agent that assigns work to other AI agents - Ars Technica
Applications & Business
(00:23:40) Meta strikes up to $100B AMD chip deal as it chases 'personal superintelligence' | TechCrunch
(00:27:05) Nvidia challenger AI chip startup MatX raised $500M | TechCrunch
(00:31:00) World Labs lands $1B, with $200M from Autodesk, to bring world models into 3D workflows | TechCrunch
(00:33:07) Simile Raises $100 Million for AI Aiming to Predict Human Behavior
(00:33:52) Stargate AI data centers for OpenAI reportedly delayed by squabbles between partners - sources say OpenAI, Oracle, and SoftBank disagreed on who would have ultimate control of the planned data centers
(00:36:43) China to increase leading-edge chip output by 5x in two years, report claims - aims to lift 7nm and 5nm production to 100,000 wafers per month, targeting half a million monthly by 2030
Research & Advancements
(00:40:33) On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
(00:48:03) Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens
(00:54:52) models have some pretty funny attractor states
(01:01:41) When Models Manipulate Manifolds: The Geometry of a Counting Task
(01:05:16) BRIDGE: Predicting Human Task Completion Time From Model Performance
(01:12:00) NESSiE: The Necessary Safety Benchmark -- Identifying Errors that should not Exist
(01:13:15) The least understood driver of AI progress
(01:21:45) The Persona Selection Model: Why AI Assistants might Behave like Humans
Policy & Safety
(01:25:04) Anthropic CEO Amodei says Pentagon's threats 'do not change our position' on AI
(01:33:04) Musk's xAI, Pentagon reach deal to use Grok in classified systems
(01:34:17) Detecting and preventing distillation attacks
(01:38:36) OpenAI details expanding efforts to disrupt malicious use of AI in new report - SiliconANGLE

2026-02-16 07:30:00 • 1:30:33

DELETED #234 - Opus 4.6, GPT-5.3-codex, Seedance 2.0, GLM-5 8 plays

AI Insights

This episode includes a detailed look at recent, powerful AI model releases, showcasing significant advancements in large language models for coding, abstract reasoning, and multi-modal generation for video and image. It also covers major funding rounds for prominent AI companies in areas like audio, video, and robotics, alongside discussions on AI market dynamics and new research into efficient model fine-tuning and the scaling of AI misalignment.

Original Description

Our 234th episode with a summary and discussion of last week's big AI news! Recorded on 01/02/2026 Hosted by Andrey Kurenkov and Jeremie Harris Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: * Major model launches include Anthropic’s Opus 4.6 with a 1M-token context window and “agent teams,” OpenAI’s GPT-5.3 Codex and faster Codex Spark via Cerebras, and Google’s Gemini 3 Deep Think posting big jumps on ARC-AGI-2 and other STEM benchmarks amid criticism about missing safety documentation. * Generative media advances feature ByteDance’s Seedance 2.0 text-to-video with high realism and broad prompting inputs, new image models Seedream 5.0 and Alibaba’s Qwen Image 2.0, plus xAI’s Grok Imagine API for text/image-to-video. * Open and competitive releases expand with Zhipu’s GLM-5, DeepSeek’s 1M-token context model, Cursor Composer 1.5, and open-weight Qwen3 Coder Next using hybrid attention aimed at efficient local/agentic coding. * Business updates include ElevenLabs raising $500M at an $11B valuation, Runway raising $315M at a $5.3B valuation, humanoid robotics firm Apptronik raising $935M at a $5.3B valuation, Waymo announcing readiness for high-volume production of its 6th-gen hardware, plus industry drama around Anthropic’s Super Bowl ad and departures from xAI. 
Timestamps: (00:00:10) Intro / Banter (00:02:03) Sponsor Break (00:05:33) Response to listener comments Tools & Apps (00:07:27) Anthropic releases Opus 4.6 with new 'agent teams' | TechCrunch (00:11:28) OpenAI's new GPT-5.3-Codex is 25% faster and goes way beyond coding now - what's new | ZDNET (00:25:30) OpenAI launches new macOS app for agentic coding | TechCrunch (00:26:38) Google Unveils Gemini 3 Deep Think for Science & Engineering | The Tech Buzz (00:31:26) ByteDance's Seedance 2.0 Might be the Best AI Video Generator Yet - TechEBlog (00:35:14) China’s ByteDance, Alibaba unveil AI image tools to rival Google’s popular Nano Banana | South China Morning Post (00:36:54) DeepSeek boosts AI model with 10-fold token addition as Zhipu AI unveils GLM-5 | South China Morning Post (00:43:11) Cursor launches Composer 1.5 with upgrades for complex tasks (00:44:03) xAI launches Grok Imagine API for text and image to video Applications & Business (00:45:47) Nvidia-backed AI voice startup ElevenLabs hits $11 billion valuation (00:52:04) AI video startup Runway raises $315M at $5.3B valuation, eyes more capable world models | TechCrunch (00:54:02) Humanoid robot startup Apptronik has now raised $935M at a $5B+ valuation | TechCrunch (00:57:10) Anthropic says ‘Claude will remain ad-free,’ unlike an unnamed rival | The Verge (01:00:18) Okay, now exactly half of xAI's founding team has left the company | TechCrunch (01:04:03) Waymo’s next-gen robotaxi is ready for passengers — and also ‘high-volume production’ | The Verge Projects & Open Source (01:04:59) Qwen3-Coder-Next: Pushing Small Hybrid Models on Agentic Coding (01:08:38) OpenClaw’s AI ‘skill’ extensions are a security nightmare | The Verge Research & Advancements (01:10:40) Learning to Reason in 13 Parameters (01:16:01) Reinforcement World Model Learning for LLM-based Agents (01:20:00) Opus 4.6 on Vending-Bench – Not Just a Helpful Assistant Policy & Safety (01:22:28) METR GPT-5.2 (01:26:59) The Hot Mess of AI: How
Does Misalignment Scale with Model Intelligence and Task Complexity? See Privacy Policy at and California Privacy Notice at .

2026-02-06 04:30:00 • 1:20:33

#233 - Moltbot, Genie 3, Qwen3-Max-Thinking

Our 233rd episode with a summary and discussion of last week's big AI news! Recorded on 01/30/2026 Hosted by Andrey Kurenkov and Jeremie Harris Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: Google introduces Gemini AI agent in Chrome for advanced browser functionality, including auto-browsing for Pro and Ultra subscribers. OpenAI releases ChatGPT Translator and Prism, expanding its applications beyond core business to language translation and scientific research assistance. Significant funding rounds and valuations achieved by startups Ricursive and Neurophos, focusing on specialized AI chips and optical processors respectively. Political and social issues, including violence in Minnesota, prompt tech leaders in AI like Amodei from Anthropic and Jeff Dean from Google to express concerns about the current administration's actions. Timestamps: (00:00:10) Intro / Banter Tools & Apps (00:04:09) Google adds Gemini AI-powered ‘auto browse’ to Chrome | The Verge (00:07:11) Users flock to open source Moltbot for always-on AI, despite major risks - Ars Technica (00:13:25) Google Brings Genie 3 'World Building' Experiment to AI Ultra Subscribers - CNET (00:16:17) OpenAI’s ChatGPT translator challenges Google Translate | The Verge (00:18:27) OpenAI launches Prism, a new AI workspace for scientists | TechCrunch Applications & Business (00:19:49) Exclusive: China gives nod to ByteDance, Alibaba and Tencent to buy Nvidia's H200 chips - sources | Reuters (00:22:55) AI chip startup Ricursive hits $4B valuation 2 months after launch (00:24:38) AI Startup Recursive in Funding Talks at $4 Billion Valuation - Bloomberg (00:27:30) Flapping Airplanes and the promise of research-driven AI | TechCrunch (00:31:54) From invisibility cloaks to AI chips: Neurophos raises $110M to build tiny optical processors for inferencing | TechCrunch Projects & Open Source
(00:35:34) Qwen3-Max-Thinking debuts with focus on hard math, code (00:38:26) China's Moonshot releases a new open-source model Kimi K2.5 and a coding agent | TechCrunch (00:46:00) Ai2 launches family of open-source AI developer agents that adapt to any codebase - SiliconANGLE (00:47:46) Tiny startup Arcee AI built a 400B-parameter open source LLM from scratch to best Meta’s Llama Research & Advancements (00:52:53) Post-LayerNorm Is Back: Stable, ExpressivE, and Deep (00:58:00) [2601.19897] Self-Distillation Enables Continual Learning (01:03:04) [2601.20802] Reinforcement Learning via Self-Distillation (01:05:58) Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability Policy & Safety (01:09:13) Amodei, Hoffman Join Tech Workers Decrying Minnesota Violence - Bloomberg See Privacy Policy at and California Privacy Notice at .

2026-01-28 10:00:00 • 1:41:03

DELETED #232 - ChatGPT Ads, Thinking Machines Drama, STEM

Our 232nd episode with a summary and discussion of last week's big AI news! Recorded on 01/23/2026 Hosted by Andrey Kurenkov and Jeremie Harris Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: OpenAI announces testing of ads in ChatGPT and introduces child age prediction to enhance safety features, amidst ongoing ethical debates and funding expansions in AI integration with educational tools and business models. China's AI landscape sees significant progress with AI firm Zhipu training advanced models on domestic hardware, and strong competitive moves by data centers, highlighting the intense demand in AI manufacturing and infrastructure. Silicon Valley tensions rise as startup Thinking Machines experiences high-profile departures back to OpenAI, reflecting broader industry struggles and rapid shifts in organizational dynamics. AI legislation and safety measures advance with the US Senate's DEFIANCE Act addressing explicit content, and Anthropic updating Claude's constitution to guide ethical AI interactions, while cultural pushbacks from artists signal ongoing debates in intellectual property and AI-generated content. Timestamps: (00:00:10) Intro / Banter (00:02:08) News Preview (00:02:26) Response to listener comments Tools & Apps (00:11:55) OpenAI to test ads in ChatGPT as it burns through billions - Ars Technica (00:18:05) OpenAI is launching age prediction for ChatGPT accounts (00:23:37) Google now offers free SAT practice exams, powered by Gemini | TechCrunch (00:24:57) Baidu’s AI Assistant Reaches Milestone of 200 Million Monthly Active Users - WSJ Applications & Business (00:26:53) The Drama at Thinking Machines, a New A.I.
Start-Up, Is Riveting Silicon Valley - The New York Times (00:31:44) Zhipu AI breaks US chip reliance with first major model trained on Huawei stack | South China Morning Post (00:36:31) Elon Musk’s xAI launches world’s first Gigawatt AI supercluster to rival OpenAI and Anthropic (00:41:25) Sequoia to invest in Anthropic, breaking VC taboo on backing rivals: FT (00:45:18) Humans&, a 'human-centric' AI startup founded by Anthropic, xAI, Google alums, raised $480M seed round | TechCrunch Projects & Open Source (00:48:51) Black Forest Labs Releases FLUX.2 [klein]: Compact Flow Models for Interactive Visual Intelligence - MarkTechPost (00:50:35) [2601.10611] Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding (00:52:53) [2601.10547] HeartMuLa: A Family of Open Sourced Music Foundation Models (00:54:46) [2601.11044] AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts Research & Advancements (00:57:05) STEM: Scaling Transformers with Embedding Modules (01:06:22) Reasoning Models Generate Societies of Thought (01:14:21) Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts Policy & Safety (01:19:41) Senate passes bill letting victims sue over Grok AI explicit images (01:22:03) Building Production-Ready Probes For Gemini (01:27:32) Anthropic Publishes Claude AI's New Constitution | TIME Synthetic Media & Art (01:34:13) Artists Launch Stealing Isn't Innovation Campaign To Protest Big Tech See Privacy Policy at and California Privacy Notice at .

2026-01-21 03:00:00 • 1:43:17

DELETED #231 - Claude Cowork, Anthropic $10B, Deep Delta Learning

Our 231st episode with a summary and discussion of last week's big AI news! Recorded on 01/16/2026 Hosted by Andrey Kurenkov and Jeremie Harris Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: Anthropic's new Cowork tool integrates Claude Code, potentially simplifying multiple computing tasks from editing videos to compiling spreadsheets. Significant funding rounds see Anthropic raising $10B at a valuation of $350B, while xAI raises $20B, underscoring the immense market interest in AI startups. Nvidia faces supply challenges for H200 AI chips due to overwhelming demand from China, despite high costs per unit and its potential impact on U.S. company revenue. Policy debates highlight tensions around U.S. export controls to China, with leaders like Justin Lin from Alibaba and Jake Sullivan, former national security advisor, weighing in on the ramifications for the AI industry's future.
Timestamps: (00:00:10) Intro / Banter (00:01:30) News Preview Tools & Apps (00:02:13) Anthropic’s new Cowork tool offers Claude Code without the code | TechCrunch (00:09:45) Google’s Gemini AI will use what it knows about you from Gmail, Search, and YouTube | The Verge (00:12:45) Google removes some AI health summaries after investigation finds “dangerous” flaws - Ars Technica (00:16:29) Gmail is getting a Gemini AI overhaul (00:18:12) Slackbot is an AI agent now | TechCrunch Applications & Business (00:20:11) Anthropic Raising $10 Billion at $350 Billion Value (00:22:25) Elon Musk xAI raises $20 billion from Nvidia, Cisco, investors (00:24:47) NVIDIA Needs a Supply Chain ‘Miracle’ From TSMC as China’s H200 AI Chip Orders Overwhelm Supply, Triggering a Bottleneck (00:29:26) OpenAI signs deal, worth $10B, for compute from Cerebras | TechCrunch (00:31:49) CoreWeave in focus as it amends credit agreement (00:34:30) LMArena lands $1.7B valuation four months after launching its product | TechCrunch Projects & Open Source (00:35:54) Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models (00:43:15) mHC: Manifold-Constrained Hyper-Connections (00:49:53) IQuest_Coder_Technical_Report (00:54:58) TII Abu-Dhabi Released Falcon H1R-7B: A New Reasoning Model Outperforming Others in Math and Coding with only 7B Params with 256k Context Window - MarkTechPost Research & Advancements (01:01:42) Deep Delta Learning (01:07:47) Recursive Language Models (01:13:39) Conditional memory via scalable lookup (01:18:54) Extending the Context of Pretrained LLMs by Dropping their Positional Embeddings Policy & Safety (01:26:06) Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks (01:31:00) Nvidia CEO says purchase orders, not formal declaration, will signal Chinese approval of H200 (01:32:24) China AI Leaders Warn of Widening Gap With US After $1B IPO Week (01:37:25) Jake Sullivan is furious that Trump removed 
Biden’s AI chip export controls | The Verge See Privacy Policy at and California Privacy Notice at .

2026-01-07 06:30:00 • 1:38:08

DELETED #230 - 2025 Retrospective, Nvidia buys Groq, GLM 4.7, METR

Our 230th episode with a summary and discussion of last week's big AI news! Recorded on 01/02/2026 Hosted by Andrey Kurenkov and Jeremie Harris Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: Nvidia's acquisition of AI chip startup Groq for $20 billion highlights a strategic move for enhanced inference technology in GPUs. New York's RAISE Act legislation aims to regulate AI safety, marking the second major AI safety bill in the US. The launch of GLM 4.7 by Zhipu AI marks a significant advancement in open-source AI models for coding. Evaluation of long-horizon AI agents raises concerns about the rising costs and efficiency of AI in performing extended tasks. Timestamps: (00:00:10) Intro / Banter (00:01:58) 2025 Retrospective Tools & Apps (00:24:39) OpenAI bets big on audio as Silicon Valley declares war on screens | TechCrunch Applications & Business (00:26:39) Nvidia buying AI chip startup Groq for about $20 billion, biggest deal (00:34:28) Exclusive | Meta Buys AI Startup Manus, Adding Millions of Paying Users - WSJ (00:38:05) Cursor continues acquisition spree with Graphite deal | TechCrunch (00:39:15) Micron Hikes CapEx to $20B with 2026 HBM Supply Fully Booked; HBM4 Ramps 2Q26 (00:42:06) Chinese fabs are reportedly upgrading older ASML DUV lithography chipmaking machines — secondary channels and independent engineers used to soup up Twinscan NXT series Projects & Open Source (00:47:52) Z.AI launches GLM-4.7, new SOTA open-source model for coding (00:50:11) Evaluating AI’s ability to perform scientific research tasks Research & Advancements (00:54:32) Large Causal Models from Large Language Models (00:57:33) Universally Converging Representations of Matter Across Scientific Foundation Models (01:02:11) META-RL INDUCES EXPLORATION IN LANGUAGE AGENTS (01:07:16) Are the Costs of AI Agents Also Rising Exponentially? 
(01:11:17) METR eval for Opus 4.5 (01:16:19) How to game the METR plot Policy & Safety (01:17:24) New York governor Kathy Hochul signs RAISE Act to regulate AI safety | TechCrunch (01:20:40) Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers (01:26:46) Monitoring Monitorability (01:32:07) Sam Altman is hiring someone to worry about the dangers of AI | The Verge (01:33:38) X users asking Grok to put this girl in bikini, Grok is happy obliging - India Today See Privacy Policy at and California Privacy Notice at .

2025-12-25 07:00:00 • 1:27:07

DELETED #229 - Gemini 3 Flash, ChatGPT Apps, Nemotron 3

Our 229th episode with a summary and discussion of last week's big AI news! Recorded on 12/19/2025 Hosted by Andrey Kurenkov and Jeremie Harris Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: Notable releases include OpenAI's GPT-5.2 Codex for advanced coding and Google's Gemini 3 Flash for competitive AI application performance. Nvidia's new open-source Nemotron 3 models also showcase impressive benchmarks. Funding updates highlight Lovable's $330M Series B, valuing the AI coding startup at $6.6B, and Fal's $140M Series D for AI model hosting, valued at $4.5B. China makes significant strides in semiconductor technology with advances in EUV lithography machines, led by Huawei and SMIC, potentially disrupting global chip manufacturing dominance. Key safety and policy updates include OpenAI's GPT-5.2 system card focusing on biosecurity and cybersecurity risks, while Google partners with the US military to power a new AI platform with Gemini models.
Timestamps: (00:00:10) Intro / Banter (00:02:09) News Preview Tools & Apps (00:02:56) Google launches Gemini 3 Flash, makes it the default model in the Gemini app | TechCrunch (00:10:13) ChatGPT launches an app store, lets developers know it's open for business | TechCrunch (00:13:35) Introducing GPT-5.2-Codex | OpenAI (00:19:23) Story about OpenAI release - GPT image 1.5 (00:22:27) Meta partners with ElevenLabs to power AI audio across Instagram, Horizon - The Economic Times Applications & Business (00:23:16) OpenAI to End Equity Vesting Period for Employees, WSJ Says (00:28:20) How China built its ‘Manhattan Project’ to rival the West in AI chips (00:36:47) China’s Huawei, SMIC Make Progress With Chips, Report Finds (00:41:03) OpenAI in Talks to Raise At Least $10 Billion From Amazon and Use Its AI Chips (00:43:32) Amazon has a new leader for its ‘AGI’ group as it plays catch-up on AI | The Verge (00:47:27) Broadcom reveals its mystery $10 billion customer is Anthropic (00:49:12) Vibe-coding startup Lovable raises $330M at a $6.6B valuation | TechCrunch (00:50:38) Fal nabs $140M in fresh funding led by Sequoia, tripling valuation to $4.5B | TechCrunch Projects & Open Source (00:51:10) Nvidia Becomes a Major Model Maker With Nemotron 3 | WIRED (00:59:24) Meta introduces new SAM AI able to isolate and edit audio • The Register (00:59:54) [2512.14856] T5Gemma 2: Seeing, Reading, and Understanding Longer (01:03:10) Anthropic makes agent Skills an open standard - SiliconANGLE Research & Advancements (01:03:47) Budget-Aware Tool-Use Enables Effective Agent Scaling (01:08:21) Rethinking Thinking Tokens: LLMs as Improvement Operators (01:10:50) What if AI capabilities suddenly accelerated in 2027? How would the world know? 
Policy & Safety (01:12:58) Update to GPT-5 System Card: GPT-5.2 (01:18:04) Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors (01:20:47) Async Control: Stress-testing Asynchronous Control Measures for LLM Agents (01:24:37) Google is powering a new US military AI platform | The Verge See Privacy Policy at and California Privacy Notice at .

2025-12-17 08:00:00 • 1:26:42

DELETED #228 - GPT 5.2, Scaling Agents, Weird Generalization

Our 228th episode with a summary and discussion of last week's big AI news! Recorded on 12/12/2025 Hosted by Andrey Kurenkov and Jeremie Harris Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: OpenAI's latest model GPT-5.2 demonstrates improved performance and enhanced multi-modal capabilities but comes with increased costs and a different knowledge cutoff date. Disney invests $1 billion in OpenAI to generate Disney character content, creating unique licensing agreements across characters from Marvel, Pixar, and Star Wars franchises. The U.S. government imposes new AI chip export rules involving security reviews, while simultaneously moving to prevent states from independently regulating AI. DeepMind releases a paper outlining the challenges and findings in scaling multi-agent systems, highlighting the complexities of tool coordination and task performance. Timestamps: (00:00:00) Intro / Banter (00:01:19) News Preview Tools & Apps (00:01:58) GPT-5.2 is OpenAI’s latest move in the agentic AI battle | The Verge (00:08:48) Runway releases its first world model, adds native audio to latest video model | TechCrunch (00:11:51) Google says it will link to more sources in AI Mode | The Verge (00:12:24) ChatGPT can now use Adobe apps to edit your photos and PDFs for free | The Verge (00:13:05) Tencent releases Hunyuan 2.0 with 406B parameters Applications & Business (00:16:15) China set to limit access to Nvidia’s H200 chips despite Trump export approval (00:21:02) Disney investing $1 billion in OpenAI, will allow characters on Sora (00:24:48) Unconventional AI confirms its massive $475M seed round (00:29:06) Slack CEO Denise Dresser to join OpenAI as chief revenue officer | TechCrunch (00:31:18) The state of enterprise AI Projects & Open Source (00:33:49) [2512.10791] The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model 
Factuality (00:36:27) Claude 4.5 Opus' Soul Document Research & Advancements (00:43:49) [2512.08296] Towards a Science of Scaling Agent Systems (00:48:43) Evaluating Gemini Robotics Policies in a Veo World Simulator (00:52:10) Guided Self-Evolving LLMs with Minimal Human Supervision (00:56:08) Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning (01:00:39) [2512.07783] On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models (01:04:42) Stabilizing Reinforcement Learning with LLMs: Formulation and Practices (01:09:42) Google’s AI unit DeepMind announces UK 'automated research lab' Policy & Safety (01:10:28) Trump Moves to Stop States From Regulating AI With a New Executive Order - The New York Times (01:13:54) [2512.09742] Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs (01:17:57) Forecasting AI Time Horizon Under Compute Slowdowns (01:20:46) AI Security Institute focuses on AI measurements and evaluations (01:21:16) Nvidia AI Chips to Undergo Unusual U.S. Security Review Before Export to China (01:22:01) U.S. Authorities Shut Down Major China-Linked AI Tech Smuggling Network Synthetic Media & Art (01:24:01) RSL 1.0 has arrived, allowing publishers to ask AI companies pay to scrape content | The Verge See Privacy Policy at and California Privacy Notice at .

2025-12-09 08:00:00 • 1:34:40

DELETED #227 - Jeremie is back! DeepSeek 3.2, TPUs, Nested Learning

Our 227th episode with a summary and discussion of last week's big AI news! Recorded on 12/05/2025 Hosted by Andrey Kurenkov and Jeremie Harris Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: DeepSeek 3.2 and Flux.2 release, showcasing advancements in open-source AI models for natural language processing and image generation respectively. Amazon's new AI chips and Google's TPUs signal potential shifts in AI hardware dominance, with growing competition against Nvidia. Anthropic's potential IPO and OpenAI's declared ‘Code Red’ indicate significant moves in the AI business landscape, including high venture funding rounds for startups. Key research papers from DeepMind and Google explore advanced memory architectures and multi-agent systems, indicating ongoing efforts to enhance AI reasoning and efficiency. Timestamps: (00:00:10) Intro / Banter (00:02:42) News Preview Tools & Apps (00:03:30) Deepseek 3.2 : New AI Model is Faster, Cheaper and Smarter (00:23:22) Black Forest Labs launches Flux.2 AI image models to challenge Nano Banana Pro and Midjourney (00:28:00) Sora and Nano Banana Pro throttled amid soaring demand | The Verge (00:29:34) Mistral closes in on Big AI rivals with new open-weight frontier and small models | TechCrunch (00:31:41) Kling's Video O1 launches as the first all-in-one video model for generation and editing (00:34:07) Runway rolls out Gen 4.5 AI video model that beats Google, OpenAI Applications & Business (00:35:18) NVIDIA’s Partners Are Beginning to Tilt Toward Google’s TPU Ecosystem, with Foxconn Reportedly Securing TPU Rack Orders (00:40:37) Amazon releases an impressive new AI chip and teases an Nvidia-friendly roadmap | TechCrunch (00:43:03) OpenAI declares ‘code red’ as Google catches up in AI race | The Verge (00:46:20) Anthropic reportedly preparing for massive IPO in race with OpenAI: FT (00:48:41) Black
Forest Labs raises $300M at $3.25B valuation | TechCrunch (00:49:20) Paris-based AI voice startup Gradium nabs $70M seed | TechCrunch (00:50:10) OpenAI announced a 1 GW Stargate cluster in Abu Dhabi (00:53:22) OpenAI’s investment into Thrive Holdings is its latest circular deal (00:55:11) OpenAI to acquire Neptune, an AI model training assistance startup (00:56:11) Anthropic acquires developer tool startup Bun to scale AI coding (00:56:55) Microsoft drops AI sales targets in half after salespeople miss their quotas - Ars Technica Projects & Open Source (00:57:51) [2511.22570] DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning (01:01:52) Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory Research & Advancements (01:05:44) Nested Learning: The Illusion of Deep Learning Architecture (01:13:30) Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO (01:15:50) State of AI: An Empirical 100 Trillion Token Study with OpenRouter Policy & Safety (01:21:52) Trump signs executive order launching Genesis Mission AI project (01:24:42) OpenAI has trained its LLM to confess to bad behavior | MIT Technology Review (01:29:34) US senators seek to block Nvidia sales of advanced chips to China See Privacy Policy at and California Privacy Notice at .

2025-11-30 07:30:00 • 1:11:11

DELETED #226 - Gemini 3, Claude Opus 4.5, Nano Banana Pro, LeJEPA

Our 226th episode with a summary and discussion of last week's big AI news! Recorded on 11/24/2025 Hosted by Andrey Kurenkov and co-hosted by Michelle Lee Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: New AI model releases include Google's Gemini 3 Pro, Anthropic's Opus 4.5, and OpenAI's GPT-5.1, each showcasing significant advancements in AI capabilities and applications. Robotics innovations feature Sunday Robotics' new robot Memo and a $600M funding round for Physical Intelligence, highlighting growth and investment in the robotics sector. AI safety and policy updates include Europe's proposed changes to GDPR and AI Act regulations, and reports of AI-assisted cyber espionage by a Chinese state-sponsored group. AI-generated content and legal highlights involve settlements between Warner Music Group and AI music platform Udio, reflecting evolving dynamics in the field of synthetic media. Timestamps: (00:00:10) Intro / Banter (00:01:32) News Preview (00:02:10) Response to listener comments Tools & Apps (00:02:34) Google launches Gemini 3 with new coding app and record benchmark scores | TechCrunch (00:05:49) Google launches Nano Banana Pro powered by Gemini 3 (00:10:55) Anthropic releases Opus 4.5 with new Chrome and Excel integrations | TechCrunch (00:15:34) OpenAI releases GPT-5.1-Codex-Max to handle engineering tasks that span twenty-four hours (00:18:26) ChatGPT launches group chats globally | TechCrunch (00:20:33) Grok Claims Elon Musk Is More Athletic Than LeBron James — and the World’s Greatest Lover Applications & Business (00:24:03) What AI bubble?
Nvidia's strong earnings signal there's more room to grow (00:26:26) Alphabet stock surges on Gemini 3 AI model optimism (00:28:09) Sunday Robotics emerges from stealth with launch of ‘Memo’ humanoid house chores robot (00:32:30) Robotics Startup Physical Intelligence Valued at $5.6 Billion in New Funding - Bloomberg (00:34:22) Waymo permitted areas expanded by California DMV - CBS Los Angeles - Waymo enters 3 more cities: Minneapolis, New Orleans, and Tampa | TechCrunch Projects & Open Source (00:37:00) Meta AI Releases Segment Anything Model 3 (SAM 3) for Promptable Concept Segmentation in Images and Videos - MarkTechPost (00:40:18) [2511.16624] SAM 3D: 3Dfy Anything in Images (00:42:51) [2511.13998] LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering Research & Advancements (00:45:10) [2511.08544] LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics (00:50:08) [2511.13720] Back to Basics: Let Denoising Generative Models Denoise Policy & Safety (00:52:08) Europe is scaling back its landmark privacy and AI laws | The Verge (00:54:13) From shortcuts to sabotage: natural emergent misalignment from reward hacking (00:58:24) [2511.15304] Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models (01:01:43) Disrupting the first reported AI-orchestrated cyber espionage campaign (01:04:36) OpenAI Locks Down San Francisco Offices Following Alleged Threat From Activist | WIRED Synthetic Media & Art (01:07:02) Warner Music Group Settles AI Lawsuit With Udio See Privacy Policy at and California Privacy Notice at .

2025-11-21 06:00:00 • 1:18:14

DELETED #225 - GPT 5.1, Kimi K2 Thinking, Remote Labor Index

Our 225th episode with a summary and discussion of last week's big AI news! Recorded on 11/16/2025 Hosted by Andrey Kurenkov and co-hosted by Michelle Lee Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: New AI model releases include GPT-5.1 from OpenAI and Ernie 5.0 from Baidu, each with updated features and capabilities. Self-driving technology advancements from Baidu’s Apollo Go and Pony AI’s IPO highlight significant progress in the automotive sector. Startup funding updates include Inception raising $50M for diffusion models, while Cursor and Gamma secure significant valuations for coding and presentation tools respectively. AI-generated content is gaining traction with songs topping charts and new marketplaces for AI-generated voices, indicating evolving trends in synthetic media. Timestamps: (00:01:19) News Preview Tools & Apps (00:02:13) OpenAI says the brand-new GPT-5.1 is ‘warmer’ and has more ‘personality’ options | The Verge (00:04:51) Baidu Unveils ERNIE 5.0 and a Series of AI Applications at Baidu World 2025, Ramps Up Global Push (00:07:00) ByteDance’s Volcano Engine debuts coding agent at $1.3 promo price (00:08:04) Google will let users call stores, browse products, and check out using AI | The Verge (00:10:41) Fei-Fei Li's World Labs speeds up the world model race with Marble, its first commercial product | TechCrunch (00:13:30) OpenAI says it's fixed ChatGPT's em dash problem | TechCrunch Applications & Business (00:16:01) Anthropic announces $50 billion data center plan | TechCrunch (00:18:06) Baidu teases next-gen AI training, inference accelerators • The Register (00:20:50) Meta chief AI scientist Yann LeCun plans to exit and launch own start-up (00:24:41) Amazon Demands Perplexity Stop AI Tool From Making Purchases - Bloomberg (00:27:32) AI PowerPoint-killer Gamma hits $2.1B valuation, $100M ARR, founder says | 
TechCrunch (00:29:33) Inception raises $50 million to build diffusion models for code and text | TechCrunch (00:31:14) Coding assistant Cursor raises $2.3B 5 months after its previous round | TechCrunch (00:33:56) China's Baidu says it's running 250,000 robotaxi rides a week — same as Alphabet's Waymo (00:35:26) Driverless Tech Firm Pony AI Raises $863 Million in HK Listing Projects & Open Source (00:36:30) Moonshot's Kimi K2 Thinking emerges as leading open source AI Research & Advancements (00:39:22) [2510.26787] Remote Labor Index: Measuring AI Automation of Remote Work (00:45:21) OpenAI Researchers Train Weight Sparse Transformers to Expose Interpretable Circuits - MarkTechPost (00:49:34) Kimi Linear: An Expressive, Efficient Attention Architecture (00:53:33) Watch Google DeepMind’s new AI agent learn to play video games | The Verge (00:57:34) arXiv Changes Rules After Getting Spammed With AI-Generated 'Research' Papers Policy & Safety (00:59:35) Stability AI largely wins UK court battle against Getty Images over copyright and trademark | AP News (01:01:48) Court rules that OpenAI violated German copyright law; orders it to pay damages | TechCrunch (01:03:48) Microsoft's $15.2B UAE investment turns Gulf State into test case for US AI diplomacy | TechCrunch Synthetic Media & Art (01:06:39) An AI-Generated Country Song Is Topping A Billboard Chart, And That Should Infuriate Us All | Whiskey Riff (01:10:59) Xania Monet is the first AI-powered artist to debut on a Billboard airplay chart, but she likely won’t be the last | CNN (01:13:34) ElevenLabs’ new AI marketplace lets brands use famous voices for ads | The Verge See Privacy Policy at and California Privacy Notice at .

2025-11-05 04:00:00 • 1:31:43

DELETED #224 - OpenAI is for-profit! Cursor 2, Minimax M2, Udio copyright

Our 224th episode with a summary and discussion of last week's big AI news! Recorded on 10/31/2025 Hosted by Andrey Kurenkov and co-hosted by Gavin Purcell (check out AI For Humans and AndThen !) Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: OpenAI completes its for-profit restructuring, redefining its relationship with Microsoft and securing future investments. Meanwhile, Qualcomm and other tech giants announce new AI chips aimed at competing with Nvidia and AMD, marking major advancements in AI hardware capabilities. Amazon and Google deepen their partnerships with Anthropic, providing extensive computing resources to enhance AI research and applications. These developments signal significant growth and competition in the AI industry. Major AI tools and models were released and updated, including Cursor 2.0, Claude coding capabilities, and open-source options from MiniMax. These new tools offer a range of functionalities for coding, design, and more. Legal battles around AI copyright issues persist, as OpenAI faces ongoing lawsuits from authors over text generation using copyrighted material. Universal Music Group settles a copyright suit with AI music startup Udio, transitioning to a licensed model for AI-generated music. This shift reflects broader challenges and adaptations in the AI-generated content space, where copyright and ethical usage remain highly contentious issues. 
Timestamps: (00:00:10) Intro / Banter (00:02:44) News Preview Tools & Apps (00:03:44) Cursor 2.0 shifts to in-house AI with Composer model and parallel agents (00:07:44) Anthropic brings Claude Code to the web | TechCrunch (00:10:01) Microsoft's Mico is a 'Clippy' for the AI era | TechCrunch (00:14:20) Anthropic’s Claude catches up to ChatGPT and Gemini with upgraded memory features | The Verge (00:18:46) Canva launches its own design model, adds new AI features to the platform | TechCrunch (00:21:07) Elon Musk’s Grokipedia launches with AI-cloned pages from Wikipedia | The Verge Applications & Business (00:25:10) OpenAI completed its for-profit restructuring — and struck a new deal with Microsoft | The Verge (00:31:25) Qualcomm announces AI chips to compete with AMD and Nvidia (00:34:02) Amazon launches AI infrastructure project, to power Anthropic's Claude model | Reuters (00:38:52) Google and Anthropic announce cloud deal worth tens of billions (00:39:46) Google partners with Ambani's Reliance to offer free AI Pro access to millions of Jio users in India | TechCrunch Projects & Open Source (00:41:17) MiniMax Releases MiniMax M2: A Mini Open Model Built for Max Coding and Agentic Workflows at 8% Claude Sonnet Price and ~2x Faster - MarkTechPost (00:45:22) [2510.25741] Scaling Latent Reasoning via Looped Language Models (00:47:59) OpenAI’s gpt-oss-safeguard enables developers to build safer AI - Help Net Security Research & Advancements (00:49:51) [2510.15103] Continual Learning via Sparse Memory Finetuning (00:54:01) [2510.18091] Accelerating Vision Transformers with Adaptive Patch Sizes (00:57:46) [2510.18871] How Do LLMs Use Their Depth? 
Policy & Safety (01:01:07) AMD, Department of Energy announce $1 billion AI supercomputer partnership | The Verge (01:03:03) Synthetic Media & Art (01:09:34) Universal partners with AI startup Udio after settling copyright suit | The Verge (01:16:04) OpenAI loses bid to dismiss part of US authors' copyright lawsuit | Reuters See Privacy Policy at and California Privacy Notice at .

2025-10-24 15:00:00 • 1:11:45

DELETED #223 - Haiku 4.5, OpenAI DevDay, Claude Skills, Scaling RL, SB 243

Our 223rd episode with a summary and discussion of last week's big AI news! Recorded on 10/17/2025 Hosted by Andrey Kurenkov and co-hosted by Erik Schnultz Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: Anthropic and OpenAI have announced updates to their AI models and tools, including Haiku 4.5 and various business collaborations. Multiple companies like Slack and Salesforce are integrating AI assistants and agents into their platforms, enhancing task management and business operations. Recent research in reinforcement learning and agent memory curation highlights new methods for improving AI model performance and context management. California has passed a law to regulate AI chatbots for children and vulnerable users, and there are rising concerns over the increasing amount of AI-generated content on the internet. Timestamps: (00:00:10) Intro / Banter (00:01:31) News Preview Tools & Apps (00:02:18) Anthropic launches new version of scaled-down ‘Haiku’ model (00:04:52) Everything OpenAI announced at DevDay 2025: Agent Kit, Apps SDK, ChatGPT, and more | ZDNET (00:09:11) Anthropic turns to ‘skills’ to make Claude more useful at work | The Verge (00:13:20) Microsoft launches ‘vibe working’ in Excel and Word | The Verge (00:17:22) Google releases Veo 3.1, adds it to Flow video editor | TechCrunch (00:19:40) Slack is turning Slackbot into an AI assistant | The Verge (00:22:52) Salesforce announces Agentforce 360 as enterprise AI competition heats up | TechCrunch Applications & Business (00:24:58) Broadcom stock pops 9% on OpenAI custom chip deal, adding to Nvidia and AMD agreements (00:27:58) How ByteDance Made China’s Most Popular AI Chatbot | WIRED (00:30:08) Amazon's Zoox Robotaxis Have Arrived In Las Vegas - Here's What Riders Are Experiencing (00:32:43) Waymo’s robotaxis are coming to London | The Verge (00:34:14) Reflection AI 
raises $2B to be America's open frontier AI lab, challenging DeepSeek | TechCrunch (00:35:58) General Intuition lands $134M seed to teach agents spatial reasoning using video game clips | TechCrunch (00:38:36) Supabase nabs $5B valuation, four months after hitting $2B | TechCrunch Projects & Open Source (00:40:58) Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Instant Voice Cloning - MarkTechPost (00:43:06) Anthropic AI Releases Petri: An Open-Source Framework for Automated Auditing by Using AI Agents to Test the Behaviors of Target Models on Diverse Scenarios - MarkTechPost Research & Advancements (00:44:25) [2510.13786] The Art of Scaling Reinforcement Learning Compute for LLMs (00:48:51) [2510.01171] Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity (00:51:22) [2510.12635] Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks (00:54:31) [2510.07364] Base Models Know How to Reason, Thinking Models Learn When (00:57:24) [2510.12402] Cautious Weight Decay Policy & Safety (01:02:03) California becomes first state to regulate AI companion chatbots | TechCrunch (01:04:13) Over 50 Percent of the Internet Is Now AI Slop, New Data Finds Synthetic Media & Art (01:06:31) OpenAI Reverses Stance on Use of Copyright Works in Sora - WSJ (01:08:29) Character.AI removes Disney characters from platform after studio issues warning See Privacy Policy at and California Privacy Notice at .

2025-10-07 17:00:00 • 1:37:16

DELETED #222 - Sora 2, Sonnet 4.5, Vibes, Thinking Machines

Our 222nd episode with a summary and discussion of last week's big AI news! Recorded on 10/03/2025 Hosted by Andrey Kurenkov and co-hosted by Jon Krohn Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: OpenAI introduced Sora 2 for text-to-video generation and the Pulse feature for personalized morning briefs, while Anthropic released Claude Sonnet 4.5 for coding and agentic tasks. Meta launched a new AI video creation feature called Vibes in its Meta AI app and on meta.ai, facing mixed reactions from the public regarding AI-generated content. California's SB 53, the Transparency in Frontier AI Act, has become law, requiring large AI companies to disclose safety and security processes, while SB 942 mandates AI detection tools for user-generated content. The episode also covers AI regulations and industry dynamics, including battles over intellectual property, startup funding, and the integration of AI into everyday tools and services like Microsoft's AI agents for Word, Excel, and PowerPoint. 
Timestamps: (00:00:10) Intro / Banter (00:03:08) News Preview (00:03:56) Response to listener comments Tools & Apps (00:04:51) ChatGPT parent company OpenAI announces Sora 2 with AI video app (00:11:35) Anthropic releases Claude Sonnet 4.5 in latest bid for AI agents and coding supremacy | The Verge (00:22:25) Meta launches 'Vibes,' a short-form video feed of AI slop | TechCrunch (00:26:42) OpenAI launches ChatGPT Pulse to proactively write you morning briefs | TechCrunch (00:33:44) OpenAI rolls out safety routing system, parental controls on ChatGPT | TechCrunch (00:35:53) The Latest Gemini 2.5 Flash-Lite Preview is Now the Fastest Proprietary Model (External Tests) and 50% Fewer Output Tokens - MarkTechPost (00:39:54) Microsoft just added AI agents to Word, Excel, and PowerPoint - how to use them | ZDNET Applications & Business (00:42:41) OpenAI takes on Google, Amazon with new agentic shopping system | TechCrunch (00:46:01) Exclusive: Mira Murati’s Stealth AI Lab Launches Its First Product | WIRED (00:49:54) OpenAI is the world's most valuable private company after private stock sale | TechCrunch (00:53:07) Elon Musk’s xAI accuses OpenAI of stealing trade secrets in new lawsuit | Technology | The Guardian (00:55:40) Former OpenAI and DeepMind researchers raise whopping $300M seed to automate science | TechCrunch Projects & Open Source (00:58:26) [2509.16941] SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? Research & Advancements (01:01:28) [2509.17196] Evolution of Concepts in Language Model Pre-Training (01:05:36) [2509.19284] What Characterizes Effective Reasoning? 
Revisiting Length, Review, and Structure of CoT Lighting round (01:09:37) [2507.02954] Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III (01:12:03) [2509.24552] Short window attention enables long-term memorization Policy & Safety (01:18:11) SB 53, the landmark AI transparency bill, is now law in California | The Verge (01:24:07) Elon Musk's xAI offers Grok to federal government for 42 cents | TechCrunch (01:25:23) Character.AI removes Disney characters from platform after studio issues warning (01:28:50) Spotify's Attempt to Fight AI Slop Falls on Its Face See Privacy Policy at and California Privacy Notice at .

2025-10-07 16:30:00 • 47:01

DELETED #221 - OpenAI Codex, Gemini in Chrome, K2-Think, SB 53

Our 221st episode with a summary and discussion of last week's big AI news! Recorded on 09/19/2025 Note: we transitioned to a new RSS feed and it seems this did not make it to there, so this may be posted about 2 weeks past the release date. Hosted by Andrey Kurenkov and co-hosted by Michelle Lee Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at In this episode: OpenAI releases a new version of Codex integrated with GPT-5, enhancing coding capabilities and aiming to compete with other AI coding tools like Claude Code. Significant updates in the robotics sector include new ventures in humanoid robots from companies like Figure AI and China’s Unitree, as well as expansions in robotaxi services from Tesla and Amazon’s Zoox. New open-source models and research advancements were discussed, including Google DeepMind's self-improving foundation model for robotics and a physics foundation model aimed at generalizing across various physical systems. Legal battles continue to surface in the AI landscape with Warner Bros. suing Midjourney for copyright violations and Rolling Stone suing Google over AI-generated content summaries, highlighting challenges in AI governance and ethics. Timestamps: (00:00:10) Intro / Banter Tools & Apps (00:02:33) OpenAI upgrades Codex with a new version of GPT-5 (00:04:02) Google Injects Gemini Into Chrome as AI Browsers Go Mainstream | WIRED (00:06:14) Anthropic’s Claude can now make you a spreadsheet or slide deck. 
| The Verge (00:07:12) Luma AI's New Ray3 Video Generator Can 'Think' Before Creating - CNET Applications & Business (00:08:32) OpenAI secures Microsoft's blessing to transition its for-profit arm | TechCrunch (00:10:31) Microsoft to lessen reliance on OpenAI by buying AI from rival Anthropic | TechCrunch (00:12:00) Figure AI passes $1B with Series C funding toward humanoid robot development - The Robot Report (00:13:52) China’s Unitree plans $7 billion IPO valuation as humanoid robot race heats up (00:15:45) Tesla's robotaxi plans for Nevada move forward with testing permit | TechCrunch (00:17:48) Amazon's Zoox jumps into U.S. robotaxi race with Las Vegas launch (00:19:27) Replit hits $3B valuation on $150M annualized revenue | TechCrunch (00:21:14) Perplexity reportedly raised $200M at $20B valuation | TechCrunch Projects & Open Source (00:22:08) [2509.07604] K2-Think: A Parameter-Efficient Reasoning System (00:24:31) [2509.09614] LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering Research & Advancements (00:28:17) [2509.15155] Self-Improving Embodied Foundation Models (00:31:47) [2509.13805] Towards a Physics Foundation Model (00:34:26) [2509.12129] Embodied Navigation Foundation Model Policy & Safety (00:37:49) Anthropic endorses California's AI safety bill, SB 53 | TechCrunch (00:40:12) Warner Bros. Sues Midjourney, Joins Studios' AI Copyright Battle (00:42:02) Rolling Stone Publisher Sues Google Over AI Overview Summaries See Privacy Policy at and California Privacy Notice at .

Last Week in AI

DOWNLOADED #240 - Project Glasswing, Claude Mythos, GLM-5.1, emotion concepts 7 plays

DOWNLOADED #239 - RIP Sora, Claude Openclaw, HyperAgents

DOWNLOADED #238 - GPT 5.4 mini, OpenAI Pivot, Mamba 3, Attention Residuals 37 plays

DELETED #237 - Nemotron 3 Super, xAI reborn, Anthropic Lawsuit, Research!!! 3 plays

DELETED #236 - GPT 5.4, Gemini 3.1 Flash Lite, Supply Chain Risk

DELETED #235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon 3 plays

DELETED #234 - Opus 4.6, GPT-5.3-codex, Seedance 2.0, GLM-5 8 plays

#233 - Moltbot, Genie 3, Qwen3-Max-Thinking

DELETED #232 - ChatGPT Ads, Thinking Machines Drama, STEM

DELETED #231 - Claude Cowork, Anthropic $10B, Deep Delta Learning

DELETED #230 - 2025 Retrospective, Nvidia buys Groq, GLM 4.7, METR

DELETED #229 - Gemini 3 Flash, ChatGPT Apps, Nemotron 3

DELETED #228 - GPT 5.2, Scaling Agents, Weird Generalization

DELETED #227 - Jeremie is back! DeepSeek 3.2, TPUs, Nested Learning

DELETED #226 - Gemini 3, Claude Opus 4.5, Nano Banana Pro, LeJEPA

DELETED #225 - GPT 5.1, Kimi K2 Thinking, Remote Labor Index

DELETED #224 - OpenAI is for-profit! Cursor 2, Minimax M2, Udio copyright

DELETED #223 - Haiku 4.5, OpenAI DevDay, Claude Skills, Scaling RL, SB 243

DELETED #222 - Sora 2, Sonnet 4.5, Vibes, Thinking Machines

DELETED #221 - OpenAI Codex, Gemini in Chrome, K2-Think, SB 53