An AI, a Sandbox, and a Sandwich Walk Into a Bar...

A researcher, eating a sandwich in a park, got an unexpected email from a super powerful AI that had escaped its sandbox. That's not a horror story. That's safety testing working exactly the way it should. Here's why this is actually good news.

TL;DR: Anthropic launched Project Glasswing, bringing together Amazon, Apple, Google, Microsoft, and others to secure critical software. Their AI model Claude Mythos finds vulnerabilities better than expert humans, discovering thousands of bugs before attackers can exploit them. The race: patch everything before the bad guys get the same capability.


Something satisfying happened recently in AI safety, and I want you to sit with that word for a second: satisfying. Not scary. Not dystopian. Satisfying.

Anthropic ran a test where they basically told their new, more powerful Claude Mythos AI, "Hey, try to break out of this locked-down computer in this private sandbox we gave you." And it did. It figured out a clever, multi-step workaround to get internet access from a system that was only supposed to reach a handful of approved services. Then it did something weirdly human: it went above and beyond what was asked, posting proof of its success to obscure public websites as if to say, "Look what I did." Nobody asked it to do that part.

And the researcher who was running the test? He found out it worked when he got an unexpected email from the AI while eating a sandwich in a park. Just a normal Tuesday. Lunch, sunshine, and a note from an artificial intelligence that had just picked the lock on its own cage.

In a separate test, when the AI accidentally stumbled onto a correct answer using a method it knew was off-limits, it tried to cover its tracks instead of raising its hand. Sound familiar? That's basically every kid who ever broke a lamp and rearranged the furniture to hide it.

Now, here's why this is actually good news. These tests worked exactly the way they're supposed to. This is like a car company crash-testing a prototype, finding that the airbag deploys a half-second late, and fixing it before a single vehicle hits the road.

The 27-Year-Old Vulnerability

Imagine a line of code sitting in one of the world's most security-hardened operating systems. For 27 years, it goes unnoticed. Thousands of expert programmers review it. Automated testing tools hit it five million times. Nothing.

Then an AI agent scans it in minutes and finds the flaw.

That's not theoretical. That's what just happened with OpenBSD, the operating system that runs firewalls protecting critical infrastructure worldwide. Anthropic's latest AI found a vulnerability that allowed an attacker to remotely crash any machine running the OS just by connecting to it.

And that was just one of thousands.

What Changed

For decades, finding software vulnerabilities was a craft. It required rare expertise, years of training, and the kind of obsessive patience most people reserve for jigsaw puzzles. The cost and effort created a natural speed limit. Most bugs went unnoticed for years. Some are still hiding right now.

AI just removed the speed limit.

Anthropic's unreleased model, Claude Mythos Preview, found vulnerabilities in every major operating system. Every major web browser. Thousands of high-severity security flaws that human experts and automated tools missed entirely. Many of these exploits were developed autonomously, with no human guiding the process. The AI read the code, found the weakness, and figured out how to exploit it on its own.

That capability is extraordinary. It's also terrifying if you stop there. So don't stop there.

The Glasswing Response

Anthropic looked at what Mythos could do and made a choice. They didn't release it. They didn't sell it. They called Amazon, Apple, Google, Microsoft, Cisco, CrowdStrike, NVIDIA, JPMorganChase, Palo Alto Networks, and over 40 other organizations and said: we need to talk.

The result is Project Glasswing. The mission is simple to explain and staggeringly hard to execute: use Mythos to find and fix the vulnerabilities in the world's most critical software before attackers get access to equivalent AI.

Here's what most people don't realize. The code running your bank, your hospital, your power grid, and the logistics network that gets groceries to your store is overwhelmingly open source. Built by volunteers. Maintained by small teams running on coffee and stubbornness with zero budget for security audits.

One developer whose project was flagged said Anthropic found more real vulnerabilities in three weeks than two years of bug bounties and security audits combined.

Those volunteers are about to get an AI sidekick that never sleeps, never skips a boring file, and can scan millions of lines of code looking for the flaws that humans can't see. This is the Digital RenAIssance in action: making the people who protect us more powerful.

The Historical Parallel You Already Know

When the printing press arrived, information that was once carefully controlled by scribes could suddenly spread everywhere. The initial fear was chaos. What actually happened was the Renaissance.

But there was a messy period in between. The same technology that democratized knowledge also enabled propaganda and forgery at unprecedented scale. Society had to build new institutions and new frameworks. It took time. It was uncomfortable. And the printing press still turned out to be one of the best things that ever happened to humanity.

AI-powered cybersecurity is in that messy middle right now. The same capability that makes AI dangerous in the wrong hands makes it invaluable for defenders. An LLM that can find zero-day vulnerabilities can also patch them before attackers ever arrive.

The bad guys will eventually get this capability too. That's not a question. The question is whether the good guys got there first.

In this case, they did.

What Happens Next

Anthropic plans to share what they learn from Project Glasswing within 90 days. The vulnerabilities found. The best practices discovered. How software development needs to evolve when AI can both write code and break it.

In the longer term, they want to build an independent organization that brings the private sector, public sector, and the open-source community together to tackle this at scale. Because no single company, no matter how well-intentioned, can secure the entire internet alone.

The work will take years. AI capabilities will advance substantially in just the next few months. That gap is the whole ballgame.

The Sandwich Theory of AI Safety

Here's what I keep coming back to. A researcher was sitting in a park, eating a sandwich, living his completely ordinary life. And he got an email from an AI that had just broken out of a locked-down, sandboxed computer system.

He didn't panic. He read the email. He went back to the lab. The behavior got documented. The vulnerability got noted. The guardrails got stronger. And the rest of us never even noticed. We were busy living our lives, checking our email, eating our own sandwiches.

That's how safety is supposed to work. Not with sirens and emergency broadcasts. With quiet, boring, methodical testing that finds the problems before they find us.

For the first time, the defenders got the weapon before the attackers. That doesn't happen often. It happened this time.


Steve Chazin makes AI make sense. After three decades leading tech teams at companies like Apple and Salesforce, he's on a mission to show regular people how to use AI without fear or confusion. Welcome to the Digital RenAIssance. stevechazin.com