AI weekly: Google I/O, Karpathy joins Anthropic, Mythos kernel exploit

X is the best way I’ve found to keep up with AI. I like tweets throughout the week, filtering for things I think are actually worth knowing. I use Claude Code to pull those likes automatically and help me turn them into this post (here’s how the pipeline works). This week: 225 tweets liked, filtered down to what’s below.

Check out the previous roundup (May 11) if you missed it.

AI for Everyone

Google I/O 2026 dominated the week. Most of what got announced is actually live in the Gemini app, AI Studio, and the new Ultra tiers today, which has not been Google’s pattern in past years. Karpathy joining Anthropic was the other narrative event. And a small team used Anthropic’s Mythos model to write a working kernel exploit against Apple’s flagship M5 security feature in six days, which is the story most people outside infosec missed.

Gemini 3.5 Flash Goes Live, Three Times the Price (7 mentions)

Google launched Gemini 3.5 Flash at I/O at $1.50/M input tokens and $9/M output tokens, roughly 3x what Gemini 3 Flash cost. Google’s headline numbers: sub-200ms latency on most queries, and benchmark performance it pegs at about 92% of GPT-5.5 on coding and reasoning. It’s available in AI Studio and the API immediately. Whether the new price is worth it depends on the workload. For latency-sensitive features in a consumer product, this is probably the best model in its class right now. For bulk data processing where cost is the constraint, the previous Flash is still on the menu. (source: @OfficialLoganK, @kimmonismus)
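To make the cost trade-off concrete, here is a small Python calculator that uses the list prices quoted above ($1.50 per million input tokens, $9 per million output tokens). The request volumes and token counts in the example are hypothetical, so plug in your own. It deliberately ignores caching discounts, batch pricing, or any other rate adjustments.

```python
# Gemini 3.5 Flash list prices quoted at I/O, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the list prices above."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 days: int = 30) -> float:
    """Scale the per-request cost to a month of steady traffic."""
    return request_cost(input_tokens, output_tokens) * requests_per_day * days


if __name__ == "__main__":
    # Hypothetical workloads: a chat feature vs. a bulk extraction job.
    chat = monthly_cost(requests_per_day=10_000, input_tokens=1_500, output_tokens=400)
    bulk = monthly_cost(requests_per_day=200_000, input_tokens=4_000, output_tokens=200)
    print(f"chat feature:    ${chat:,.2f}/month")
    print(f"bulk extraction: ${bulk:,.2f}/month")
```

Output tokens cost six times as much as input tokens at these prices, so the same token budget can cost very different amounts depending on how much the model writes back.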

Gemini Omni, Spark, an AI Pointer, and a New $100 Ultra Tier (16 mentions across items)

The rest of the I/O drop was unusually broad. Gemini Omni is Google’s swing at coherent video: character consistency across scenes, physics-aware environments, multi-angle camera work, and synchronized dialogue, in 9-second clips at 720p. It’s live in the Gemini app, Flow, and YouTube Shorts today. Gemini Spark is the more interesting product: a background agent that runs around the clock even when your devices are off, can act inside apps and sites you’re logged into, and checks with you before major actions. It rolls out to U.S. Google AI Ultra subscribers next week. Google also unveiled an experimental AI pointer that understands what your cursor is hovering over, so you can point at a PDF and say “bullet points” or hover over a recipe and say “double these ingredients.” And the pricing changed: there’s now a $100/month Ultra tier alongside the existing top tier, which dropped from $250 to $200. (source: @GoogleDeepMind, @GeminiApp, @Google)

Karpathy Joins Anthropic (5 mentions)

Andrej Karpathy announced he’s joining Anthropic. He’s been independent since leaving OpenAI, could have started a company, and instead picked a lab. His own framing was spare: “the next few years at the frontier of LLMs will be especially formative.” This isn’t just personnel news. Combined with the Ramp adoption data below, it’s a week that shifts how you should think about which AI company is actually winning right now. He also committed to resuming his education work over time, which I’m selfishly more excited about than the research move. (source: @karpathy, @kevinrose)

Anthropic Passes OpenAI in Business Adoption (2 mentions)

Ramp’s AI Index, built on actual corporate card spend across 50,000+ businesses, has Anthropic at 34.4% and OpenAI at 32.3%. First time Anthropic has led. The trend is the part that matters: Anthropic’s business adoption quadrupled in a year, and OpenAI’s grew 0.3%. Ramp’s economist flagged the obvious risks in the same report. Anthropic makes more money when you burn more tokens, which is misaligned with cost-sensitive enterprise buyers; there were real service quality issues in recent months; and cheaper open-source inference platforms were among the fastest-growing software vendors on Ramp’s platform last month. The lead is real and also fragile. (source: @arakharazian)

Apple’s M5 Memory Integrity Enforcement Bypassed Using Anthropic’s Mythos (2 mentions)

Three researchers used Anthropic’s Mythos model to find and build a working macOS kernel exploit that bypasses Apple’s M5 Memory Integrity Enforcement. MIE is Apple’s flagship security feature for the M5 and A19 chips, designed to kill the entire class of memory corruption bugs. The researchers walked into Apple Park to deliver the report in person. Bug found April 25, working exploit by May 1: six days. The attack is data-only, with no pointer manipulation; it escalates from an unprivileged user to root using only standard syscalls. The broader point from the researchers is the one to internalize: AI can now chain multiple low-severity vulnerabilities, the kind that sit in your backlog for years, into a single working exploit. The right response isn’t patching faster. It’s having regression test coverage strong enough that you can ship good patches, not just fast ones. (source: @kimmonismus, @IntCyberDigest)

The Token Number Behind Everything (1 mention)

A chart from Google I/O is the most clarifying data point I saw all week. May 2024: Google processed 9.7 trillion tokens. May 2025: ~480 trillion. May 2026: 3.2 quadrillion. That’s roughly 7x over the past year. The multiple is smaller than the ~50x jump the year before, but the absolute increase keeps getting bigger. Every data center buildout, GPU shortage, power infrastructure deal, and price increase you’re reading about ladders up to that curve. It’s also why the labs that locked in compute now (Anthropic’s SpaceX deal, Google’s expansion) likely have real advantages for the next 12 to 18 months. (source: @wallstengine)
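The growth figures are easy to check from the three data points in the chart. This sketch computes both the year-over-year multiple and the absolute year-over-year increase. It uses only the figures quoted above and assumes nothing else about how Google counts tokens.

```python
# Figures as quoted in the I/O chart: tokens Google processed, measured each May.
TOKENS = {
    2024: 9.7e12,  # 9.7 trillion
    2025: 480e12,  # ~480 trillion
    2026: 3.2e15,  # 3.2 quadrillion
}


def yoy_multiple(series: dict[int, float]) -> dict[int, float]:
    """Each year's value divided by the previous year's value."""
    years = sorted(series)
    return {cur: series[cur] / series[prev] for prev, cur in zip(years, years[1:])}


def yoy_increase(series: dict[int, float]) -> dict[int, float]:
    """Each year's value minus the previous year's value."""
    years = sorted(series)
    return {cur: series[cur] - series[prev] for prev, cur in zip(years, years[1:])}


if __name__ == "__main__":
    multiples, increases = yoy_multiple(TOKENS), yoy_increase(TOKENS)
    for year in multiples:
        print(f"{year}: {multiples[year]:.1f}x, +{increases[year] / 1e12:,.0f} trillion tokens")
```

Running it shows the multiple falling from about 49x to about 6.7x, while the absolute increase grows from roughly 470 trillion to roughly 2.7 quadrillion tokens. For infrastructure planning, the volume is the number that matters.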

AI for Developers

Anthropic shipped two of the year’s bigger developer features: self-hosted sandboxes for Managed Agents, and an agent view in Claude Code that turns it into a multi-session control surface. Hermes Agent quietly became the only local agent platform with simultaneous OAuth access to OpenAI, xAI, and Anthropic. Stitch’s I/O drop included a portable design spec format that any AI tool can read. And Anthropic’s own writeup on Claude Code at scale is the clearest argument I’ve seen that the configuration layer around the model matters more than which model you’re using.

Hermes Agent v0.14.0 Adds OAuth Grok Access (15 mentions)

Hermes Agent v0.14.0 shipped with native OAuth into X and xAI Premium+. If you already pay for Premium+, you now get Grok models, image and video generation, and X semantic search inside Hermes via hermes auth add xai-oauth, with no separate API key and no additional billing. The other upgrade worth mentioning is in the orchestrator: give it one prompt and it decomposes the work into subtasks and assigns them to specialized agent profiles automatically. Codex landed as a runtime backend for OpenAI models, and there’s also a LINE messenger gateway and a native Windows beta. The architectural side effect is that Hermes is now the only local agent platform with credentialed access to all three major labs at once. Grok Heavy still has frustrating usage limits, but the direction is clear. (source: @NousResearch, @Teknium, @xai)

Claude Managed Agents: Self-Hosted Sandboxes and MCP Tunnels (6 mentions)

Anthropic answered the standing enterprise objection at its London event. Self-hosted sandboxes let you run agent execution inside your own perimeter; Cloudflare, Daytona, Modal, Vercel, and Docker are all supported with copy-paste cookbooks. MCP tunnels let the agent reach MCP servers inside your private network without exposing anything to the public internet. These are the two features that stop a security team from saying “no.” The quality-of-life addition is hot-swappable tools on a live session: you can change what an agent is allowed to do without restarting it, which keeps long-running agents from being fragile. If you’re running anything agentic for a company with a security review process, this is the week to revisit it. (source: @claudeai, @ClaudeDevs)
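Hot-swapping tools is easiest to picture as a session that stores its conversation state and its tool allowlist separately, so one can change without resetting the other. The sketch below is a hypothetical Python model of that idea only: AgentSession, set_tools, and call are invented names, not the Managed Agents API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Tool = Callable[[str], str]


@dataclass
class AgentSession:
    """Hypothetical long-running session: history survives tool changes."""
    history: List[str] = field(default_factory=list)
    tools: Dict[str, Tool] = field(default_factory=dict)

    def set_tools(self, tools: Dict[str, Tool]) -> None:
        # Replace the allowlist in place; history is kept, so nothing restarts.
        self.tools = dict(tools)
        self.history.append(f"[tools updated: {sorted(self.tools)}]")

    def call(self, name: str, arg: str) -> str:
        # Tools outside the current allowlist are refused, not executed.
        if name not in self.tools:
            result = f"denied: {name} is not allowed in this session"
        else:
            result = self.tools[name](arg)
        self.history.append(f"{name}({arg!r}) -> {result}")
        return result


if __name__ == "__main__":
    def read(path: str) -> str:
        return f"read {path}"

    def write(path: str) -> str:
        return f"wrote {path}"

    session = AgentSession(tools={"read_file": read, "write_file": write})
    session.call("write_file", "draft.md")
    session.set_tools({"read_file": read})  # revoke write access mid-session
    session.call("write_file", "draft.md")  # now refused; earlier history intact
    print("\n".join(session.history))
```

The design point is that revoking a capability is a data change, not a process restart, so whatever the agent has already learned in the session stays put.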

Stitch’s DESIGN.md Standard and Streaming Canvas (6 mentions)

Stitch by Google shipped a substantial update at I/O. The streaming canvas is the obvious headline — you can watch the design build in real time and redirect it before it finishes. The more useful thing for developers is the DESIGN.md standard: a single file that captures your product’s visual identity in a format any agent can read. Stitch can generate one from your codebase, a Figma file, or a live website. It’s portable, so it’s useful outside Stitch too. One-click export to Netlify, Lovable, and Bolt rounds out what used to be a weak handoff story. An MCP skill lets you import screens directly into Stitch and sync changes back. (source: @stitchbygoogle)

Claude Code Agent View Turns It Into a Fleet (4 mentions)

Claude Code shipped claude agents, a single list of all running sessions. You can dispatch multiple agents in parallel, see what’s running versus waiting on you, and reply inline to unblock them without losing your place. The earlier /goal feature pairs with this — it keeps Claude working autonomously until the task is done rather than checking back at every step. Together they shift Claude Code from interactive assistant to something closer to a fleet of parallel workers. I’ve been running two sessions at a time and the multiplier is real, especially when one is exploratory and the other is editing. (source: @claudeai, @ClaudeDevs)
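The running-versus-waiting split is the core of a fleet view. A rough asyncio model: each agent runs as its own task, and one that needs you parks on a future until you reply, without blocking the others. Agent, Status, board, and the simulated delays are all hypothetical; this is an illustration of the pattern, not how Claude Code implements claude agents.

```python
from __future__ import annotations

import asyncio
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    WAITING = "waiting on you"
    DONE = "done"


class Agent:
    """Hypothetical stand-in for one session (not Claude Code's implementation)."""

    def __init__(self, name: str, needs_input: bool) -> None:
        self.name = name
        self.needs_input = needs_input
        self.status = Status.RUNNING
        self.result: str | None = None
        self._reply: asyncio.Future[str] | None = None

    async def run(self) -> None:
        await asyncio.sleep(0.01)  # simulated work
        if self.needs_input:
            # Park on a future; only this agent blocks, the rest keep going.
            self._reply = asyncio.get_running_loop().create_future()
            self.status = Status.WAITING
            answer = await self._reply
            self.status = Status.RUNNING
            self.result = f"{self.name} continued with {answer!r}"
        else:
            self.result = f"{self.name} finished on its own"
        self.status = Status.DONE

    def reply(self, text: str) -> None:
        """Unblock a waiting agent with the user's answer."""
        if self._reply is not None and not self._reply.done():
            self._reply.set_result(text)


def board(agents: list[Agent]) -> dict[str, str]:
    """One entry per session: what is running versus waiting on you."""
    return {a.name: a.status.value for a in agents}


async def main() -> list[Agent]:
    agents = [Agent("explore", needs_input=False), Agent("refactor", needs_input=True)]
    tasks = [asyncio.create_task(a.run()) for a in agents]
    await asyncio.sleep(0.05)  # give both agents time to reach a stopping point
    print(board(agents))  # explore is done, refactor is waiting on you
    agents[1].reply("use the new API")  # reply inline to unblock it
    await asyncio.gather(*tasks)
    print(board(agents))
    return agents


if __name__ == "__main__":
    asyncio.run(main())
```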

The Claude Code Harness Matters More Than the Model (1 mention)

Anthropic published a detailed post on running Claude Code at scale: multi-million-line monorepos, legacy systems, distributed architectures. The argument is the part to internalize: the configuration layer around the model (CLAUDE.md files at root and subdirectory level, hooks, skills, plugins, LSP integrations, MCP servers, subagents) determines the quality of what you get more than which model you’re using. Claude doesn’t use RAG or a codebase index. It navigates live files like a developer would, which means no stale indexes but also that your CLAUDE.md layering does the heavy lifting. The bit most people miss is that hooks can reflect on a session and propose updates to your own CLAUDE.md, so the setup improves itself over time. Developers who maintain this setup report large drops in mistake rates across long-running projects. (source: @ClaudeDevs)
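One way to picture root-plus-subdirectory layering: when working on a file deep in the tree, the relevant CLAUDE.md files are the ones on the path from the repo root down to that file’s directory, general guidance first and local guidance last. The sketch below models only that idea. claude_md_chain and demo are hypothetical names, and Claude Code’s actual loading rules (including user-level memory files and when subdirectory files get read) may differ in detail.

```python
import tempfile
from pathlib import Path


def claude_md_chain(repo_root: Path, target: Path) -> list[Path]:
    """Hypothetical model: CLAUDE.md files from the repo root down to target's folder.

    Ordered general -> specific, so later files refine earlier ones.
    Raises ValueError if target is not inside repo_root.
    """
    root = repo_root.resolve()
    rel = target.resolve().parent.relative_to(root)
    dirs = [root]
    for part in rel.parts:
        dirs.append(dirs[-1] / part)
    return [d / "CLAUDE.md" for d in dirs if (d / "CLAUDE.md").is_file()]


def demo() -> list[str]:
    """Build a tiny throwaway monorepo and list which CLAUDE.md files apply."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "services" / "billing").mkdir(parents=True)
        (root / "web").mkdir()
        for folder in (root, root / "services", root / "web"):
            (folder / "CLAUDE.md").write_text("project notes\n")
        target = root / "services" / "billing" / "invoice.py"
        target.write_text("")
        chain = claude_md_chain(root, target)
        # The sibling web/CLAUDE.md is not on the path, so it is skipped.
        return [p.relative_to(root.resolve()).as_posix() for p in chain]


if __name__ == "__main__":
    print(demo())
```

The practical upshot of this model: put repo-wide conventions at the root and keep subdirectory files short and specific to that part of the tree.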

Honorable Mentions

Cursor Composer 2.5 is Cursor’s most capable model yet, built on Moonshot’s Kimi K2.5. Cursor claims roughly 10x cost-per-capability efficiency and benchmarks at Opus 4.7 parity (vendor-reported). Usage limits doubled for a week. (source: @cursor_ai)
Browserbase launched browse.sh, an open-source catalog of skills for navigating specific websites reliably from an agent. Ramp, Lovable, Interaction, and Reducto contributed verified skills for their own platforms. Free. (source: @browserbase)
Anthropic acquired Stainless, the SDK and MCP server generation platform that has built every official Anthropic SDK since the earliest API days. Framed explicitly around expanding Claude’s agent connectivity. (source: @AnthropicAI)
Prompt cache diagnostics shipped in Claude Console — when a request misses the cache, you can now see exactly which part of your prompt changed and what the token cost was. Worth a look if you’re spending real money on the API. (source: @ClaudeDevs)
Claude Code Learning Mode (Settings → Output Style → Learning) makes Claude explain every decision while it works. Lydia Hallie endorsed it as her daily driver on side projects. A real counter to the skill-atrophy concern. (source: @lydiahallie)
ChatGPT personal finance launched for U.S. Pro users. Connect your bank and credit card accounts, then ask about your own spending. The most practical new consumer AI feature of the month. (source: @ChatGPTapp)
OpenAI and Google agreed on something: DALL-E images now carry both C2PA Content Credentials and Google’s SynthID watermark, with a public verification tool. Real provenance for AI-generated images. (source: @OpenAI)
Ramp’s spend data is now queryable inside Claude, ChatGPT, Bloomberg, Perplexity, and Grok. Ask any of them what mid-market companies are paying for a software category and you get answers grounded in real corporate spend. (source: @tryramp)
KPMG deployed Claude to all 276,000 employees via its Digital Gateway platform. Not a pilot — full organizational rollout. The reference case for what enterprise AI looks like at scale. (source: @DeItaone)
OpenAI’s Codex Windows sandbox writeup is candid about the design constraints: Windows lacks Linux’s lightweight isolation primitives. The takeaways apply anywhere you’re running local agents. (source: @OpenAIDevs)
End-to-end encryption for RCS between Android and iPhone is rolling out automatically. Not AI news, but probably the most practically significant thing that happened this week for most people. (source: @sundarpichai)
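The prompt cache diagnostics item above matters because prompt caching works on an exact prefix: Anthropic’s cache matches the request in order (tools, then system, then messages), so changing one early block, such as a timestamp in the system prompt, turns everything after it into uncached input. Below is a minimal sketch of locating that break between two requests, with caller-supplied token counts. Block, CacheReport, and diff_prefix are hypothetical names, and the sketch ignores cache breakpoints and minimum cacheable lengths.

```python
from typing import NamedTuple, Optional


class Block(NamedTuple):
    name: str    # illustrative label, e.g. "tools", "system", "user-1"
    text: str
    tokens: int  # token count for this block, supplied by the caller


class CacheReport(NamedTuple):
    first_miss: Optional[str]  # first block not served from cache; None if all matched
    cached_tokens: int
    uncached_tokens: int


def diff_prefix(previous: list[Block], current: list[Block]) -> CacheReport:
    """Find where an exact-prefix cache stops matching between two requests."""
    cached = 0
    for i, block in enumerate(current):
        if i >= len(previous) or previous[i].text != block.text:
            # Everything from the first mismatch onward must be reprocessed.
            uncached = sum(b.tokens for b in current[i:])
            return CacheReport(block.name, cached, uncached)
        cached += block.tokens
    return CacheReport(None, cached, 0)


if __name__ == "__main__":
    tools = Block("tools", "search(query), fetch(url)", 3_000)
    user = Block("user-1", "Summarize the report.", 50)
    yesterday = [tools, Block("system", "You are terse. Date: 2026-05-18", 2_000), user]
    today = [tools, Block("system", "You are terse. Date: 2026-05-19", 2_000), user]
    # The date in the system prompt breaks the prefix right after the tools block.
    print(diff_prefix(yesterday, today))
```

The usual fix is to move anything that changes per request, like dates or user IDs, as late in the prompt as possible.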

Try This Weekend

For everyone:

Try Gemini 3.5 Flash in AI Studio. Run the same coding or reasoning task you usually hand to GPT-5.5 or Claude and see how the speed and quality compare. The sub-200ms responses change how you actually interact with it.
Connect a financial account to ChatGPT if you’re a U.S. Pro user. Ask “where did most of my money go last month?” The accuracy on your actual data is the part that surprises people.
Point at something with Google’s AI pointer experiments. Hover a recipe and say “double these ingredients” or point at a building and say “show me the route.” The interface shift from describing to pointing is a real thing.
Verify an AI image with OpenAI’s provenance tool. Generate something in DALL-E and run it through the public verifier. Understanding the watermarking flow now will matter when you need it later.

For developers:

Stand up a self-hosted Claude Managed Agents sandbox on Cloudflare, Modal, or Vercel using the cookbook. Even if you’re not deploying it to production, going through the setup is the fastest way to understand what changes about your agent architecture.
Turn on Claude Code Learning Mode for one side project session. Settings → Output Style → Learning. It’s slower; you’ll understand more.
Generate a DESIGN.md for your product in Stitch. Import from a codebase or live website; it takes under 10 minutes. The output is useful well beyond Stitch, since any AI tool can read it.
Install Hermes Agent v0.14.0 and run hermes auth add xai-oauth if you already pay for X Premium+. Real-time X search and Grok models inside your local agent stack at no additional cost.
