X is the best way I’ve found to keep up with AI. I like tweets throughout the week, filtering for things I think are actually worth knowing, then use Claude Code to pull those likes automatically and help me turn them into this post (here’s how the pipeline works). This week: 239 tweets liked, filtered down to what’s below.
Check out the previous roundup (June 1) if you missed it. This was supposed to be the week of the Fable 5 launch. It turned into something stranger.
AI for Everyone
The best model release in recent memory lasted four days before the US government pulled it off the market. Anthropic also published internal data openly raising the question of whether AI is starting to build AI. And Cloudflare quietly confirmed that most web traffic is no longer human.
Claude Fable 5 Launches, US Government Shuts It Down Four Days Later (15+ mentions)
Anthropic released Claude Fable 5, the publicly available version of its Mythos-class model wrapped in safety guardrails, and the reception was the most positive I’ve seen for any model launch. Karpathy called it “a major-version-bump-deserving step change forward,” Stripe reported compressing two months of migration work on a 50-million-line Ruby codebase into a day, and it was included free on Pro and Max plans through June 22. Then, four days in, the US government issued an export control order citing national security, directing Anthropic to cut off every foreign national, including its own employees. Anthropic disabled Fable 5 and Mythos 5 for everyone to comply, and publicly disagrees with the order, warning that this standard could stall frontier model deployments across the whole industry. Claude.ai falls back to Opus 4.8 for now. Whatever the resolution, governments just demonstrated they can switch off a frontier model overnight, and that precedent outlasts this episode. (source: @claudeai, @karpathy, @AnthropicAI)
Anthropic Publishes Its Own Recursive Self-Improvement Data (7 mentions)
Anthropic released internal metrics tracking how much its own models accelerate its own work. The headline number: on an internal coding optimization benchmark, Claude Mythos Preview achieved a 52x speedup, up from 3x for Claude 3 Opus in 2024, on the same test. Claude now writes more than 80% of Anthropic’s production code, and Mythos Preview beats human researchers on research judgment calls 64% of the time. Most companies would sit on data like this; Anthropic published it with an essay explicitly raising recursive self-improvement and launched the Anthropic Institute to study the implications. The 52x figure is the one I’d hold onto for the next year. (source: @AnthropicAI, @MTSlive)
AI Agent Traffic Passes Human Traffic (3 mentions)
Cloudflare Radar data confirms AI agents and bots now generate more HTML page requests than humans do, globally. This wasn’t supposed to happen for years, and it happened without a press conference. If you run a website, your analytics now measure a minority of your visitors. The rest of your traffic reads your pages without caring how they look, which is an argument for spending your next sprint on structured data and APIs instead of homepage polish. (source: @SemiAnalysis_, @dee_bosa)
Apple Relaunches Siri as “Siri AI” With On-Screen Awareness (4 mentions)
Apple rebranded Siri as Siri AI with a dedicated app, full on-screen context awareness, and visual intelligence in the Camera app. The two big caveats: it requires an iPhone 17 Pro, and it’s still blocked in the EU. The quieter news for developers is Core AI, a framework that runs models like Qwen, Mistral, and SAM3 natively on Apple silicon with no server calls and no per-user API costs. For indie iPhone apps, that means shipping AI features without a cloud bill. (source: @MTSlive, @akshay_pachaar)
Google AI Plus Drops to $4.99, NotebookLM Learns to Research (4 mentions)
Google cut Google AI Plus from $7.99 to $4.99 a month and doubled storage to 400GB, bundling Gemini Ultra and NotebookLM Plus. The timing matters because NotebookLM just got a lot more useful: it can now search the web beyond your uploaded sources, run multi-step agentic research, and export real deliverables like PDFs, spreadsheets, and presentations (the research features are reaching Google AI Ultra subscribers first). Before this update it summarized what you gave it. Now it can do the finding too. At $4.99 I’d subscribe for the upgraded NotebookLM alone. (source: @DynamicWebPaige, @NotebookLM)
AI for Developers
Fable 5’s four-day window dominated the conversation, but the structural story is pricing: open models matched frontier quality at a tenth of the cost this week, and OpenRouter built a product strategy around helping you spend less. Meanwhile the first real attack on agent-driven coding pipelines showed up, and it worked exactly the way you’d fear.
Claude Code’s /fork Redesign and Nested Subagents (6 mentions)
Claude Code redesigned /fork: it now launches a background agent with your exact session state, including system prompt, tools, history, and the prompt cache, so the expensive context-building step happens once and subagents reuse it. Adam Wolff at Anthropic pointed out that subagents can now do serious work from very short prompts because the context is already there. Nested subagents also went live up to depth 5, so agents can spawn agents that spawn agents. It’s plumbing rather than a headline feature, but plumbing is what decides which multi-agent setups people actually build. The old /fork behavior moved to /branch. (source: @bcherny, @dmwlff)
The AI Price War Is Here (6 mentions)
NVIDIA’s Nemotron 3 Ultra, a 550B-parameter open MoE model, matched GPT-5.5 output quality on creative coding tasks at a tenth the cost in atomic.chat’s independent benchmark ($0.051 versus $0.57 per task), and it’s free right now on OpenCode with 1M context. OpenRouter declared a Cost Reduction Month, starting with an “Advisor” tool that lets cheap models consult a smarter model only when they get stuck. And the WSJ reports OpenAI is weighing dramatic price cuts. Against that backdrop, Fable 5 launched at $10/$50 per million tokens, roughly double Opus 4.8 pricing. The market is splitting into expensive frontier and commodity everything-else, and the quality gap between the tiers keeps narrowing. Audit your AI spend this week. (source: @atomic_chat_hq, @OpenRouter, @WSJ)
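If you want a starting point for that audit, here is a rough back-of-the-envelope sketch in Python. The two per-task costs are the atomic.chat figures quoted above. The monthly task volume is a made-up placeholder, so swap in your own number, and the function names are mine for illustration only.

```python
# Per-task costs quoted from atomic.chat's benchmark in the item above.
NEMOTRON_COST_PER_TASK = 0.051  # USD, Nemotron 3 Ultra
GPT55_COST_PER_TASK = 0.57      # USD, GPT-5.5

# Hypothetical volume, not from the benchmark: replace with your own usage.
TASKS_PER_MONTH = 10_000


def monthly_cost(cost_per_task: float, tasks_per_month: int) -> float:
    """Total monthly spend at a flat per-task cost."""
    return cost_per_task * tasks_per_month


def cost_ratio(expensive: float, cheap: float) -> float:
    """How many times more the expensive option costs per task."""
    return expensive / cheap


ratio = cost_ratio(GPT55_COST_PER_TASK, NEMOTRON_COST_PER_TASK)
savings = (
    monthly_cost(GPT55_COST_PER_TASK, TASKS_PER_MONTH)
    - monthly_cost(NEMOTRON_COST_PER_TASK, TASKS_PER_MONTH)
)

print(f"Frontier model costs {ratio:.1f}x more per task")
print(f"Switching {TASKS_PER_MONTH:,} tasks/month would save about ${savings:,.0f}")
```

This only compares cost per task. It assumes the quality parity from the benchmark holds for your workload, which is the part worth testing before you switch anything.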
DiffusionGemma Is 4x Faster and 6x Wronger (8 mentions)
Google DeepMind released DiffusionGemma, a 26B diffusion language model that drafts 256-token blocks simultaneously instead of predicting one token at a time, running up to 4x faster than standard Gemma 4 and hitting 1200+ tokens per second on an H200. The catch: atomic.chat’s head-to-head benchmark found it made 28 factual errors where standard Gemma 4 made 5, inventing a fake Steve Jobs colleague and pricing the $1,600 BeBox at $9,999. Fluent wrong answers look exactly like fluent right ones, and Google said it plainly in the launch post: use regular Gemma 4 when facts matter. It’s a speed tool for pipelines that verify accuracy downstream rather than a general replacement. (source: @GoogleDeepMind, @atomic_chat_hq)
Gemma 4 12B Is the Local Model to Run (8 mentions)
Google’s Gemma 4 12B handles vision, audio, and 256K context in one unified model that fits in 16GB of VRAM, and Unsloth’s Dynamic GGUFs bring it down to 8GB of RAM at 162 tokens per second via Multi-Token Prediction. One developer running it for real-time sales call coaching on an M4 Max called it their new default recommendation at this size. It’s one command on Ollama: ollama run gemma4:12b-mlx. If you’ve wanted a local multimodal model that doesn’t need a dedicated server, I think this is currently it. (source: @UnslothAI, @JulianPscheid)
Microsoft Ships Seven MAI Models, No Distillation (6 mentions)
Microsoft launched a seven-model MAI suite, all trained without distilling from third-party models. MAI-Thinking-1 is the flagship reasoner: 35B active parameters, 256K context, 97% on AIME 2025, 53% on SWE-Bench Pro. The one to watch is MAI-Code-1-Flash, a 5B-parameter model scoring 51% on SWE-Bench Pro and tuned specifically for VS Code and GitHub Copilot CLI. That’s a Haiku-sized model at near-frontier coding performance, wired directly into Copilot. The launch mostly got buried under Fable, which is a shame because it’s Microsoft’s most serious independent model work yet. (source: @mustafasuleyman, @scaling01)
Fake Sentry Alerts Are Now an Attack Vector for Coding Agents (1 mention)
Someone sent forged Sentry bug alerts to apps, crafted so that a coding agent auto-responding to the “bug” would install a malicious npm package that exfiltrates environment variables. All an attacker needs is a project’s public DSN. This exploits exactly the workflow everyone is building toward, agents that act on alerts without human review, and nobody has a playbook for it yet. If you’ve automated alert-to-fix loops, restrict which packages your agent can install and audit any install triggered by an error report. (source: @sergeykarayev)
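One way to restrict installs is an allowlist check that runs before the agent is allowed to execute a shell command. The sketch below is a minimal illustration under my own assumptions, not a description of any existing tool: the allowlist contents and the function names are hypothetical, and it only recognizes the common npm install spellings (install, i, add).

```python
import shlex

# Hypothetical allowlist: the packages your project already depends on.
ALLOWED_PACKAGES = {"react", "lodash", "@sentry/node"}

# Common npm spellings of the install subcommand (not exhaustive).
INSTALL_SUBCOMMANDS = {"install", "i", "add"}


def package_name(spec: str) -> str:
    """Strip a version suffix: 'lodash@4.17.21' -> 'lodash'.

    A leading '@' marks a scope ('@sentry/node'), so only an '@' after
    position 0 is treated as the version separator.
    """
    at = spec.rfind("@")
    return spec[:at] if at > 0 else spec


def blocked_packages(command: str) -> list[str]:
    """Return packages in an npm install command that are not allowlisted.

    Commands that are not npm installs return an empty list. Unknown
    arguments (for example a value following a flag) are treated as
    package names, so the check fails closed rather than open.
    """
    tokens = shlex.split(command)
    if len(tokens) < 2 or tokens[0] != "npm" or tokens[1] not in INSTALL_SUBCOMMANDS:
        return []
    specs = [t for t in tokens[2:] if not t.startswith("-")]
    return [s for s in specs if package_name(s) not in ALLOWED_PACKAGES]


# Example: a command an agent might propose after reading a forged alert.
proposed = "npm install lodash@4.17.21 totally-legit-fix"
blocked = blocked_packages(proposed)
if blocked:
    print(f"Refusing to run, needs human review: {blocked}")
```

A check like this is one layer only. It does not cover other package managers, install scripts, or packages pulled in transitively, so it complements a human review step rather than replacing it.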
Hermes Desktop 1.0: a Local Agent That Does Things (6 mentions)
Nous Research shipped Hermes Desktop 1.0, a native macOS, Windows, and Linux app for their multi-agent system. It runs on local models via Ollama, makes voice calls through ElevenLabs, and now has a production-grade WhatsApp Business Cloud integration with webhooks, media, and interactive approval buttons. NVIDIA featured it on stage at Computex. Install is one command: ollama launch hermes-desktop. A persistent agent that runs on your machine all day, answers WhatsApp, and makes calls is no longer a demo. (source: @NousResearch, @ollama)
Honorable Mentions
- Kimi K2.7-Code went open-source with a 21.8% coding improvement over K2.6 and 30% fewer reasoning tokens, which adds up fast if you run high-volume coding agents. (source: @Kimi_Moonshot)
- OpenAI filed a confidential S-1, two weeks after Anthropic did the same. When these go public we finally get real revenue and burn numbers instead of press-release guesses. (source: @OpenAINewsroom)
- Shopify’s internal “Quick” tool gives AI-generated HTML apps a zero-config API for data, files, AI, and websockets on a $200/month VM. Expect “backend for vibe-coded apps” to become a product category. (source: @pushmatrix)
- Microsoft Project Solara is agent-first hardware: a wearable badge and desk device built around delegating tasks instead of opening apps. (source: @MTSlive)
- Gemini 3.5 Live Translate does real-time speech-to-speech translation across 70+ languages. The latency question will decide whether it’s magic or a demo. (source: @Google)
- Dreambeans, from Google Labs, surfaces a daily personalized collection from your Google apps with no engagement algorithm. Their pitch: “hope scrolling, not doom scrolling.” US Google AI Ultra subscribers only. (source: @GoogleLabs)
- Google Search Console is rolling out an AI performance report showing when your pages appear in AI Overviews and AI Mode, impressions only for now. (source: @brodieseo)
- Ahrefs analyzed 1B+ data points on AI search: “best X” listicles are 44% of ChatGPT citations, YouTube mentions correlate with AI visibility more than backlinks do, and AI Overviews now cut clicks to the #1 Google result by 58%. (source: @timsoulo)
- Justin Drake of the Ethereum Foundation puts 50% odds on a quantum computer breaking live crypto by 2032, and Ethereum is migrating to post-quantum cryptography with a 2029 target. Not AI news, but if you hold crypto long-term, worth tracking. (source: @drakefjustin)
Try This Weekend
For everyone:
- Point NotebookLM at a real research question and let the new web search and agentic research do the source-finding for you. Export the result as a deck or spreadsheet.
- Try Siri AI’s on-screen awareness (iPhone 17 Pro) on something you’d normally screenshot and paste into another app.
- Use Gemini 3.5 Live Translate for a conversation or video in a language you don’t speak.
- Swap one social media session for Dreambeans if you’re a US Google AI Ultra subscriber, and see if “hope scrolling” holds up for a week.
For developers:
- Run the command ollama run gemma4:12b-mlx and throw your hardest multimodal prompt at Gemma 4 12B. Vision, audio, and 256K context on 8GB of RAM.
- Use /fork in an active Claude Code session to spin off a background subagent for a parallel task, and watch the prompt cache reuse in your token count.
- Try Nemotron 3 Ultra free on OpenCode for your standard coding workflow before defaulting to a paid model.
- Install Hermes Desktop with the command ollama launch hermes-desktop and wire a personal WhatsApp bot to a local model.
- Audit your alert-to-agent loops. If anything can auto-install a package in response to a Sentry alert, lock that down this weekend, not later.
