X is the best way I’ve found to keep up with AI. I like tweets throughout the week, filtering for things I think are actually worth knowing, then use Claude Code to pull those likes automatically and help me turn them into this post (here’s how the pipeline works). This week: 287 tweets liked, filtered down to what’s below.
Check out the previous roundup (May 19) if you missed it. A lot of last week’s Google I/O dust has settled, so this issue leans toward what’s new since.
AI for Everyone
Anthropic became the most valuable private tech company in the world this week. Apple’s plan to rebuild Siri around someone else’s model leaked. NVIDIA showed a laptop chip that runs 120-billion-parameter models locally. And three labs cracked open famous unsolved math problems within days of each other, which matters more for software than for math.
Anthropic Raises $65B at a $965B Valuation, Files Confidential S-1 (3 mentions)
Anthropic closed a $65B Series H led by Altimeter, Dragoneer, Greenoaks, and Sequoia at a $965 billion post-money valuation, and the same week confidentially filed a draft S-1 with the SEC. Run-rate revenue is reportedly near $47B annually, mostly enterprise. That revenue figure is the part I’d anchor on, because it means the valuation is priced off a business that already exists rather than a forecast. The thing to plan around: a public Anthropic answers to quarterly earnings, so pricing, model cadence, and the “safety-first” positioning all start moving on public-company timelines within a year or so. (source: @AnthropicAI)
iOS 27 Rebuilds Siri Around Third-Party AI (3 mentions)
Bloomberg’s Mark Gurman reports that iOS 27, due at WWDC on June 8, rebuilds Siri with Google Gemini running the core and a dropdown to swap in Claude, ChatGPT, or Grok. Apple spent two years trying to build this in-house and landed on renting someone else’s model, which tells you how hard frontier modeling is even for the company with the most resources. Default placement on a billion iPhones is the most valuable real estate in consumer tech, so whichever model feels best inside Siri gets an enormous distribution edge. I’ll be watching the June 8 keynote for exactly how the switcher works. (source: @markgurman)
NVIDIA RTX Spark Runs 120B Models on a Laptop (4 mentions)
At Computex 2026, NVIDIA revealed RTX Spark, a laptop superchip pairing a Blackwell GPU with a 20-core Grace CPU and up to 128GB of unified LPDDR5X memory. It runs 120-billion-parameter models locally with a 1M-token context window and no internet connection. A year ago, 128GB of memory on a laptop part was not a serious spec. Machines from ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI ship in Fall 2026, and if you’re buying hardware for local model work this year, this is the one worth waiting for. (source: @NVIDIARTXSpark, @satyanadella)
AI Cracks Open Famous Erdős Problems (4 mentions)
Three labs solved hard Erdős combinatorics problems in the same week. Google DeepMind cleared nine open problems using an LLM-plus-Lean loop that formally verifies each proof before a human reads it, and Anthropic’s Mythos found what one researcher called a “cute, simple proof” of a result OpenAI had earlier needed 125 pages to establish. The same-week clustering is the real signal: a capability threshold got crossed, not a lucky run. The reason this belongs in a general roundup is that the identical verify-as-you-go loop is already pointed at software security, which is the Glasswing item in the developer section below. (source: @MTSlive, @prz_chojecki)
Robinhood Opens Agentic Trading (2 mentions)
Robinhood launched “Agentic Accounts,” where you connect an AI agent that can explore trade ideas, build and rebalance a portfolio, and place trades on your behalf. The agentic credit card is the tamer cousin and is live for everyone; the trading piece is still rolling out. Letting an agent actually execute trades is a line most platforms have been nervous to cross, and the “your strategy, your agent” framing is doing a lot of legal work here. I’d treat it as a beta experiment, not a replacement for judgment. (source: @vladtenev)
AI for Developers
Opus 4.8 and dynamic workflows were the headline, but the more durable shifts were quieter: MCP is going stateless, there’s finally a real login standard for agents, and Anthropic’s security model found ten thousand live vulnerabilities. Google’s Antigravity team also built a working operating system from scratch in 12 hours using 93 parallel subagents for under $1,000, then booted Doom on it on stage. It’s the same parallel-agent idea Claude Code just shipped, which tells you the approach is converging across labs.
Claude Opus 4.8 and Dynamic Workflows (20+ mentions)
Anthropic shipped Opus 4.8 and dynamic workflows together. The model posts 69.2% on SWE-bench Pro (Anthropic-reported, up from 64.3% on Opus 4.7) at the same price as 4.7, and it’s noticeably more willing to flag that it’s unsure instead of declaring broken code finished. Dynamic workflows is the bigger deal: type “workflow” in a Claude Code prompt and the model writes its own orchestration script, then fans out to dozens or hundreds of parallel subagents, each running in its own context so the intermediate grep noise never lands in your main thread. One Anthropic engineer used it to audit hundreds of A/B test flags in under ten minutes. It’s a research preview and it burns tokens quickly, so start with something scoped. (source: @ClaudeDevs, @bcherny)
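If the fan-out idea is hard to picture, here is a minimal sketch of the pattern in plain Python. It is my own illustration, not how Claude Code implements dynamic workflows: `run_subagent` and `fan_out` are hypothetical names, and the "subagent" is a stub that fakes some noisy intermediate work. The point it shows is that each worker keeps its scratch output to itself and only a one-line summary comes back to the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    """Hypothetical stand-in for one subagent working in its own context."""
    # Noisy intermediate output (think grep results) lives only inside this call.
    scratch = [f"match for {task} on line {i}" for i in range(1000)]
    # Only a short summary is returned to the orchestrator.
    return f"{task}: scanned {len(scratch)} lines"


def fan_out(tasks: list[str], max_workers: int = 8) -> list[str]:
    """Run one stub subagent per task in parallel and collect the summaries."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input tasks.
        return list(pool.map(run_subagent, tasks))


if __name__ == "__main__":
    flags = [f"flag_{n}" for n in range(5)]
    for summary in fan_out(flags):
        print(summary)
```

The main thread ends up holding five short strings rather than five thousand lines of scratch output, which is the same reason the real feature keeps your primary context clean. It is also why cost scales with the number of tasks you hand it.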
Anthropic’s Glasswing Finds Over 10,000 Vulnerabilities (3 mentions)
Project Glasswing, Anthropic’s security effort built on Claude Mythos, has found more than 10,000 high or critical severity vulnerabilities in widely used software since launch. The capability that separates it from a scanner is chaining: Mythos takes several low-severity issues that each look manageable, combines them into a working exploit, compiles it, runs it, and retries when it fails. Cloudflare pointed it at 50-plus of their own repos and called it a real step forward. Anthropic’s own conclusion is the uncomfortable one. Patching faster is the wrong response if your tests are too slow to ship a correct patch, so regression coverage matters more than patch velocity. (source: @AnthropicAI)
MiniMax M3, an Open-Weights Frontier Model (5 mentions)
MiniMax M3 is the open-weights launch most people missed under the Opus 4.8 noise. MiniMax reports 59% on SWE-bench Pro (roughly GPT-5.5 territory), a 1M-token context window via their Sparse Attention design, and native multimodality from the ground up, which they say is a first for open weights. The API is live now at a 50% launch discount, and the weights land in about ten days. One developer said they’ve run a 24/7 agent on it since launch without crossing 50% of quota, which is a pointed contrast with the Claude Code rate limits people were hitting this week. Worth grabbing API access before the weights drop and everyone benchmarks it properly. (source: @MiniMax_AI, @SkylerMiao7)
MCP Goes Stateless, Plus NSA Guidance (3 mentions)
The MCP 2026-07-28 release candidate makes the protocol stateless: no handshake, no session ID, any request can hit any server instance. That’s the prerequisite for running MCP servers behind a load balancer at real scale, and it’s the largest change since MCP launched. Extensions like MCP Apps and Tasks are now first-class, and there’s finally a deprecation policy. The same week, the NSA published a security design guide for MCP, focused on privilege escalation and prompt injection through tool calls. If you run session-dependent MCP servers, read the RC now, because you’ll be rearchitecting. (source: @dsp_, @NSACyber)
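To see why statelessness is the prerequisite for load balancing, here is a toy simulation in Python. It does not use the MCP wire format or any MCP SDK; `SessionServer`, `StatelessServer`, and the round-robin "load balancer" are all invented for illustration. The only assumption is the one from the paragraph above: a session-dependent server remembers a handshake in local memory, while a stateless one gets everything it needs in each request.

```python
import itertools


class SessionServer:
    """Toy session-dependent instance: handshakes are stored in local memory."""

    def __init__(self) -> None:
        self.sessions: set[str] = set()

    def handshake(self, session_id: str) -> None:
        self.sessions.add(session_id)

    def handle(self, session_id: str, method: str) -> str:
        if session_id not in self.sessions:
            raise KeyError("unknown session on this instance")
        return f"ok:{method}"


class StatelessServer:
    """Toy stateless instance: each request carries everything it needs."""

    def handle(self, request: dict) -> str:
        return f"ok:{request['method']}"


def session_failures(n_requests: int = 4) -> int:
    """Handshake lands on one instance, later requests are round-robined."""
    balancer = itertools.cycle([SessionServer(), SessionServer()])
    next(balancer).handshake("abc")  # only the first instance knows "abc"
    failures = 0
    for _ in range(n_requests):
        try:
            next(balancer).handle("abc", "tools/list")
        except KeyError:
            failures += 1
    return failures


def stateless_failures(n_requests: int = 4) -> int:
    """Same round-robin, but no instance needs prior knowledge of the client."""
    balancer = itertools.cycle([StatelessServer(), StatelessServer()])
    failures = 0
    for _ in range(n_requests):
        try:
            next(balancer).handle({"method": "tools/list"})
        except KeyError:
            failures += 1
    return failures


if __name__ == "__main__":
    print("session-dependent failures:", session_failures())
    print("stateless failures:", stateless_failures())
```

With two instances, half of the session-dependent requests land on the instance that never saw the handshake and fail, while the stateless requests all succeed. Real deployments paper over this with sticky sessions or a shared session store, which is the rearchitecting work the RC lets you skip.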
auth.md, a Login Standard for Agents (4 mentions)
Right now most agents log into apps by impersonating a human, clicking through OAuth consent screens and fighting CAPTCHAs. auth.md from WorkOS proposes the clean version: publish a Markdown file at a known URL on your domain that tells agents how to register, which scopes exist, and how to prove the user approved. It’s human-readable and machine-readable, composes with existing OAuth, and WorkOS made it explicitly open rather than tied to their own product. Cloudflare and Firecrawl are launch partners. This is the kind of boring infrastructure that becomes a quiet standard and then everyone is expected to have it. (source: @grinich)
Claude Code Gets a Security Plugin (3 mentions)
Anthropic shipped a security plugin for Claude Code that runs hooks at three points: on file edits to catch risky patterns early, after each model turn for a full diff review, and at commit time with surrounding code context. They report internal use cut security-related PR comments by 30 to 40%, and since they claim 90% of their own code is Claude-written, they have real incentive to get this right. It’s free for every Claude Code user; install it from /plugins. (source: @ClaudeDevs)
Honorable Mentions
- ElevenLabs Music v2 handles mid-track genre transitions, fast rap, and embedded sound effects, and ElevenCreative now lets you download tracks for ads and video without the stock-music licensing dance. (source: @ElevenLabs)
- Koji is an AI tutor built to coach students through problems instead of solving them, with real-time screen awareness so it can point and sketch like a tutor beside you. Built with input from MIT and Harvard learning researchers. (source: @suekhim)
- Kirkland & Ellis is building a $500M internal AI platform rather than relying on shared tools, betting that a model trained on its own cases and precedents beats general-purpose AI for legal work. (source: @Techmeme)
- Slack MCP hit 1 million users in six weeks, with agent workloads up 350% quarter over quarter. Slackbot is now an MCP client, so “update Jira 4821 to in progress” just works from a thread. (source: @SlackHQ)
- DeepSWE from Datacurve is a coding benchmark built from scratch rather than scraped from GitHub issues, with short prompts and harder tasks, designed to surface real differences between models that look similar on SWE-bench. (source: @theo)
- Cursor’s Team Kit put its internal /thermo-nuclear-code-quality-review skill in the marketplace. It deletes complexity instead of moving it, blocks files over 1,000 lines, and rejects PRs that work but make the codebase messier. (source: @ericzakariasson)
- WebMCP enters a Chrome 149 origin trial. It lets sites expose structured tools to browser agents so they know exactly how to interact, and Lighthouse is adding an experimental AI-discoverability audit. (source: @ChromiumDev)
- xAI’s Composer 2.5 landed in Grok Build with 200k context, subagents, and .cursor settings compatibility, and grok-build-0.1 is on the API at $1/M input, $2/M output. (source: @xai)
- OpenAI added private MCP tunnels so internal MCP servers reach ChatGPT, Codex, and the Responses API over outbound-only HTTPS with no public exposure. Both labs shipping this in one month makes private MCP table stakes for enterprise. (source: @OpenAIDevs)
- Anthropic reset Claude Code rate limits for all Pro and Max users after a bug caused runaway parallel-subagent spawning. The fix is live, and the lesson is that dynamic workflows are genuinely expensive, so scope before you let one loose. (source: @ClaudeDevs)
Try This Weekend
For everyone:
- Generate a track in ElevenMusic for a piece of content you’d normally license stock music for. The jump from v1 is large enough to reconsider your workflow.
- Put Koji in front of a student on one homework problem. The real test is whether it coaches the thinking or just hands over the answer.
- Connect an agent to a Robinhood agentic account with money you can afford to lose and watch what it proposes before you let it act.
- Block out time for the WWDC keynote on June 8 to see how the new Siri model switcher actually works in practice.
For developers:
- Type “workflow” in Claude Code with Opus 4.8 on a scoped task, like a bug hunt across one service or a small migration, and watch it write a plan and fan out. Budget 30 minutes and keep it bounded.
- Install the Claude Code security plugin from /plugins, then run it against a PR you’d normally ship and see what surfaces.
- Publish an auth.md on an app you own. It’s a short implementation and it puts you ahead of the curve before agent-first auth is the default.
- Grab MiniMax M3 API access at the 50% launch discount before the weights drop and pricing normalizes.
- Run Cursor’s Team Kit thermo-nuclear code review on your next PR before it goes out.
