X is the best way I’ve found to keep up with AI. I like tweets throughout the week, filtering for things I think are actually worth knowing. I use Claude Code to pull those likes automatically and help me turn them into this post (here’s how the pipeline works). This week: 148 tweets liked, filtered down to what’s below.
Check out the previous roundup (May 2) if you missed it.
AI for Everyone
GPT-5.5 Instant Becomes the ChatGPT Default for Everyone (5 mentions)
OpenAI started rolling out GPT-5.5 Instant on May 5 as the default model for every ChatGPT user (also available in the API as gpt-5.5-chat-latest). The pitch is meaningful improvements in factuality, especially in medicine, law, and finance, plus a noticeable shift toward shorter, less padded answers. The memory update is the part worth digging into: ChatGPT can now use your saved memories, past chats, files, and a connected Gmail to personalize responses, and a new “memory sources” panel shows exactly what it pulled from so you can edit or delete entries. A full-duplex voice mode is coming, where the model can listen and speak at the same time. The catch is in the companion OpenRouter cost analysis: short-prompt workloads are about 92% more expensive on GPT-5.5 than GPT-5.4. (source: @OpenAI, @OpenAI, @OpenAI)
Coinbase Cuts 14% and Says the Quiet Part Out Loud (2 mentions)
Brian Armstrong’s company-wide Coinbase layoff email is the clearest public attribution of headcount cuts to AI I’ve seen from a major tech company. He didn’t blame market conditions. He said directly: non-technical teams are now shipping production code, workflows are being automated, and the pace of what a small team can do has changed. The structural changes are the interesting part: AI-native pods, “one person teams” doing engineering plus design plus product, a flat org of five layers max, and every leader required to be an individual contributor. Marc Andreessen chimed in to say most big companies are overstaffed by 2-4x, and AI is what’s finally forcing the issue. (source: @brian_armstrong)
Google Launches an AI Health Coach, a $99 Screenless Fitbit, and a 14,000-Patient Study (3 mentions)
Google is making a serious AI push into health. The Fitbit app is becoming Google Health, with a Gemini-powered AI Coach that pulls from your fitness data, sleep, cycle tracking, weather, and U.S. medical records. The companion hardware is Fitbit Air, a $99 screenless band aimed at Whoop with seven-day battery and a five-minute charge that gets you through a day. The most interesting drop is the research alongside it: Google scientists tested an AI symptom checker on 14,000 real patients over nine months. Clinicians ranked the AI’s diagnosis #1 about 53% of the time, versus 24% for independent physicians. The structured AI interview beat passive symptom entry by 27% in diagnostic accuracy, which is a pointed critique of every consumer AI health product that just waits for you to describe symptoms. Fitbit also picked up physiological signals of illness days before patients felt sick. (source: @kimmonismus, @kimmonismus)
Kevin Rose Quietly Relaunches Digg as an AI News Aggregator (3 mentions)
Digg is back, and it’s specifically an AI news aggregator now. The alpha is live at di.gg. The plumbing is more interesting than the name: 9 million graph connections, more than 15 AI judges ranking stories, real-time X ingestion, and influence-flow tracking (so when Sam Altman reposts something, it can watch the discussion ripple from there). The first Digg died partly because it tried to out-Reddit Reddit. This version has a tighter audience and a specific beat. Matt Van Horn already shipped a Digg CLI with Claude Code, OpenClaw, and Hermes skills if you’d rather drive it from a terminal than a browser. (source: @kevinrose, @mvanhorn)
Apple’s Camera AirPods Hit Late-Stage Testing (2 mentions)
Mark Gurman confirmed Apple’s camera-equipped AirPods are in design verification testing, with features and design nearly locked. Both earbuds carry cameras that capture low-resolution visual data, not photos or video, so Siri can answer “what is this thing I’m looking at?” The design is described as AirPods Pro 3 style with longer stems and an LED that lights up when visual data is sent to the cloud. The reported September target depends on Siri’s rebuild hitting Apple’s quality bar, and that rebuild is reportedly being powered by Gemini under the hood. Four years of development for a product that lives in your ears and watches what you watch. Not what most people meant when they said Apple was behind on AI. (source: @markgurman, @kimmonismus)
Anthropic and Wall Street Form a $1.5B Joint Venture (3 mentions)
The Wall Street Journal reported Anthropic is finalizing a roughly $1.5 billion joint venture with Blackstone, Goldman Sachs, and Hellman & Friedman, aimed at selling Claude-based workflow transformation to private-equity-owned companies. Anthropic, Blackstone, and H&F each put in about $300M; Goldman invests $150M. The pitch on PE is that they own a lot of companies, push cost changes fast, and can force software adoption that a normal enterprise IT org would slow-walk for years. The structure is the tell: this isn’t a reseller arrangement; the JV is supposed to rebuild workflows, not hand over API keys. Anthropic is entering professional services, competing with the consulting firms that have been racing to build their own AI practices. (source: @AndrewCurran_, @AndrewCurran_)
An Anthropic Co-Founder Puts 60% Odds on Recursive Self-Improvement by 2028 (1 mention)
Jack Clark, an Anthropic co-founder, tweeted that after a few weeks of reading hundreds of public data sources on AI development, he now puts a 60% probability on recursive self-improvement happening by the end of 2028. RSI is the scenario where AI systems get good enough to build their own successors. Most timelines for “AI gets weird fast” run through that door. It’s one tweet, no paper attached, but the source is unusual: Clark is on the inside and not a hype account. Nobody actually knows what RSI looks like from the outside, so a 60% probability is more of a planning prompt than a forecast. Still, worth letting it shape your 2026-2028 product and career decisions. (source: @jackclarkSF)
AI for Developers
Code with Claude landed on May 6 and ended up being the highest-density Anthropic announcement of the year so far: Managed Agents got Dreaming, Outcomes, and multi-agent orchestration; Claude Code rate limits doubled overnight; the SpaceX compute deal got announced. OpenRouter shipped the most-mentioned developer story of the week (response caching), and May 7 turned into an accidental voice-AI convergence day. Mozilla’s case study on Claude Mythos is the cleanest before/after data point I’ve seen all year.
Claude Managed Agents Get Dreaming, Outcomes, and Multi-Agent Orchestration (6 mentions)
Anthropic’s biggest platform update in months dropped at Code with Claude. Dreaming (research preview, waitlisted) is a scheduled process that reviews an agent’s past sessions, finds patterns in what went wrong, and curates its own memories so it stops repeating mistakes. Harvey’s team tested it and reported completion rates went up roughly 6x. Outcomes (public beta) lets you write a rubric; a separate grader Claude evaluates every run against it and the agent iterates until it clears your bar. Anthropic’s own tests show up to 10 percentage points better task success, 8.4% better on docx generation, 10.1% on pptx; Wisedocs reports their reviews now run 50% faster while staying aligned with team standards. Multi-agent orchestration (public beta) lets a lead agent delegate to specialists running in parallel on a shared filesystem, with full visibility in the Console. Spiral’s early implementation uses Claude Haiku as the cheap orchestrator and Claude Opus as the drafting specialist, which is the cost play to copy. Webhooks also landed, and infinite context windows were teased from stage. (source: @claudeai, @claudeai, @claudeai, @ClaudeDevs)
Claude Code Rate Limits Doubled, Compute Deal With SpaceX (3 mentions)
Alongside the agents launches, Anthropic quietly doubled Claude Code’s 5-hour rate limits for Pro, Max, and Team plans, removed the old peak-hours throttle on Pro and Max, and substantially raised API rate limits for Opus models. The capacity came from a new compute partnership with SpaceX. Boris Cherny, who leads Claude Code, mentioned in passing that the day of the announcement was Claude Code’s second-biggest signup day ever, and the product has grown 15x since January 1. (source: @claudeai, @claudeai, @bcherny)
OpenRouter Ships Free Response Caching With One Header (8 mentions)
OpenRouter’s response caching is the most-mentioned dev story of the week, and the use case clicks immediately once you see it. Add one header (X-OpenRouter-Cache: true) and identical requests come back in 80-300ms, zero tokens billed, never hits the provider. The obvious application is agent retries: when a multi-step workflow fails halfway through and restarts from the top, every call up to the failure point returns instantly and costs nothing. Test suites are the other big one (first run populates, every subsequent run is deterministic and free). For context, an uncached Gemini 3.1 Flash call takes about 1.3 seconds, Kimi K2.6 about 4.6, and GPT-5.5 around 9.1. TTL is configurable from one second to 24 hours via X-OpenRouter-Cache-TTL, and cache hits don’t count against provider rate limits. This is distinct from prompt caching; both can run at the same time. (source: @OpenRouter, @OpenRouter, @OpenRouter)
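As a sketch of what that looks like in practice: the two cache headers are from the announcement, while the endpoint path, model slug, and the request shape are my assumptions (and the network call is left commented out).

```python
import json
import urllib.request

def build_cached_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completion request with response caching enabled."""
    body = json.dumps({
        "model": "google/gemini-3.1-flash",  # hypothetical model slug
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-OpenRouter-Cache": "true",      # enable response caching
            "X-OpenRouter-Cache-TTL": "3600",  # keep hits free for 1 hour
        },
    )

req = build_cached_request("sk-or-...", "Summarize this diff")
# A second, byte-identical request inside the TTL should come back in
# 80-300ms with zero tokens billed. Uncomment to actually send it:
# resp = urllib.request.urlopen(req)
```

For the agent-retry case, the only requirement is that the replayed calls are byte-identical to the originals, which a deterministic workflow already gives you for free.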
Voice AI Convergence Day: OpenRouter Audio, GPT-Realtime-2, ElevenLabs Cuts Prices (6 mentions)
May 7 was unexpectedly large for voice AI. OpenRouter shipped audio endpoints (/audio/speech and /audio/transcriptions) with provider fallback routing across OpenAI GPT-4o Mini TTS, Google Gemini Flash TTS, and Mistral Voxtral Mini TTS for synthesis, plus Whisper, GPT-4o Transcribe, Google Chirp 3, and Groq’s Whisper for transcription. Same API key, same billing. The same day, OpenAI launched GPT-Realtime-2, bringing GPT-5-class reasoning to voice. The concrete number from Scale AI’s Audio MultiChallenge leaderboard: instruction retention jumped from 36.7% to 70.8% APR versus GPT-Realtime-1.5. ElevenLabs cut their own STT/TTS prices about 55% the same day. If you’ve been waiting to build voice agents because the tooling was clunky, that excuse is gone. (source: @OpenRouter, @OpenAIDevs, @ScaleAILabs)
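For a feel of the new audio surface, here is a hedged sketch of a TTS call through /audio/speech. The endpoint path and provider list are from the announcement; the payload fields (model, input, voice) mirror the OpenAI-style speech API and are my assumption.

```python
import json
import urllib.request

def build_speech_request(api_key: str, text: str) -> urllib.request.Request:
    """Sketch of a synthesis call; fallback across TTS providers
    (GPT-4o Mini TTS, Gemini Flash TTS, Voxtral Mini) happens server-side."""
    body = json.dumps({
        "model": "openai/gpt-4o-mini-tts",  # assumed slug for one routed provider
        "input": text,
        "voice": "alloy",                   # assumed field
    }).encode()
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/audio/speech",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_speech_request("sk-or-...", "Deploy finished.")
# urllib.request.urlopen(req).read() would return audio bytes.
```

Transcription would be the mirror image against /audio/transcriptions, on the same API key and billing.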
Firefox Fixed 271 Security Bugs in April Using Claude Mythos (2 mentions)
Mozilla published a detailed post on what Claude Mythos Preview actually did to Firefox security in April: 271 bugs found and fixed via Mythos (180 rated sec-high, 80 sec-moderate, 11 sec-low), out of 423 total security fixes shipped that month. For context, Firefox normally fixes 20-30 security bugs per month. This isn’t a hand-wavy “AI helped” claim. The team built an agentic harness on top of their existing fuzzing infrastructure, parallelized it across VMs, and let Claude not just identify candidates but create reproducible test cases to validate the hypothesized vulnerabilities. One of the bugs was a 20-year-old XSLT issue (reentrant key() calls causing a hash table rehash that frees its backing store while a raw entry pointer is still in use). The line that matters for planning: the pipeline gets better every time the model upgrades, with no changes to the harness itself. This is one of the cleanest before/after case studies for agentic security work I’ve read. (source: @alexalbert__, @alexalbert__)
Codex Ships a Chrome Extension for Background Tab Work (4 mentions)
OpenAI shipped a Codex Chrome extension that lets Codex run in background browser tabs without taking over the browser. Per-site access controls are built in. The practical win for web developers is that Codex can do real cross-tab testing, gather browser context from multiple open tabs, and use DevTools in parallel while you keep coding. The old browser-takeover model is what killed my interest in OpenAI Operator a year ago. Background tabs are the right answer. (source: @Codex_Changelog, @OpenAIDevs, @testingcatalog)
Pareto Code Routes Your Coding Calls to the Cheapest Capable Model (4 mentions)
OpenRouter’s Pareto Code is a free experimental router for coding tasks. You set a min_coding_score parameter (a number from 0 to 1 based on Artificial Analysis benchmarks) and it picks the cheapest model that clears your bar. The Pareto frontier updates live on the page so you can see which model is currently the best value. Right now DeepSeek V4 Pro, GPT-5.4 Mini, and Gemini 3.1 Pro sit at the top. Nous Research’s Hermes agent already uses it for routing auxiliary tasks. The cool part is the frontier moves while your code stays the same. (source: @OpenRouter, @OpenRouter)
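The selection rule itself is simple enough to show as a toy. This is my own illustration of what the router does server-side, with made-up scores and prices; only the min_coding_score parameter name comes from the announcement.

```python
def cheapest_capable(models, min_coding_score):
    """Pick the cheapest model whose coding score clears the bar:
    a toy version of Pareto-frontier routing."""
    capable = [m for m in models if m["score"] >= min_coding_score]
    if not capable:
        raise ValueError("no model clears min_coding_score")
    return min(capable, key=lambda m: m["price_per_mtok"])

# Invented scores and prices, for illustration only.
catalog = [
    {"name": "deepseek-v4-pro", "score": 0.86, "price_per_mtok": 0.60},
    {"name": "gpt-5.4-mini",    "score": 0.82, "price_per_mtok": 0.45},
    {"name": "gemini-3.1-pro",  "score": 0.90, "price_per_mtok": 2.10},
]
print(cheapest_capable(catalog, 0.85)["name"])  # → deepseek-v4-pro
```

The point of the live frontier is that when a cheaper model’s benchmark score crosses your bar, the pick changes without you touching the call site.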
Hermes Agent v0.13 Ships With Autobrowse That Self-Optimizes 80% Cheaper (3 mentions)
Nous Research’s Hermes Agent v0.13.0 (“The Tenacity Release”) added multi-agent Kanban orchestration, goal enforcement via /goal, and extensible LLM providers. The interesting demo: Kyle Jeong showed Autobrowse reducing a browser automation task from 102 seconds to 35, dropping turns from 23 to 8, and cutting cost from $1.46 to $0.28 after two iterations. The trick is the agent figures out it can eval JavaScript directly on the page rather than clicking step by step, then saves that as a reusable skill. The honest caveat: Hermes is still very low level. One early user installed it and called it “way too low level for non-technical folks.” He’s right. If you’re already comfortable in agentic tooling, the Autobrowse numbers are reason enough to install. (source: @NousResearch, @kylejeong)
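The “save it as a reusable skill” trick generalizes beyond browsing, and it is worth a toy sketch. This is my own illustration of the pattern, not Hermes internals: the first run pays for the slow multi-step path, then a one-shot shortcut is cached and replayed for the same task.

```python
class SkillCache:
    """Once a slow multi-step run succeeds, store a one-shot shortcut
    for that task (e.g. a JS snippet to eval) and replay it next time."""
    def __init__(self):
        self.skills = {}

    def run(self, task, slow_steps, shortcut):
        if task in self.skills:
            return self.skills[task]()  # replay: one cheap turn
        result = None
        for step in slow_steps:         # first run: many expensive turns
            result = step()
        self.skills[task] = shortcut    # remember the learned shortcut
        return result

cache = SkillCache()
clicks = []
slow = [lambda: clicks.append("open form"),
        lambda: clicks.append("submit") or "done"]
fast = lambda: "done"  # stands in for one direct JS eval on the page

first = cache.run("file report", slow, fast)
second = cache.run("file report", slow, fast)
print(first, second, len(clicks))  # → done done 2  (no extra clicks on replay)
```

The reported numbers (23 turns down to 8, $1.46 down to $0.28) are what this pattern looks like when the “shortcut” is a single page-level JavaScript eval replacing a click sequence.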
Honorable Mentions
- Claude for Excel, Word, and PowerPoint hit general availability, with Claude for Outlook in public beta. Context now carries across Microsoft apps, so a conversation that starts in Excel follows you into Word. (source: @claudeai)
- Gemini 3.1 Flash Lite went GA on May 7 at $0.25/M input and $1.50/M output on OpenRouter, with a 1M context window and selectable thinking. Gemini 3.2 Flash was also briefly visible in the app before being pulled back, with Google I/O on May 19. (source: @GoogleAIStudio, @kimmonismus)
- NotebookLM Mind Maps got prompt-driven steering, renaming, and sharing. You can now scope a map to a specific topic or source instead of letting it auto-generate from everything. (source: @NotebookLM)
- Grok 4.3 launched on the xAI API at $1.25/M input and $2.50/M output with a 1M context window. xAI claims it tops Artificial Analysis on instruction following and agentic tool calling, and Vals AI ranked it #1 on case law and corporate finance. (source: @xai, @elonmusk)
- Anthropic’s leaked “Orbit” feature for Claude Cowork would connect Gmail, Slack, GitHub, Calendar, Drive, and Figma and generate proactive briefings without prompts. The developer gate is “tibro enabled” (orbit backwards). Not confirmed, watch the next event. (source: @WesRoth)
- iOS 27 will let you pick Claude or Gemini instead of ChatGPT for Apple Intelligence. Opens the current OpenAI-only escalation path to multiple providers. (source: @MacRumors)
- Google Finance Beta added AI-powered key moments that explain major price swings on 1-month+ stock charts and jump you directly to the relevant part of the earnings call. (source: @thefox)
- OpenRouter’s GPT-5.5 cost analysis ran the actual numbers: ~92% more expensive on short prompts vs GPT-5.4, 49-69% more on medium/long prompts. The “fewer completion tokens on long prompts” partial offset only helps if your workload is mostly long context. (source: @OpenRouter)
- Gemini agent for macOS is in development per 9to5Google. Uses Screen Access and Accessibility APIs to organize files, convert files to Google Sheets, batch-rename, and draft email summaries from meeting transcripts. (source: @9to5Google)
- Claude Platform now supports keyless auth via your existing AWS, GCP, or Azure cloud identity. Anthropic flagged API key management as the #1 security concern they hear from customers. (source: @ClaudeDevs)
- A new Gemini “Omni” video model leaked this week, with @chetaslua posting what they claim is first output and calling it a “nano banana moment of video.” @kimmonismus is speculating it lands at Google I/O on May 19, possibly as Veo 3.1’s successor. Unverified, but the timing fits. (source: @chetaslua, @kimmonismus)
- A reference to “Ultrafast mode” briefly appeared in the Codex GitHub repo before being deleted. The description: “The fastest available responses for latency-sensitive work.” Looks like an unintended push. (source: @AiBattle_, @testingcatalog)
- OpenAI’s October 2025 employee secondary stock sale totaled $6.6B, per WSJ reporting via @KobeissiLetter. More than 600 current and former employees cashed out an average of $11M each. (source: @KobeissiLetter)
Try This Weekend
For everyone:
- Open ChatGPT and look at memory sources the next time you ask something personal. The new GPT-5.5 default exposes exactly what context it pulled from your past chats, files, and Gmail to answer. Edit or delete anything that’s wrong.
- Try Google Finance Beta on a stock you actually follow. Look at the AI “key moments” annotations on the 1-month chart. If you do this during an earnings week, it’ll save you the click-around to find the news.
- Steer a NotebookLM Mind Map with a prompt. Upload a long document you’ve been avoiding (a research paper, a contract, a book chapter), generate a Mind Map, then add a steering prompt like “focus on the risks” or “show me the timeline.” Way more useful than the auto-generated version.
- Bookmark Digg’s AI alpha. Spend ten minutes seeing if its clustered AI news feed beats your current habit of scrolling X manually. The new Digg has a more focused beat than the old one.
For developers:
- Add the X-OpenRouter-Cache: true header to your existing OpenRouter test suite. Zero code changes beyond the header. First run populates, every subsequent run is free and deterministic. The 80-300ms cache hits are noticeable in CI.
- Drop Pareto Code into one OpenRouter call by setting min_coding_score. Watch which model it picks. The free routing layer makes the cheapest capable model the default without you doing the work.
- Install the Codex Chrome extension and grant it access to one site you frequently test against. Try a real cross-tab debugging task. The background-tab model is the part that matters; it doesn’t grab focus.
- Set up one cron job that runs Claude Code on a recurring chore. Boris Cherny’s pattern: PR babysitter, CI fixer, or Twitter feedback clusterer every 30 minutes. The simplest configuration apparently just works.
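A minimal crontab sketch of that last pattern, assuming Claude Code’s non-interactive print mode (claude -p); the schedule, prompt, path, and log location are all mine to adjust:

```shell
# Every 30 minutes: babysit open PRs and patch failing CI in one repo.
*/30 * * * * cd ~/src/myrepo && claude -p "check open PRs, fix failing CI, and push" >> ~/claude-cron.log 2>&1
```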
