X is the best way I’ve found to keep up with AI. I like tweets throughout the week, filtering for things I think are actually worth knowing. I use Claude Code to pull those likes automatically and help me turn them into this post (here’s how the pipeline works). This week: 148 tweets liked, filtered down to what’s below.
Check out the previous roundup (Apr 24) if you missed it.
AI for Everyone
ElevenMusic Launches With Creator Payouts (7 mentions)
ElevenLabs shipped ElevenMusic, a platform where you discover independent artists, remix tracks, and create songs from a prompt. The economic model has evidence behind it: ElevenLabs has paid out $11M to voice creators through its voice library and is bringing the same revenue share to music. Over 4,000 artists are on the platform at launch, and the Eleven Album Vol. 2 dropped alongside it. The same week added a Voice Changer Skill for ElevenLabs Agents that transforms voice in real time while preserving emotion and timing. (source: @ElevenLabs, @ElevenLabsDevs)
Gemini Generates Real Files From a Chat Prompt (5 mentions)
Google Gemini now exports Docs, Sheets, Slides, Word, Excel, CSV, PDF, Markdown, LaTeX, TXT, and RTF straight out of a chat. No template upload required. Available globally to all Gemini users this week. Claude has had document generation for a while; Gemini hitting parity at Google’s distribution scale is the news. “Summarize this for me” and “give me the budget as an Excel file” are different products. (source: @sundarpichai, @joshwoodward)
Microsoft Word Legal Agent Edits Contracts in Track Changes (3 mentions)
Microsoft launched a Legal Agent inside Word, available in the US through the Frontier program. It analyzes complex documents and applies precise clause-level modifications using Word’s native Track Changes. Not a summary, actual redlines. Brad Smith pitched it as the legal-work equivalent of when computers first arrived in the 1980s. The smart positioning is that lawyers don’t switch tools; the agent shows up where they already work. (source: @WesRoth, @BradSmi)
Claude Connects to Blender, Fusion, Adobe, Ableton, SketchUp (3 mentions)
Anthropic shipped official MCP connectors for Blender, Autodesk Fusion, Adobe, Ableton, Affinity, Splice, SketchUp, and Resolume. You can debug a 3D scene from a chat, batch-modify every object in a Blender file, or build music sessions in Ableton through conversation. One demo generated 1,000 cyberpunk anime scenes in under an hour. Creative pros who’ve watched AI from the sidelines now have the “it works inside my actual tools” moment. (source: @minchoi, @claudeai)
Meta Buys a Robotics-AI Startup, Joins the Humanoid Race (3 mentions)
Meta acquired Assured Robot Intelligence, a startup building AI specifically for robots, as part of its humanoid push. Google has Gemini Robotics. OpenAI’s earlier robotics partnership fell through, but the company is clearly circling back. All three frontier labs are now seriously chasing physical AI. The white-collar wave got the loud headlines; the blue-collar wave is already in acquisition territory. (source: @business, @kimmonismus)
ChatGPT Images 2.0 Usage Up 50%, Bad-MS-Paint Prompt Goes Viral (2 mentions)
OpenAI says image usage is up 50%+ since ChatGPT Images 2.0 launched, and roughly 60% of daily image users are new to ChatGPT. Image generation is pulling in people who weren’t using the product before. The prompt everyone’s sharing is the “bad MS Paint” one: tell the AI to redraw your image like a kid drew it in MS Paint with a mouse. The quality ceiling has gotten high enough that the viral move is now making the output look terrible on purpose. The 360° image viewer also rolled out on desktop this week. (source: @OpenAINewsroom, @arrakis_ai)
Anthropic ARR Jumps From $9B to $44B in Months (2 mentions)
SemiAnalysis reports Anthropic’s ARR hit $44 billion, up from $9B at the end of 2025. That’s about 389% growth in a few months, mostly driven by enterprise Claude adoption and Claude Code. The gross-margin number is the part that makes this sustainable: inference gross margin reportedly went from 38% to over 70%. Anthropic got more efficient at running the models even as usage scaled up. A company with $44B ARR and 70% margins can ship at a pace most competitors can’t match. Worth remembering every time Anthropic releases something new this year. (source: @kimmonismus)
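For anyone who wants to check the growth figure, here is a quick sketch of the arithmetic. It uses only the two ARR numbers reported above; the function name is just for illustration.

```python
def growth_pct(start: float, end: float) -> float:
    """Percentage growth from start to end."""
    return (end - start) / start * 100


# ARR in billions of dollars, as reported by SemiAnalysis.
arr_start, arr_end = 9, 44

arr_growth = growth_pct(arr_start, arr_end)
arr_multiple = arr_end / arr_start

print(f"ARR growth: {arr_growth:.0f}%")      # rounds to 389%
print(f"ARR multiple: {arr_multiple:.1f}x")  # a little under 5x
```

Put differently, 389% growth means ARR is a bit under five times where it started.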
AI for Developers
AISI’s evaluation of GPT-5.5 cyber capabilities is the most disruptive number I’ve seen in a benchmark this year, and Anthropic’s counter-move (Claude Security in public beta inside Claude Code) landed days later. The rest of this section is the cost-cutting and infrastructure work shipping in parallel.
GPT-5.5 Cracks a 32-Step Corporate Attack, Solves a 12-Hour Puzzle in 10 Minutes for $1.73 (8 mentions)
The UK AI Security Institute ran GPT-5.5 through a 32-step corporate intrusion that takes a skilled human about 20 hours. It completed the full chain twice in ten attempts, scored 71.4% on expert-level CTFs, and solved a 12-hour custom-VM reverse-engineering puzzle in 10 minutes 22 seconds for $1.73 in API costs. AISI also surfaced a universal jailbreak that worked across every malicious cyber query they tested. OpenAI is rolling out GPT-5.5-Cyber to defenders now, and Anthropic shipped Claude Security in public beta inside Claude Code the same week, so you can scan a repo and patch findings without leaving the surface you’re already coding in. (source: @AISecurityInst, @sama, @_catwu)
Grok 4.3: 60% Cheaper Output, 321-Point Elo Jump on Agentic Tasks (7 mentions)
xAI dropped Grok 4.3 and the headline is the price. Output is ~60% cheaper than Grok 4.20, input ~40% cheaper. The model itself is small (500M active params, MoE) but scores 53 on the Artificial Analysis Intelligence Index, with a 321-point Elo jump on the GDPval-AA agentic benchmark to 1500. It still trails GPT-5.5 by 276 Elo (~17% expected win rate head-to-head), but for teams running a lot of agent loops the cost drop matters more than the gap at this tier. Available on OpenRouter today. (source: @testingcatalog, @ArtificialAnlys, @OpenRouter)
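The ~17% figure follows from the standard Elo expected-score formula. A minimal sketch, assuming the benchmark uses the conventional 400-point logistic scale (I haven’t confirmed that Artificial Analysis scores it exactly this way); the function name is mine:

```python
def elo_expected_score(rating_deficit: float) -> float:
    """Expected score for the lower-rated side, given how many points it trails by.

    Assumes the conventional Elo scale, where a 400-point gap means 10:1 odds.
    """
    return 1 / (1 + 10 ** (rating_deficit / 400))


# Grok 4.3 trails GPT-5.5 by 276 points on the agentic benchmark.
grok_vs_gpt = elo_expected_score(276)
print(f"Expected win rate for the lower-rated model: {grok_vs_gpt:.0%}")
```

Strictly speaking this is an expected score, so a draw counts as half a win. A gap of zero gives 50%.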
Cursor SDK Opens Up the Runtime That Powers Cursor (4 mentions)
Cursor open-sourced the TypeScript SDK that lets developers build agents on the same runtime, harness, and models that power Cursor itself. Three starter projects are in the cookbook repo: a coding agent CLI, a prototyping tool, and an agent kanban board. Customers running it already include Rippling (Linear ticket to merge-ready PR), Notion, C3 AI, and Faire. This is the fastest path I’ve seen from “I want an automated coding agent” to one running in CI today. (source: @cursor_ai)
Codex Expands Beyond Developers, Adds WebSockets to Responses API (3 mentions)
OpenAI rolled out new Codex onboarding flows for Finance, Data Science, and Marketing roles. Computer use is 20% faster. The bigger update for builders is WebSockets in the Responses API: the connection keeps response state warm across tool calls, cutting end-to-end agentic loop latency by up to 40%. OpenAI also said Codex API revenue doubled in under seven days post-GPT-5.5 launch, calling it their strongest API growth event yet. If you have an agent loop calling multiple tools in sequence, the WebSocket upgrade is one config change. (source: @embirico, @OpenAIDevs, @OpenAI)
Mesa: Git-Style Versioned Filesystem for Agents (2 mentions)
Every team building serious agents hits the same problem: where do the actual files live? The conversation history is easy. The artifacts (a contract redlined overnight, an audit report, a half-finished migration) usually live in an ephemeral sandbox that dies in 30 minutes or an S3 bucket where concurrent writes clobber each other silently. Mesa is a POSIX-compatible filesystem with Git-style versioning underneath: branches, diffs, rollbacks, ACLs, full history. Mount via FUSE or the TS SDK. Private beta open with legal, healthcare, and GTM design partners. Boring infrastructure that unlocks a class of production agents that don’t work without it. (source: @olvrgln)
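To make the “concurrent writes clobber each other silently” problem concrete, here is a toy sketch of version-checked writes with full history. This is not Mesa’s API; every name in it (`VersionedFile`, `StaleWriteError`, `write`, `rollback`) is hypothetical, and it only illustrates the general idea of rejecting a write that was based on a stale version.

```python
from dataclasses import dataclass, field


class StaleWriteError(Exception):
    """Raised when a writer's base version is no longer the latest."""


@dataclass
class VersionedFile:
    # Every accepted write is kept, so any earlier version can be restored.
    history: list[str] = field(default_factory=lambda: [""])

    @property
    def version(self) -> int:
        return len(self.history) - 1

    def read(self) -> tuple[int, str]:
        return self.version, self.history[-1]

    def write(self, base_version: int, content: str) -> int:
        # Reject the write if someone else committed since this writer read.
        if base_version != self.version:
            raise StaleWriteError(
                f"based on v{base_version}, latest is v{self.version}"
            )
        self.history.append(content)
        return self.version

    def rollback(self, version: int) -> int:
        # Rolling back appends the old content as a new version; nothing is erased.
        self.history.append(self.history[version])
        return self.version


contract = VersionedFile()
base_a, _ = contract.read()  # agent A reads v0
base_b, _ = contract.read()  # agent B reads v0 at the same time
contract.write(base_a, "clause 4 redlined by agent A")  # accepted as v1

try:
    contract.write(base_b, "clause 7 redlined by agent B")  # still based on v0
    conflict_detected = False
except StaleWriteError:
    conflict_detected = True  # B must re-read and merge instead of clobbering A
```

With a plain last-writer-wins bucket, agent B’s write would have replaced agent A’s redline without anyone noticing. Here the conflict surfaces as an error, and the earlier versions stay available for a diff or rollback.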
Stripe Link for Agents Adds a Payment Layer for AI (2 mentions)
Stripe Link for Agents lets agents spend money on a user’s behalf without ever seeing the underlying payment credentials. The user approves each purchase. Most agent workflows today stop at the point real money has to change hands. That’s the constraint Stripe just removed. One founder framed the next wave as “SaaS v2”: services designed for agents to consume, not humans. Probably right. (source: @stripe, @_MaxBlade)
NVIDIA Nemotron 3 Nano Omni: Open-Weight 30B Multimodal (2 mentions)
NVIDIA released Nemotron 3 Nano Omni, a 30B-parameter multimodal MoE that handles text, image, video, and audio in a single model. 256k context window, claimed up to 9x faster than comparable systems. Open-weight matters for privacy-conscious enterprise deployments where audio and documents can’t leave the building. Available on Hugging Face and OpenRouter. The speed claim is the part to verify on your own workload. (source: @minchoi)
Honorable Mentions
- An Anthropic “Jupiter” red-team exercise appears to be running ahead of a potential launch around May 6. TestingCatalog spotted “claude-jupiter-v1-p” being put through evaluation; no Anthropic confirmation yet. Watch this week. (source: @testingcatalog)
- Apple’s Support app v5.13 shipped CLAUDE.md files by accident, then patched them out within hours in v5.13.1. Confirms what was already obvious: Apple is using Claude Code in production at meaningful scale. (source: @CodeByNZ, @aaronp613)
- Gemini Flash got quietly upgraded on LM Arena to perform two tiers higher than the version that originally launched under the name. Vertex customers also got emails about transitioning Gemini 2 Flash to Gemini 3.1 Flash Lite. If you benchmarked Flash a month ago and decided no, retest. (source: @marmaduke091, @testingcatalog)
- Poolside Laguna M.1 dropped on OpenRouter, free for now. 225B MoE / 23B active, built from scratch for agentic coding. Worth a benchmark on your hardest long-horizon task while it’s free. (source: @OpenRouter)
- Google COSMO, an experimental Android AI agent app, briefly appeared on the Play Store and disappeared. Visible features: local Gemini Nano, screen awareness, voice match, recall, browser agent, deep research. Android is being rebuilt as an agent OS in the open. (source: @9to5Google)
- Grok Imagine Agent Mode launched as an infinite canvas where you brainstorm, write, generate images, edit them, and turn the best ones into video without switching tools. Desktop-first. (source: @imagine)
- xAI Voice Cloning is live in the xAI Console with custom voice creation in under two minutes plus 80+ prebuilt voices across 28 languages. ElevenLabs no longer has voice cloning to itself. (source: @xai)
- Ramp procurement agents ship for all 50,000+ Ramp customers: source vendors, review contracts, run approvals, negotiate, handle renewals. Early customer numbers cited: 16% annual savings on vendor spend, 46 hours/month saved. Ramp’s CEO: “the loud AI story is models replacing creative work; the quiet one is the drudgery of the back office evaporating.” (source: @tryramp)
- HBR on “psychological debt” identifies six negative effects of AI use at work: cognitive offloading, reduced autonomy, diminished competence, weakened social connection, credibility loss, identity threat. Their 1,200-employee study found higher psychological debt strongly correlated with lower AI usage even when employees acknowledged AI’s value. Early-career workers were most affected. (source: @HarvardBiz)
- NotebookLM is getting Mind Map customization and Google Play Books as a source. Build maps for specific topics; add full-length books to a notebook for AI analysis. The Play Books integration is the bigger one if you read on Google Play. (source: @testingcatalog)
Try This Weekend
For everyone:
- Self-host DocuSeal on a $5 cloud server. With one Docker command, you get fillable PDFs, multi-signer flows, audit trails, and full API access. According to Vendr, the median DocuSign enterprise contract is $17,250/year; this is one of those “why does this still cost that much” stories that keeps happening as open source catches up.
- Ask Gemini for an Excel file, not a summary. “Create a monthly budget spreadsheet as an .xlsx file.” See whether it replaces a workflow you currently do by hand.
- Install Notchprompt and record one Zoom call or demo with it. The script lives in the MacBook notch right next to the camera, so your eyes stay on the lens. The notch was otherwise dead space.
- Run the bad-MS-Paint prompt on a photo of yourself or a friend. Ask the model to redraw it like a kid drew it in MS Paint with a mouse. The viral prompt of the week.
For developers:
- Fork the Cursor SDK cookbook and pick the CLI agent starter. Wire it to a GitHub Actions trigger to auto-close low-priority issues. Under an hour to a working pipeline.
- Drop Grok 4.3 into one of your existing agentic workflows via OpenRouter. Compare cost and quality side-by-side with whatever you’re running on. The 60% output cost reduction is real.
- Run Claude Security on a repo you’ve been meaning to audit. Same surface as Claude Code, no separate install.
- Add WebSockets to your OpenAI Responses API loop. One config change, up to 40% latency drop on multi-tool agent loops.
