X is the best way I’ve found to keep up with AI. I like tweets throughout the week, filtering for things I think are actually worth knowing. I use Claude Code to pull those likes automatically and help me turn them into this post (here’s how the pipeline works). This week: 150 tweets liked, filtered down to what’s below.

Check out the previous roundup (Apr 16) if you missed it.

AI for Everyone

GPT-5.5 Ships in ChatGPT (8+ mentions)

OpenAI launched GPT-5.5 on April 23. Same per-token latency as GPT-5.4, meaningfully smarter on knowledge work, research, and writing. OpenAI is calling it their smartest and most intuitive model yet, and the early reactions back the claim up. One team at Chai tested it on huge spreadsheets (100K to 1M+ cells) and called it the Pareto frontier for that work: best accuracy, fastest, most efficient. OpenAI also slipped a line into their announcement that I think will age well: “the last few years have been surprisingly slow.” They’re telling you the release pace is about to get a lot faster. (source: @OpenAI, @kimmonismus, @nicochristie)

Google Commits $40B to Anthropic

Bloomberg broke the news that Google is investing up to $40 billion in Anthropic: $10B in cash now at a $350B valuation, plus another $30B if Anthropic hits performance targets. Google Cloud will deliver 5 gigawatts of compute over five years. This lands days after Amazon committed up to $25B. Anthropic’s run-rate revenue passed $30B this month, up from about $9B at the end of 2025. If you build on Claude and have been worried about whether the capacity story holds, this is your answer. (source: @WatcherGuru)

ChatGPT Images 2.0 Lands

OpenAI shipped ChatGPT Images 2.0 (the API model is gpt-image-2) and the social reaction was the loudest I’ve seen for an image model since Midjourney v6. The party trick everyone’s posting: hand it a photo of a house and it generates a plausible architectural floor plan. Text rendering is finally sharp enough for real marketing work, it can generate 360-degree equirectangular panoramas, and the API is available on fal.ai from day zero. If you’ve been bouncing between Canva, Midjourney, and DALL-E for visuals, this is the new default. (source: @OpenAI, @deedydas, @minchoi)

Google Cloud Next: Deep Research, Workspace Intel, Photos Auto Frame (5+ mentions)

Google Cloud Next dumped a stack of consumer-relevant AI updates. Deep Research and Deep Research Max now ship via the Gemini API on Gemini 3.1 Pro with MCP tool support, so you can wire your own tools into long research runs. Workspace Intelligence finally stitches Docs, Sheets, meeting notes, and Gmail into one queryable context for agents. The sleeper hit: Google Photos Auto Frame re-composes photos from new angles after the fact, using ML depth estimation and inpainting. It’s live in the Photos app today. Google also announced TPU 8, a dual training-and-inference chip with 3x compute per pod. (source: @sundarpichai, @GoogleDeepMind, @ChanduThota)

Anthropic Runs Project Deal

Anthropic published Project Deal, a real-money marketplace they ran inside their San Francisco office where Claude bought, sold, and negotiated on colleagues’ behalf. It follows Project Vend (where Claude ran a small business and famously lost money). Project Deal is the agent-economy version: what happens when both buyer and seller have AI? Most of us are already living in a small version of this when Claude books an Uber and the driver’s app routes via its own algorithm. Anthropic is just instrumenting it. (source: @AnthropicAI)

xAI Grok Voice and Cheap Audio APIs (3 mentions)

xAI released Grok Voice Think Fast 1.0, claiming the top spot on Tau Voice Bench with a model built specifically for noisy real-world audio: accents, background sound, mid-sentence interruptions. They also launched Speech-to-Text and Text-to-Speech APIs at $0.10 per hour of audio in batch mode. ElevenLabs at comparable quality costs roughly an order of magnitude more. xAI says the same stack already powers Tesla cars and Starlink support, which is a non-trivial robustness signal. If you ship anything with voice, run a side-by-side this weekend. Grok 4.3 beta also rolled out to SuperGrok and X Premium+ users. (source: @xai, @VaibhavSisinty, @MattDabit)

AI for Developers

Anthropic’s Head of Product for Claude Code, Cat Wu, told Lenny Rachitsky in a new podcast that their internal product timelines have collapsed from six months to one month, sometimes one week, sometimes one day. You can verify it from the outside: the developer-facing items below all shipped in the past seven days.

GPT-5.5 in Codex with 56% Fewer Tokens (8+ mentions)

The other half of the GPT-5.5 launch is what it does inside Codex. OpenAI claims state-of-the-art on SWE-Bench Pro (58.6) and Terminal-Bench 2.0 (82.7), with 73.1 on their internal Expert-SWE eval where tasks have a 20-hour median human completion time. The number that matters for your bill: Perplexity reported GPT-5.5 used 56% fewer tokens than GPT-5.4 on the same complex computer-use workflows. If you’re running Codex daily, swap the model picker today. (source: @OpenAIDevs, @firstadopter)

DeepSeek V4 Goes Open with 1M Context (4 mentions)

DeepSeek dropped V4 Preview overnight on April 24 and it’s the most consequential open-weights release since their R1 moment. Two variants: V4-Pro at 1.6T parameters with 49B active, and V4-Flash at 284B with 13B active. Both run a 1M token context window by default, no long-context tier upcharge. Apache 2.0. Pricing is the part that makes you sit up: V4-Pro is $3.48 per million output tokens, V4-Flash is $0.28. Anthropic’s Opus is $25 and OpenAI’s frontier is $30 at the same tier. DeepSeek also claims a new SWE-Verified record and a Codeforces record, though independent evals against GPT-5.5 and Opus 4.7 are still pending. Treat the benchmark superlatives as preliminary, but the price-performance shift is already real. (source: @deepseek_ai, @kimmonismus, @scaling01)
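The price gap is easier to feel with a number attached. A quick back-of-envelope in Python, using the output-token prices quoted above; the monthly workload size is made up for illustration:

```python
# USD per 1M output tokens, as quoted in the post.
PRICES_PER_M_OUTPUT = {
    "DeepSeek V4-Pro": 3.48,
    "DeepSeek V4-Flash": 0.28,
    "Anthropic Opus": 25.00,
    "OpenAI frontier": 30.00,
}

def monthly_bill(output_tokens: int, price_per_million: float) -> float:
    """Cost of a month's output tokens at a flat per-million rate."""
    return output_tokens / 1_000_000 * price_per_million

# A hypothetical agent pipeline emitting 500M output tokens a month:
tokens = 500_000_000
for model, price in PRICES_PER_M_OUTPUT.items():
    print(f"{model}: ${monthly_bill(tokens, price):,.2f}")
```

At that volume the spread runs from $140 a month on V4-Flash to $15,000 on the priciest frontier tier, which is why the benchmark caveats matter less than the pricing.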

Qwen3.6-27B Beats Its Own 397B Predecessor (5 mentions)

Alibaba released Qwen3.6-27B on April 22 and the headline still feels weird to type: the 27B dense model outperforms Qwen3.5-397B-A17B (15x larger) on every coding benchmark. SWE-Bench Verified climbed to 77.2 (competitive with Claude Opus 4.5 at 80.9), Terminal-Bench 2.0 at 59.3. It’s natively multimodal, with thinking and instruct modes in the same checkpoint. The practical part: Unsloth’s 4-bit Dynamic GGUF runs on 18GB of RAM, which puts a frontier-competitive coding model on a regular MacBook Pro or a single RTX 4090. Kyle Hessling ran it through his full frontend design test suite and said he was “completely astounded.” (source: @Alibaba_Qwen, @UnslothAI, @lmstudio)
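The 18GB figure checks out on napkin math. Here’s the sketch; the bits-per-weight and overhead numbers are my assumptions, not Unsloth’s:

```python
# Rough memory math for a 27B dense model in 4-bit quantization.
PARAMS = 27e9
BITS_PER_WEIGHT = 4.5   # assumption: dynamic 4-bit GGUFs land a bit above a flat 4.0
OVERHEAD_GB = 2.0       # assumption: KV cache, activations, runtime buffers

weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9   # bits -> bytes -> GB
total_gb = weights_gb + OVERHEAD_GB
print(f"weights ~ {weights_gb:.1f} GB, total ~ {total_gb:.1f} GB")
```

That puts the weights at roughly 15 GB and the total near 17 GB, so 18GB of RAM is tight but plausible, and it explains why the same model won’t fit at 8-bit.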

Bitwarden CLI Supply Chain Attack (3 mentions)

Between 5:57 and 7:30 PM ET on April 22, a malicious version of the @bitwarden/cli npm package was live. Attackers compromised a Checkmarx GitHub Action used in Bitwarden’s CI pipeline, gained npm publish access, and shipped a release with a preinstall hook that ran a Bun-runtime credential stealer (bw1.js). It harvested .env files, SSH keys, GitHub and npm tokens, shell history, and AWS/Azure/GCP credentials, encrypted them with AES-256-GCM, and exfiltrated them by creating public GitHub repos under the victim’s own account. Bitwarden’s statement confirmed no vault data was accessed. The attack is part of the broader Shai-Hulud Checkmarx campaign. If you ran npm install against the Bitwarden CLI in that 90-minute window, rotate every secret on the machine and revoke every token. The lesson holds even if you weren’t hit: npm preinstall hooks have full filesystem access, and your CI secrets are the target. (source: @TheHackersNews)
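If you want a quick inventory of what’s already on your machine, a short script can walk node_modules and list every package that declares an install-time hook. A minimal sketch I wrote for illustration, not a real scanner:

```python
# Walk node_modules and flag packages that declare install-time lifecycle
# scripts, which execute arbitrary commands with full filesystem access.
import json
from pathlib import Path

RISKY_HOOKS = ("preinstall", "install", "postinstall")

def find_install_hooks(node_modules: str) -> list[tuple[str, str, str]]:
    """Return (package, hook, command) for every install-time script found."""
    hits = []
    for manifest in Path(node_modules).glob("**/package.json"):
        try:
            pkg = json.loads(manifest.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            continue  # skip unreadable or malformed manifests
        scripts = pkg.get("scripts") or {}
        for hook in RISKY_HOOKS:
            if hook in scripts:
                hits.append((pkg.get("name", str(manifest)), hook, scripts[hook]))
    return hits

if __name__ == "__main__":
    for name, hook, cmd in find_install_hooks("node_modules"):
        print(f"{name}: {hook} -> {cmd}")
```

npm also supports `npm install --ignore-scripts`, which skips these hooks entirely. Some packages break without them, but it’s a sane default for CI.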

Claude Design Lands in Research Preview (3 mentions)

Anthropic Labs shipped Claude Design on April 17 and it’s been buried by the GPT-5.5 news. It reads your codebase and existing design files, extracts your team’s design system, then builds prototypes, slides, one-pagers, and design explorations through conversation. Exports to Canva, PDF, PPTX, or HTML, or hands the work to Claude Code for implementation. Powered by Opus 4.7 vision. Pro, Max, Team, and Enterprise. Open claude.ai/design and give it a real prototype brief from your project. The design-to-code handoff is the part that actually saves time. (source: @claudeai)

Claude Cowork Live Artifacts (3 mentions)

Live Artifacts in Claude Cowork sounds boring until you use it. Build a dashboard, report, or slide deck once, and every time you open the artifact it re-fetches data from your connectors. A sales dashboard built today stays current next month with no re-prompting. Saves to a new Live Artifacts tab with version history, accessible from any session. Felix Rieseberg, who works on the desktop app, called it a “simple idea, super useful in practice.” This pushes Cowork from “chat surface” toward “lightweight tool platform.” (source: @claudeai, @felixrieseberg)

Kimi K2.6 Ships an Open-Source Agent Swarm (2 mentions)

Moonshot AI released Kimi K2.6 and the architecture is the genuinely new piece. 1T parameters total, 32B active per token, modified-MIT license. The swarm: a single prompt can spawn 300 sub-agents running in parallel for up to 4,000 coordinated steps, up to 12 hours, producing real files (100-file repos, 100K-word literature reviews, 20K-row datasets) instead of chat output. SWE-Bench Pro at 58.6 (matches GPT-5.5), SWE-Bench Verified at 80.2. One reviewer called it GPT-5.4-level coding at 76% lower cost. The “300 agents for 12 hours” framing requires either Moonshot’s hosted API or serious self-hosted infrastructure. For long-horizon autonomous research or codebase migrations, it’s the new model to test. (source: @Kimi_Moonshot, @cryptopunk7213)
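Moonshot hasn’t published the swarm runtime, so treat this as shape rather than implementation: the coordinator-plus-parallel-workers pattern looks roughly like this toy sketch, with a stub standing in for the model call:

```python
# Toy fan-out/collect pattern behind an agent swarm: one coordinator splits
# a task, N workers run in parallel, results come back in dispatch order.
# Illustration only; sub_agent is a stand-in for a real hosted-API call.
from concurrent.futures import ThreadPoolExecutor

def sub_agent(task_id: int, instruction: str) -> str:
    # A real worker would call the model API and write files to disk.
    return f"result-{task_id}: {instruction.upper()}"

def run_swarm(instructions: list[str], max_workers: int = 8) -> list[str]:
    """Dispatch one sub-agent per instruction and collect results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(sub_agent, i, ins) for i, ins in enumerate(instructions)]
        return [f.result() for f in futures]

print(run_swarm(["survey repo", "write tests", "draft report"]))
```

The hard part Moonshot is claiming isn’t the fan-out (trivial, as above) but keeping 300 workers coherent over 4,000 steps, which is a coordination problem, not a concurrency one.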

RTK Cuts Token Use 80-92% on Command Output (1 mention)

RTK is a Rust CLI proxy that sits between your terminal and your AI coding tool, compressing command output before it eats your context window. The reported numbers are large enough to be worth checking yourself: 80-92% reduction on git operations, 90% on test runners. A typical 30-minute Claude Code session drops from 118K tokens to 24K. Trevin Chow says one developer saved 400 million tokens in a week. Zero dependencies, single binary, works with Claude Code, Cursor, Copilot, and 9 other tools. The setup is under five minutes and rtk gain shows your cumulative savings. I’m installing it tonight. (source: @trevin)
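The core idea is easy to demo. Here’s a toy head-and-tail compressor with a rough token-savings estimate; this is my illustration of the concept, not RTK’s actual algorithm:

```python
# Toy version of an output-compressing proxy: keep the head and tail of
# verbose command output, elide the middle, and report rough savings.
def compress_output(text: str, keep_head: int = 5, keep_tail: int = 5) -> str:
    lines = text.splitlines()
    if len(lines) <= keep_head + keep_tail:
        return text  # short output passes through untouched
    dropped = len(lines) - keep_head - keep_tail
    middle = f"... [{dropped} lines elided] ..."
    return "\n".join(lines[:keep_head] + [middle] + lines[-keep_tail:])

def estimated_savings(before: str, after: str) -> float:
    """Fraction of context saved, using ~4 characters per token as a proxy."""
    return 1 - (len(after) / 4) / (len(before) / 4)

noisy = "\n".join(f"test {i} ... ok" for i in range(200))
small = compress_output(noisy)
print(f"{estimated_savings(noisy, small):.0%} smaller")
```

Real tools are smarter about which lines matter (errors and failures survive, passing noise doesn’t), but even this naive version shows why the savings compound over a long session.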

Honorable Mentions

  • ChatGPT Workspace Agents lets ChatGPT Business and Enterprise customers build always-on agents from a Studio UI with templates like “Data Analyst” and “Chief of Staff,” running 24/7 with Skills, Connectors, and scheduled actions. (source: @OpenAI, @testingcatalog)
  • Microsoft Foundry Hosted Agents are per-session VM-isolated sandboxes with persistent filesystems, supporting Claude Agent SDK, LangGraph, and other frameworks. Pay-as-you-go, no idle cost. (source: @satyanadella)
  • OpenAI Privacy Filter open-sources a 1.5B-param (50M active) PII detection model under Apache 2.0. Runs locally or in browsers via WebGPU, catches 8 categories including secrets and private URLs. (source: @altryne)
  • Cloudflare Local Explorer in Wrangler beta gives you a UI to inspect KV, R2, D1, and Durable Objects during local development. Cloudflare also open-sourced Kumo, their internal accessible-by-default component library. (source: @opombosilva, @roerohan)
  • Google ReasoningBank is an agent memory framework that learns from failures, not just successes. +8.3% on WebArena, +4.6% on SWE-Bench Verified. (source: @GoogleResearch)
  • DSPy RLM lets agents handle near-infinite context by writing Python in a sandboxed REPL to load only the chunks they need. From MIT’s OASYS lab. (source: @samhogan)
  • Claude Code /ultrareview spins up a cloud fleet of bug-hunting agents on your branch. Pro and Max users get 3 free runs through May 5. (source: @ClaudeDevs)
  • Claude Managed Agents memory enters public beta with an “intelligence-optimized memory layer” so hosted agents stop forgetting your preferences between sessions. (source: @claudeai)
  • Claude Cowork third-party model support lets you point Cowork or Claude Code at OpenRouter or LiteLLM and run GPT-5.5, Grok 4.3, or local Gemma 4 alongside Claude. (source: @PawelHuryn)
  • Claude Uber integration booking flow lands inside Claude. Browse restaurants, check fares, see ETAs, place orders without switching apps. (source: @praveenTweets)
  • Google DeepMind agent attack-surface paper documents that websites can detect AI-agent visitors today and serve them adversarial content invisibly. If your agent browses, treat scraped content as adversarial. (source: @HowToAI_)

Try This Weekend

For everyone:

  1. Try GPT-5.5 in ChatGPT on the most complex recurring research or analysis task you have. The reasoning gains show up most when the task has multiple steps that have to stay coherent.
  2. Open Google Photos Auto Frame on a portrait that was framed slightly wrong. Watch the 3D re-composition fill in what the original lens couldn’t see.
  3. Generate a floor plan with ChatGPT Images 2.0 from a photo of a room or building. It’s the demo everyone is posting because it actually works.
  4. Test Grok Voice Think Fast on X Premium+ against your default voice assistant on a multi-step task. The accent and interruption robustness is the part to listen for.

For developers:

  1. Switch to GPT-5.5 in Codex for a workday and watch your token bill. The 56% reduction Perplexity reported is real on agentic workflows.
  2. Run Qwen3.6-27B locally via Unsloth’s 4-bit GGUF with 18GB of RAM. Put it head-to-head with whatever local model you’ve been using.
  3. Install RTK and run rtk gain after a coding session. The 80-92% compression on command output adds up faster than you’d think.
  4. Open Claude Design and prototype something from your current project. The design-system extraction from your existing codebase is the part that surprises you.
  5. Pull DeepSeek V4-Flash off Hugging Face and benchmark it on your real workload. At $0.28 per million output tokens with 1M context, the price-performance shift is the story to verify.