How to Stop Shipping Low-Quality RL Environments (with Examples) Your broken harness is actively making the model worse. Here's what I keep seeing after years of eyeballing trajectories, and what you need to fix. Latent Space · 4 days ago
Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs We talk with the VendingBench authors on evaling Claudes from Haiku to Mythos, and how they build leading, and lasting, frontier evals from scratch. Latent Space · 5 days ago
🔬Scaling Past Informal AI - Carina Hong, Axiom Math Verified Generation and Compounding Intelligence Latent Space · 6 days ago
⚡️Satya Nadella: No Priors x Latent Space Crossover Special at Microsoft Build The legendary Microsoft CEO makes his first Latent Space appearance! Latent Space · 6 days ago
[AINews] Microsoft Build: MAI-Thinking-1 and MAI Family models Microsoft Build recap, and new MAI model technical details Latent Space · 6 days ago
GitHub's plan for Agents — Kyle Daigle, GitHub GitHub pioneered the modern AI coding era with Copilot, and the resulting explosion in agentic coding has led to notable strains on the most popular developer platform in the world. Here's the plan. Latent Space · 7 days ago
[AINews] NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark Jensen scores a huge win. Latent Space · 7 days ago
Why Video Agent models are next — Ethan He, xAI Grok Imagine Inside xAI: Building Grok Imagine in 3 Months, Videogen vs World Models, and why Grok Imagine is so underrated. For the first time, we do a deep dive with the guy who led it! Latent Space · 8 days ago
[AINews] Founders and Forward Deployed Engineers a quiet day lets us highlight the new AIE WF focuses Latent Space · 10 days ago
[AINews] Anthropic raises $965B Series H, releases Opus 4.8 and Dynamic Workflows/ultracode Total Anthropic victory! Latent Space · 11 days ago
The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray 80% Devin Commits, Spec-to-PR Workflows, Full VMs, Agent Memory, and PMs Shipping Code Latent Space · 12 days ago