AI Infrastructure Lab Watch

The Inference Economy Crystallizes: Three Moves in One Week

In a single week, three companies attacked inference cost from fundamentally different angles. NVIDIA unveiled the Groq 3 LPU at GTC, the first product from its $20B Groq acquisition, delivering 35x higher inference throughput per megawatt, alongside the Vera Rubin platform, which has seven new chips in production. Jensen Huang projected $1 trillion in Blackwell and Vera Rubin orders through 2027. OpenAI released GPT-5.4 mini ($0.75/1M input tokens) and nano ($0.20/1M input tokens), explicitly optimized for subagent orchestration and coding workloads. And Meta revealed four generations of custom MTIA inference chips already in various stages of production, with a new generation shipping every six months, double the industry cadence.

These three companies are attacking the same problem from different architectural levels. NVIDIA is building dedicated inference hardware (a concession that training GPUs aren't optimal for inference). OpenAI is building purpose-built small models that make cheap, fast API calls good enough for the millions of subagent calls agentic systems generate. Meta is co-designing custom silicon with its own workloads, cutting out the GPU vendor entirely for classification and detection tasks. The combinatorial effect is multiplicative, not additive. Our March 13 prediction of 5-10x inference cost reduction in 12 months now looks conservative. A dedicated hardware improvement stacked on top of algorithmic optimizations and pricing model changes puts 10-20x reduction in range. The inference economy didn't arrive gradually. It arrived in a week.
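The stacking logic is simple multiplication. The per-layer factors below are illustrative assumptions chosen to land inside the 10-20x range discussed above, not figures reported by any of the three companies:

```python
# Illustrative sketch: independent cost-reduction layers multiply.
# Each factor is an assumption for demonstration, not a reported number.
hardware = 3.0      # dedicated inference silicon vs. repurposed training GPUs
algorithmic = 2.5   # quantization, batching, speculative decoding, etc.
pricing = 2.0       # purpose-built small models priced for subagent calls

stacked = hardware * algorithmic * pricing  # 3.0 * 2.5 * 2.0 = 15x
print(f"Stacked reduction: {stacked:.0f}x")
```

A 15x combined reduction sits comfortably inside the 10-20x range, and it only takes modest gains at each layer; no single company has to deliver the whole improvement.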

AI Agents Production Reality

77% of Enterprise Agent Projects Failing to Scale

CIO, Deloitte, McKinsey, and SAS all published enterprise AI reports this week, and the findings converge uncomfortably. Only 23% of enterprises have scaled AI agents beyond a single business function. SAS explicitly called it "The Great AI Reality Check of 2026." The Deloitte report found that most enterprise agent deployments stall at pilot because they lack the organizational infrastructure to support autonomous decision-making at scale. The gap between agent capability in demos and agent performance in production remains wide.

The agents aren't failing because the technology is bad. They're failing because enterprises are bolting agents onto workflows designed for humans in 2019. Nobody is redesigning the actual work. I've seen this pattern across dozens of deployments: teams take an existing SOP, add an AI step, and wonder why it doesn't scale. The companies that will make agents work are the ones treating them as a new category of worker that needs purpose-built processes, not the ones automating existing procedures. This is the same pattern we saw in early cloud adoption. Lift-and-shift failed; cloud-native won. The 77% failure rate isn't a technology verdict. It's an organizational design verdict. And it sits in direct tension with the $1 trillion hardware bet happening simultaneously.

AI & Policy

Global Regulatory Paralysis: Three Jurisdictions, One Week

In a single week, three of the world's most influential regulatory bodies effectively admitted they can't keep pace with AI. The UK abandoned its preferred AI copyright training exception after 11,500 consultation responses overwhelmingly opposed it, leaving no replacement policy: four options remain under consideration, with no timeline. The US Senate released the TRUMP AMERICA AI Act on March 18, a federal preemption bill that expressly carves out child safety, state procurement, and data center infrastructure. The EU's AI Act high-risk rules, which we covered on March 13, remain delayed to 2027-2028. Three data points across three continents in the same week.

This isn't regulatory caution. It's a structural problem. AI capabilities are evolving faster than regulators can define categories to regulate. The US bill's carve-outs are the tell: by exempting child safety and state procurement, it creates a two-tier regulatory system that reintroduces the patchwork it claims to eliminate. Builders shipping products that touch both tiers, think education or healthcare, face the same fragmented compliance problem the bill was supposed to solve. The bipartisan opposition from governors (DeSantis and Newsom agreeing on something is a signal) suggests the real fight is federal vs. state power, not left vs. right. For builders, the paradox: regulatory paralysis benefits large companies with legal teams who can navigate ambiguity and punishes smaller teams who need clear rules to plan product roadmaps. Three data points make a pattern, and we're naming this one.

AI & Dev Tools Production Reality

AI-as-Auditor Becomes a Measurable Category

Google open-sourced Sashiko, an agentic code review system that monitors Linux kernel mailing list submissions and evaluates proposed changes across architecture, security, concurrency, and resource management dimensions. Tested against the last 1,000 upstream kernel issues, Sashiko found 53% of the bugs. Every single one of those bugs had already passed through the Linux kernel's human review process, widely considered the most rigorous code review in all of software engineering. Sashiko is now running on all submissions to the kernel mailing list with Google funding.

53% sounds modest until you register the qualifier: 100% of these bugs passed through human review. This isn't replacement. It's a genuinely new capability. Two weeks ago, Anthropic found 22 vulnerabilities in Firefox that human reviewers had missed. Now Sashiko catches half the bugs that slip through the kernel's review process. Two different AI systems, two of the most carefully reviewed codebases on earth, same result. AI-as-security-auditor is now two-for-two on major codebases, which makes it a pattern, not an anomaly. The implication for production teams is direct: if AI catches bugs that the Linux kernel's reviewers miss, it is absolutely catching bugs in your codebase. Treat AI review as a parallel track alongside human review. You run both unit tests and integration tests. Same logic applies here.
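The parallel-track arithmetic, using the article's own numbers (1,000 kernel issues that had already cleared human review; a 53% AI catch rate), looks like this:

```python
# Parallel-track arithmetic from the reported Sashiko results.
# Every bug in this population already escaped human review, so the
# AI layer's catch rate applies on top of the human process.
escaped_human_review = 1000   # test set: last 1,000 upstream kernel issues
ai_catch_rate = 0.53          # Sashiko's reported catch rate

caught_by_ai = escaped_human_review * ai_catch_rate
still_escaping = escaped_human_review - caught_by_ai
print(f"AI catches {caught_by_ai:.0f} of {escaped_human_review}; "
      f"{still_escaping:.0f} still slip through both tracks")
```

Halving the escape rate of the most rigorous review process in software is the headline; the residual 47% is the argument for keeping human review in the loop rather than replacing it.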

On the Radar

AMI Labs Raises $1B for the Anti-LLM Thesis
Yann LeCun left Meta to raise Europe's largest-ever seed round ($1.03B at $3.5B pre-money) for world models built on JEPA architecture. The investor list tells the story: Bezos, NVIDIA, Samsung, Temasek, Eric Schmidt. Everyone backing AMI has a physical-world AI problem that language models don't solve. A billion-dollar bet that the next paradigm isn't bigger LLMs.
TechCrunch
Replit Triples to $9B as Vibe Coding Becomes a Category
Replit raised $400M at $9B, tripling its valuation in six months. The new Agent 4 introduces a visual canvas where users draw features and tweak mockups alongside AI agents. When coding becomes sketching, the addressable market is every knowledge worker, not just developers. Its release the same week as OpenAI's subagent pricing shows the stack bifurcating: OpenAI pricing for developers, Replit pricing for everyone else.
TechCrunch
Meta Replaces Human Moderators with AI, Doubles Detection Rates
Meta's AI content enforcement systems now detect 2x more violations than human moderators while cutting errors by 60%, and catch roughly 5,000 credential-theft attempts daily that evaded humans. The clearest real-world validation of Anthropic's 'observed exposure' labor research. Combined with MTIA chips, Meta is vertically integrating AI moderation on custom silicon.
TechCrunch
OpenAI GPT-5.4 Mini and Nano: Subagent Economy Gets Its Currency
GPT-5.4 mini ($0.75/1M input) and nano ($0.20/1M input) are built for the millions of cheap, fast calls agentic systems make. Mini outperforms GPT-5 mini while running roughly 2x faster. At $0.20/1M tokens for nano, the economics of running a 50-step agent workflow just collapsed by an order of magnitude. OpenAI is attacking its own $225B inference cost projection.
OpenAI
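The 50-step workflow claim is easy to sanity-check. The per-step token count below is an assumption for illustration; the $/1M input-token prices are the ones reported above:

```python
# Back-of-envelope cost of a 50-step agent workflow at the new pricing.
# input_tokens_per_step is an assumed average prompt size per subagent
# call, not a published figure.
steps = 50
input_tokens_per_step = 4_000

def workflow_cost(price_per_million_tokens: float) -> float:
    """Total input-token cost in dollars for one workflow run."""
    return steps * input_tokens_per_step * price_per_million_tokens / 1_000_000

nano_cost = workflow_cost(0.20)   # GPT-5.4 nano
mini_cost = workflow_cost(0.75)   # GPT-5.4 mini
print(f"nano: ${nano_cost:.3f}  mini: ${mini_cost:.3f} per 50-step run")
```

Under these assumptions a full 50-step run costs pennies on nano, which is what makes spawning subagents liberally, rather than rationing calls, an economically rational design.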
Meta MTIA: Custom Silicon at Software Cadence
MTIA 300 in production, 400 tested, 450 and 500 targeting GenAI inference into 2027. A new chip generation every six months, double the industry cadence, achieved through modular reusable designs manufactured by TSMC. If Meta's Avocado model stays delayed, these chips may end up running Google's Gemini on Meta's own silicon.
Meta AI