Premium: Modular inference

Welcome to NVIDIA Week! It's time to catch up on the strategic moves from our favorite AI infrastructure provider. This will soon be followed by a Neocloud Week, to catch up on CoreWeave, Nebius, and IREN.

Now that we've looked at NVIDIA's stellar Q127, let's peek at what comes later this year with Vera Rubin. NVIDIA debuted this new era of AI systems at CES in January, greatly expanded it at GTC in March, and refined their agentic message at GTC Taipei this week. This will also be of interest to AI chip competitors, hyperscaler clouds, neoclouds, and AI providers like Anthropic and OpenAI.

This will be a multi-part series. This first post focuses on the strategic and financial impacts of NVIDIA's expansion of Vera Rubin into a modular AI factory platform. A second post will go deeper into the individual rack systems, and a third will cover the agentic software stack and other major moves.

The rise of agentic AI is creating another scaling law that will not only drive GPU demand, but also CPU demand for agentic orchestration, tool use, service calls, data access, and running code generated by the AI agent.
NVIDIA initially announced the Vera Rubin line at CES in January, built around 6 new frontier chips designed in tandem across GPU, CPU, DPU, and 3 layers of networking.
They "acquired" Groq just 2 weeks before CES. While not part of the initial announcements, it soon became a major part of the roadmap by GTC as a 7th chip in the Vera Rubin mix.
Groq is not replacing GPUs but rather enhancing them in the stages of inference where its SRAM-based serialized architecture can lower latency to drastically improve per-user interactivity. NVIDIA is positioning this as a way for inference providers to add premium-priced tiers for the lowest latency.
At GTC in March, Vera Rubin was then expanded into a complete line of 5 new rack systems powered by those 7 chips. Customers can now mix in these new modular racks to improve performance in different areas across disaggregated compute, networking, storage, and agentic orchestration, depending on their needs and use cases.
Vera Rubin sales are expected to start showing up in October (at the very end of NVIDIA's Q327), so they should start contributing heavily in Q427 and Q128.
These new rack types are being framed as incremental to the $1T in expected GPU sales through 2027. The CEO's rough expectations (Jensen Math™) are for a 25% uplift from Groq, a 20% uplift from AI storage, and a 5% uplift from Vera CPUs.
As for Vera going standalone, management believes it will add $200B in TAM and expects Vera systems to reach $20B in FY27 (~5% of the mix).
Going forward, Rubin Ultra and Feynman will continue to be offered as Oberon-based NVL72 racks. This ensures that existing AI data centers can keep upgrading their facilities to these systems without a complete power and cooling overhaul, as long as they can deliver the per-rack power needed.
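
For a rough sense of scale, the uplift percentages quoted above can be turned into dollar figures. This is only a back-of-the-envelope sketch: it assumes each uplift applies to the full $1T GPU base and that the three simply add up, neither of which is stated outright. The function names are hypothetical, chosen for illustration.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the post.
# ASSUMPTION: each uplift applies to the full $1T base and the uplifts are additive.

BASE_GPU_SALES_B = 1_000  # $1T in expected GPU sales through 2027, in $B

# Rough uplift expectations, in percent of the GPU base
UPLIFT_PCT = {"Groq": 25, "AI storage": 20, "Vera CPUs": 5}


def uplift_dollars(base_b, uplift_pct):
    """Convert percentage uplifts into $B on top of the given base."""
    return {name: base_b * pct / 100 for name, pct in uplift_pct.items()}


def implied_total(part_b, share_pct):
    """Back out the total implied by a part and its percentage share."""
    return part_b * 100 / share_pct


incremental = uplift_dollars(BASE_GPU_SALES_B, UPLIFT_PCT)
total_incremental = sum(incremental.values())

# $20B of Vera systems at ~5% of the mix implies the size of the FY27 mix
implied_fy27_mix = implied_total(20, 5)

print(incremental)
print(total_incremental)
print(implied_fy27_mix)
```

Under those assumptions, the three uplifts would sum to roughly half of the GPU base again, and the $20B Vera figure at ~5% of the mix would imply an FY27 mix of about $400B.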

Now in Part 1:

A new scaling law (agentic AI) in inference
Groq "acquisition"
CES Announcements / initial look at Vera Rubin line
GTC Announcements / expansion of Vera Rubin line
Modular inference
Future iterations

Part 2:

Groq LPX rack
Silently dropping Rubin CPX plans
Vera CPU
Networking
DPU/BlueField-4/Storage
Software stacks
Roadmap changes

Part 3:

Agentic AI
AI buildouts
Ecosystem investments
Partnerships & NVLink Fusion
Other announcements of interest

muji
