
THE CONTROL ROOM

Where strategic experience meets the future of innovation.

Is Your GPU Debt a Stranded Asset? Inside NVIDIA's $20B Groq Pivot

  • Writer: Tony Grayson
  • Dec 26, 2025
  • 18 min read


By Tony Grayson, President & GM of Northstar Enterprise + Defense | Built & Exited Top 10 Modular Data Center Company | Top 10 Data Center Influencer | Former SVP Oracle, AWS & Meta | U.S. Navy Nuclear Submarine Commander | Stockdale Award Recipient


Published: December 26, 2025 | Updated: January 5, 2026 | Last Verified: January 5, 2026



TL;DR: 

"NVIDIA's $20B licensing deal with Groq signals a pivot from AI training to inference economics. By securing Groq's LPU technology—capable of 300-750+ tokens/second with sub-second latency—NVIDIA is addressing the "Agentic Multiplier": the 10x compute demand of autonomous AI agents. OpenAI's 30-gigawatt infrastructure bet and $1.4T commitment now face stranded asset risk as the GPU monoculture ends." - Tony Grayson, President and GM Northstar Enterprise and Defense


In 30 seconds: NVIDIA paid $20 billion to license Groq's inference technology because GPUs can't deliver the sub-second latency that AI agents need. Companies with GPU-backed debt—including OpenAI's $1.4 trillion infrastructure bet—now face stranded asset risk as the market shifts from training to inference.



COMMANDER'S INTENT: THE SILICON CLIFF

The Mission: Identify the catastrophic "Financing Trap" where 5-year debt-funded infrastructure meets 18-month silicon obsolescence.

The Reality: NVIDIA’s $20B pivot to license Groq’s LPU architecture is an admission that the GPU monoculture is hitting a technical wall. As we move from training to Inference Economics, the "Agentic Multiplier" requires a speed and determinism that standard GPUs cannot deliver.

The Tactical Takeaway: Don't let your balance sheet become a graveyard of stranded assets. Operators must stop financing AI infrastructure on 5-year accounting cycles when the hardware value craters in 24 months. Pivot to architectures that solve for latency, not just TFLOPS.


The Tactical View of Inference Economics: A "Control Room" perspective on the shifting AI landscape. When multi-step autonomous agents demand sub-second response times, operators must look beyond aggregate TFLOPS toward deterministic performance and collateral value alignment.

DECEMBER 2025 MARKET SIGNAL

The $1.4T Exposure: As of December 2025, the infrastructure debt required to sustain the OpenAI-Microsoft roadmap is projected to reach $1.4 trillion. With NVIDIA now licensing Groq's LPU tech to address the "Inference Gap," the resale value of existing H100/B200 clusters faces a 60% "Stranded Asset" write-down risk as buyers pivot to deterministic silicon.



If you read LinkedIn, you know everyone is talking about this. But I have a different view of why it happened, and if you read my other posts on AI infrastructure, it makes sense.

On Christmas Eve, the headline was: "NVIDIA buys Groq for $20B."


That framing is wrong.


This isn't a typical acquisition. Groq will stay independent with Simon Edwards as its new CEO. NVIDIA now has a non-exclusive license to Groq's inference technology and has brought on key leaders, including founder Jonathan Ross and President Sunny Madra. GroqCloud, the enterprise AI inference infrastructure service, will continue to run independently. This setup helps avoid regulatory issues, since a full buyout could trigger antitrust concerns.


1. What are the Details of the $20B NVIDIA Groq Licensing Deal?


Known (reported):

  • This deal is about licensing and hiring, not merging. Groq will continue to run independently.

  • The deal is said to be worth about $20 billion, but the exact details haven't been shared.

  • Groq raised $640 million at a $2.8 billion valuation in August 2024, and reportedly reached $6.9 billion by early 2025.

  • GroqCloud will remain independent, and current customers will continue to get their service.


Inferred (the why):

  • NVIDIA paid a premium to keep Groq from becoming another company's main AI inference infrastructure provider.


This deal stands out because NVIDIA didn't buy Groq for its revenue, which was small. Instead, NVIDIA gained time, flexibility, and an advantage in the areas that matter for inference economics: response speed, time-to-first-token, the number of users it can serve at once, and cost efficiency. In short, NVIDIA invested in a lead on the practical factors that will drive AI profits as the industry moves toward inference.


2. Why Did NVIDIA Pay $20B for a Licensing Deal Instead of an Acquisition?


Groq isn't just a "pre-revenue science project." In February 2025, Saudi Arabia committed $1.5 billion to grow Groq's LPU infrastructure at LEAP 2025, with 19,000 LPUs set to be deployed in Dammam.


So why did NVIDIA make this deal?

The main risk was that Groq could set a new standard for real-time inference. If sub-second response times became the norm, GPUs might seem slow. Also, if a company like Google had bought Groq, it could have quickly built a better inference solution than today's GPU stack.

NVIDIA paid to keep that option out of its competitors' hands.


3. What Makes Groq's LPU Architecture Different from GPUs?


3.1 Deterministic AI Execution

Groq's compiler precomputes the execution graph, establishing a time-bound pipeline rather than a best-effort queue. The result is deterministic execution: response times stay consistent instead of varying with whatever else is in the queue. For interactive workloads, that predictability matters more than peak throughput.
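
To see why determinism matters, here's a toy latency model (my own illustration, not Groq's or NVIDIA's actual scheduler): a fixed-schedule pipeline versus a best-effort queue with batching jitter. The numbers are assumptions; the point is what happens to the tail.

```python
# Toy latency model (illustrative only): a deterministic pipeline returns in a
# fixed time per request, while a best-effort batching queue adds variable wait.
import random

random.seed(42)

def percentile(samples, p):
    """Nearest-rank percentile of a list of latencies (seconds)."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

N = 10_000
deterministic = [0.25 for _ in range(N)]                 # fixed, precomputed schedule
best_effort = [0.25 + random.expovariate(1 / 0.40)       # same base cost plus
               for _ in range(N)]                        # queueing/batching jitter

for name, lat in [("deterministic pipeline", deterministic),
                  ("best-effort queue", best_effort)]:
    print(f"{name:>22}: p50={percentile(lat, 50):.2f}s  p99={percentile(lat, 99):.2f}s")
```

Even when the medians look close, the p99 of the queued path is what your users actually feel.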


3.2 Supply Chain Independence

NVIDIA GPUs depend on HBM and advanced CoWoS packaging, both of which are chronically supply-constrained. Groq's LPUs are built on a mature GlobalFoundries process and use no HBM, sidestepping those bottlenecks. In practice, that means Groq chips may be available when NVIDIA's are not.


3.3 Drop-in Distribution (GroqCloud)

GroqCloud supports open-source models with an OpenAI-compatible API and doesn't require CUDA. Users can run their own models and data, avoiding both NVIDIA's software lock-in and OpenAI's fees. This reduces vendor dependence, though it can be challenging for companies whose main advantage is their proprietary model.
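
Here's what "drop-in" looks like in practice: a minimal sketch using the standard OpenAI Python client pointed at GroqCloud's OpenAI-compatible endpoint. The base URL and model id below are my assumptions; check GroqCloud's current documentation before depending on them.

```python
# Minimal sketch of the "drop-in" idea: reuse the standard OpenAI Python client
# and change only the base URL and model name. Endpoint and model id are assumed.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",   # assumed OpenAI-compatible endpoint
    api_key=os.environ["GROQ_API_KEY"],          # issued by GroqCloud, not OpenAI
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",             # illustrative open-model id
    messages=[{"role": "user",
               "content": "Summarize the inference gap in one sentence."}],
)
print(response.choices[0].message.content)
```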


3.4 MCP as Moat-Killer

Groq's support for the Model Context Protocol (MCP), including Remote MCP and prebuilt connectors, lets companies switch from OpenAI to models such as Llama, Mixtral, or DeepSeek without rewriting code. As inference becomes more commoditized and APIs become more compatible, switching costs drop. This change makes open-source models true replacements, not just alternatives.


3.5 Energy Efficiency (Their Claim)

Groq claims "up to ~10x" the efficiency of GPUs. If that's accurate, it's another reason we're not actually in the power crisis that often makes headlines. (More on AI infrastructure and power in my analysis of data center economics.)


4. Why Is SRAM Critical for the Future of AI Inference Economics?


Training is more forgiving because you can batch work and let the hardware process it. Interactive inference isn't like that. Every millisecond of unpredictability means a user is waiting. That's why NVIDIA would license a rival architecture rather than just ship more GPUs.

LPU vs GPU Benchmarks 2025: The Inference Gap

Feature        | NVIDIA GPU (HBM)           | Groq LPU (SRAM)              | Impact on Agents
Architecture   | Non-Deterministic (Batch)  | Deterministic (Pipeline)     | Reliability of tool calls
Latency        | Variable (Tail Latency)    | Consistent (Zero Jitter)     | Critical for voice/real-time
Memory Type    | External HBM (bottleneck)  | On-chip SRAM (no contention) | Eliminates memory stalls
TTFT           | 0.5-2.0+ seconds           | ~0.22 seconds                | 4-9x faster first response
Tokens/Second  | 50-150 (low batch)         | 300-750+                     | Faster agent loop completion
Best Use Case  | Massive Model Training     | Interactive Agent Loops      | The Agentic Multiplier

The difference comes down to SRAM-based AI chips versus external memory architectures. GPUs pull model weights from high-bandwidth memory (HBM), which creates contention and variable latency. Groq stores weights on-chip in SRAM, eliminating the memory bottleneck that causes tail latency in interactive workloads.


In Groq's published benchmarks:

  • TTFT: ~0.22 seconds on supported models

  • Output speed: 300-750+ tokens/second per user at low batch

  • Power efficiency: Claimed significant advantage (measure at your workload)


GPU systems can perform well in some cases, but Groq's steady performance stands out when batch sizes are small, especially for variance and tail latency. This difference gets bigger when you stack multiple inference calls, as most platforms do.
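
A quick back-of-envelope model of what those numbers mean for a single response, using total latency ≈ TTFT + output tokens ÷ tokens per second. The midpoint values are my own illustrative picks from the ranges above, not a vendor benchmark.

```python
# Back-of-envelope response-time model using the benchmark ranges quoted above.
# total latency ≈ TTFT + output_tokens / tokens_per_second (network overhead ignored).
def response_time(ttft_s: float, tokens_per_s: float, output_tokens: int = 300) -> float:
    return ttft_s + output_tokens / tokens_per_s

profiles = {
    "GPU (HBM), low batch": (1.0, 100),   # midpoints of 0.5-2.0s TTFT, 50-150 tok/s
    "LPU (SRAM)":           (0.22, 500),  # ~0.22s TTFT, midpoint of 300-750+ tok/s
}

for name, (ttft, tps) in profiles.items():
    print(f"{name:>22}: {response_time(ttft, tps):.2f}s for a 300-token answer")
```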


The Convergence of Financial and Technical Risk: This data set illustrates the "Stranded Asset Zone" where fixed five-year debt cycles clash with 18-month silicon obsolescence, alongside the technical benchmarks proving why SRAM-based LPUs are required to solve the Agentic Multiplier.

5. How Does the 'Agentic Multiplier' Change AI Infrastructure Demand?


A chatbot only needs one inference pass. An agent, however, runs in a loop: plan, retrieve, reason, call a tool, verify, and rewrite. That's 5 to 10 inference steps per request. If each step takes 2 seconds, you're waiting 20 seconds. That's not just slow—it's unusable. Would you want your smart car to make decisions that slowly?


This is why the difference between interactive and batch processing is important. GPUs are great for batch jobs. But agents, voice assistants, and copilots are interactive—they need small batch sizes and low latency. When someone talks about TPS, ask if it's for interactive or batch workloads. That answer will show which architecture is better.


What is the agentic multiplier in AI infrastructure? It's the compounding effect of latency across multi-step AI workflows. A 10x increase in inference calls per request means a 10x increase in the cost of latency. This is why the shift from chatbots to autonomous agents fundamentally changes the economics of AI compute.
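
The multiplier is simple arithmetic. Here's a minimal sketch of an agent loop's wall-clock time under two assumed per-call latencies; the step counts and latencies are illustrative, not measurements.

```python
# The "agentic multiplier" as arithmetic: an agent that plans, retrieves, reasons,
# calls tools, verifies, and rewrites makes several inference calls per request,
# so per-call latency compounds. Step counts and latencies are illustrative.
def agent_wall_time(steps: int, per_call_latency_s: float) -> float:
    return steps * per_call_latency_s

for steps in (5, 10):
    slow = agent_wall_time(steps, 2.0)    # ~2s per call on a busy GPU endpoint
    fast = agent_wall_time(steps, 0.4)    # sub-second deterministic inference
    print(f"{steps:>2} steps: {slow:.0f}s at 2.0s/call vs {fast:.0f}s at 0.4s/call")
```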

Some major companies are on the wrong side of that answer. And to be clear, this isn't just an AI bubble argument.


6. Which Companies Face the Highest Stranded Asset Risk from the Inference Pivot?


Jensen Huang has called inference "one of the most important workloads in the next decade." The signal is clear. If you built for training, you might have bet wrong on the future.


6.1 OpenAI: The $1.4 Trillion Commitment

In October 2025, Sam Altman announced that OpenAI had committed to $1.4 trillion in infrastructure spending—30 gigawatts of compute capacity over eight years. The Stargate project alone is $500 billion through a multi-partner consortium including Oracle, SoftBank, and Microsoft.


HSBC projects OpenAI still won't be profitable by 2030, with a $207 billion funding shortfall even under bullish scenarios. OpenAI generated $4.3 billion in revenue during H1 2025 while burning $2.5 billion in cash.


All this infrastructure is built around dense GPU clusters. If NVIDIA's roadmap moves toward inference-optimized hybrid designs, OpenAI's huge spending could end up solving yesterday's problem.


6.2 Oracle: The Canary in the Coal Mine

Oracle's stock has dropped ~40% from its September 2025 high. The company has $248 billion in data center and cloud lease commitments over the next 15-19 years, raised $18 billion in new debt in September, and carries over $124 billion in total obligations.

On December 17, 2025, Blue Owl Capital withdrew from a $10 billion Michigan data center deal, citing concerns about Oracle's rising debt levels.


The market is showing that building AI infrastructure with heavy debt is approaching the limit of what people are willing to fund.


6.3 CoreWeave: High GPU Debt

CoreWeave has over $14 billion in total debt as of Q3 2025, with interest rates ranging from SOFR+4% to 11% across tranches, secured by GPUs.


CoreWeave's $22.4 billion in OpenAI contracts and $55.6 billion in revenue backlog look impressive—until you realize that if OpenAI's infrastructure has to shift toward inference-optimized silicon, those training-dense GPU clusters become expensive legacy assets. Microsoft CEO Satya Nadella is already hedging, saying the company is spacing out AI chip purchases to avoid getting "stuck with four or five years of depreciation on one generation."


6.4 The Hyperscalers: $500 Billion in Future Obligations

The four biggest AI infrastructure players—Google, Meta, Amazon, and Microsoft—have collectively issued over $100 billion in bonds in 2025 alone to fund data center buildouts. Bloomberg data shows major cloud players have accumulated roughly $500 billion in future obligations tied to data center leases.


Although these hyperscalers can absorb hits that would be fatal to smaller players, even they are recognizing the risk.


6.5 Fermi: The Canary That Just Died

Fermi America was supposed to be the future—an 11GW AI data center campus in Texas, co-founded by former Energy Secretary Rick Perry and valued at $15 billion after its October IPO.


On December 11, the anchor tenant—an investment-grade hyperscaler—terminated a $150 million construction funding agreement. Fermi's stock crashed 46% and is now trading 60% below its IPO price.


You could argue the project isn't dead, but the market has just repriced the "build it and they will come" thesis for AI infrastructure. If a $15 billion company can lose half its value because one tenant leaves, the real question isn't whether there's a bubble—it's who will be next.


6.6 The Pattern

All these companies made the same bet: that GPU-heavy infrastructure would stay at the heart of AI computing. They borrowed at high rates, signed long-term leases, and assumed demand would make it all worthwhile.


NVIDIA's deal with Groq shows the company doesn't think GPUs alone will win the inference race. This isn't just a small product update; it's a change in direction—and a clear signal about the NVIDIA Rubin roadmap for 2026.


7. Why Does the AI Infrastructure Financing Math No Longer Work?


Key Takeaway: The AI infrastructure financing trap occurs when 5-year debt terms collide with 18-month silicon cycles. GPU rental rates are down 50-70%, collateral values are falling faster than loan payments, and NVIDIA's Groq deal signals the GPU monoculture is ending.


7.1 The Financing Trap

Financing for AI infrastructure through SPVs and off-balance-sheet deals has grown rapidly. Big Wall Street firms have lent billions to new cloud companies, using NVIDIA chips as collateral. They're treating GPUs like real estate. Real estate doesn't become obsolete in 18 months.


The assumptions behind that framing are wrong:

  • Silicon cadence: NVIDIA releases new architectures every 18-24 months and announces the next one even sooner, cratering your resale value before the chip arrives.

  • The depreciation game: Chip depreciation is hidden inside the building's amortization schedule. If you book 5-6 years of depreciation on chips that only last 2-3 years, you're overstating your earnings while your main asset loses value.

  • Collateral is losing value: GPU rental rates are down 50-70%. When chip values drop faster than loan payments, margin calls happen.


By licensing Groq's LPU technology, NVIDIA is showing that future "AI factories" will use inference-optimized chips along with GPUs. This is good for the market, but tough for anyone who borrowed heavily to build GPU-focused infrastructure. This is the core of stranded-asset risk that AI investors should be watching.
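
To see how the depreciation game flatters the books, here's a minimal sketch comparing straight-line book value on a 5-year schedule against an assumed market value that halves every 18 months. The inputs are illustrative, not a valuation of anyone's actual fleet.

```python
# The depreciation game in numbers: straight-line book value on a 5-year schedule
# vs an assumed market value with a ~18-month resale-value half-life.
# All inputs are illustrative placeholders.
cost = 100.0                      # $M of accelerators
book_life_years = 5               # accounting schedule
halving_period_years = 1.5        # assumed resale-value half-life

for year in range(0, 6):
    book = max(0.0, cost * (1 - year / book_life_years))
    market = cost * 0.5 ** (year / halving_period_years)
    print(f"year {year}: book ${book:5.1f}M  vs  est. market ${market:5.1f}M")
```

The gap between the two columns is the earnings overstatement the article describes, and it is also the collateral shortfall a lender eventually notices.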


7.2 The Training Cost Collapse


The "moat" protecting top AI models is shrinking faster than most investors think:


As the industry shifts from "building the brain" to "running the brain," specialized inference chips stop being a curiosity and become the main focus. This is the shift in inference economics that's reshaping the industry.


This could still be a financing bubble, even if inference makes money, because margins shrink fast. The asset you're paying for is losing value even as you continue making payments.


8. What Are the Tradeoffs and Limitations of Groq's Architecture?


Before you rush to bet against every GPU stock, let's be honest about Groq's limitations too.


NVIDIA hasn't been asleep at the wheel on inference. Dynamo, TensorRT, the whole serving stack—they've been positioning inference as "extreme computing" for two years. The Groq move is acceleration and hedging, not panic.


And Groq's architecture does have real-world constraints:

  • Chip count scales with model size. SRAM is expensive. Big models need many chips. Your bottleneck moves to rack-scale interconnect and capital. This is the key constraint of SRAM-based AI chips.

  • Static scheduling needs a mature compiler. You're relying on Groq's toolchain, which is a dependency you may not want.

  • Longer context windows change things. Contexts with over 1 million tokens are now possible, and filling these long contexts actually favors high-bandwidth memory (HBM). When you process 128K+ tokens, sharding and interconnect become key, which can shift the advantage back to GPUs.


This isn't about Groq winning and GPUs losing. It's NVIDIA recognizing that interactive inference needs a different approach than training, and they want flexibility for both. You should, too.
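
To put the first constraint above (chip count scales with model size) in rough numbers, here's a back-of-envelope sketch. The ~230 MB of SRAM per chip and 8-bit weights are my assumptions for illustration; check Groq's published specs before using this for planning.

```python
# Rough sizing for "chip count scales with model size": weights held entirely in
# on-chip SRAM must be sharded across many devices. The ~230 MB SRAM-per-chip
# figure and 8-bit weights are assumptions for illustration only.
import math

def chips_needed(params_billions: float, bytes_per_param: float = 1.0,
                 sram_per_chip_mb: float = 230.0) -> int:
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return math.ceil(weight_bytes / (sram_per_chip_mb * 1e6))

for size in (8, 70, 400):
    print(f"{size:>3}B params -> ~{chips_needed(size):,} chips (weights only, no KV cache)")
```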


9. What Does Stranded Asset Risk Mean for GPU Infrastructure?


Not all inference is the same, and the best hardware depends on your use case:

  • Frontier: Giant models, giant context. GPUs dominate. This is OpenAI/Anthropic territory.

  • Mainstream: Cost is the main concern. Specialized inference chips target the "GPU tax."

  • Distributed: Locality and tail latency are as important as FLOPS. This matters for edge, telecom, and defense.


Groq isn't the only non-GPU path. AWS Inferentia/Trainium, Google TPUs, AMD MI300, and Cerebras are all fragmenting the inference stack (Reuters). The thesis isn't "Groq wins"—it's that the GPU monoculture is ending, and NVIDIA's move makes that inevitable rather than speculative.


High-density setups aren't wrong; they are just not enough anymore.


So if high-density GPU farms aren't the only answer, where should the money go?


10. Where Is AI Infrastructure Investment Heading in 2026?


10.1 Edge and Distributed Inference

When latency is the product, proximity wins. Inference at the edge becomes a competitive advantage because it's closer to the users and data. The players with power-ready sites and network proximity are suddenly sitting on strategic assets.


10.2 Telcos

Telecom companies have what hyperscalers don't: distributed infrastructure, existing power contracts, and fiber everywhere. If inference moves out of centralized data centers, telcos become AI infrastructure plays, not just pipe providers. It also moves them out of the commodity business model. (I've written more about telco transformation opportunities.)


10.3 Defense and Secure Facilities

Governments want inference that doesn't cross borders. Secure, air-gapped, domestically controlled compute is a growth market. The players who can build hardened facilities fast will capture defense budgets that are already pivoting toward AI.


10.4 Modular Infrastructure (The Winning Posture)

The winning posture isn't "bet everything on one architecture." It's modular data center design that can swap chips without rebuilding the shell. Power, cooling, and networking that work for GPUs today and LPUs tomorrow. Behind-the-meter power generation that doesn't depend on grid interconnect timelines. The ultimate optionality is infrastructure that doesn't care what silicon you plug into it—and that you can build quickly to maintain optionality.


"The money is moving to inference, and the infrastructure winners

will be measured in tokens, not teraflops."


— Tony Grayson, former SVP Oracle, AWS, Meta


11. Control Room Operator Checklist


Three Questions You Need to Ask:

  1. What happens to our collateral value if NVIDIA's roadmap shifts? Because it just did.

  2. Do our debt covenants allow collateral swaps? If GPU-only lease rates keep falling, can we substitute?

  3. What is our debt-to-depreciation alignment? If loan terms exceed the silicon's realistic useful life, you have a stranded asset problem.


Operational Flexibility:

  1. Can the facility support compute swaps (GPU → ASIC/LPU) without rebuilding the shell?

  2. What breaks first under load: memory bandwidth, network, or power delivery?

  3. Is our workload interactive (chat, agents) or processing (RAG, batch)? The answer determines optimal architecture.


Performance:

  1. Are we optimizing for TTFT or just aggregate throughput?

  2. What's our TPS/user at Batch~1, and what are p95/p99 latencies?

  3. What's the real energy per delivered token under our actual workload?

  4. What's the variance/jitter under mixed workloads?
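
If you want to turn those performance questions into numbers, here's a minimal sketch of the math on data you would pull from your own load test. The sample latencies, token counts, and power draw below are placeholders, not benchmarks.

```python
# Turning the performance checklist into measurable numbers from a load test:
# tail latency, delivered throughput, and energy per delivered token.
# The sample data and power draw are placeholders for your own measurements.
import statistics

request_latencies_s = [0.8, 0.9, 1.1, 0.85, 2.4, 0.95, 1.0, 3.1, 0.9, 1.05]  # measured
tokens_delivered = 42_000        # total output tokens over the test window
window_s = 600                   # test duration in seconds
avg_power_kw = 12.0              # measured at the rack, not the nameplate spec

cuts = statistics.quantiles(request_latencies_s, n=100)
p95, p99 = cuts[94], cuts[98]
tps = tokens_delivered / window_s
joules_per_token = (avg_power_kw * 1000 * window_s) / tokens_delivered

print(f"p95={p95:.2f}s  p99={p99:.2f}s")
print(f"throughput={tps:.1f} tok/s  energy={joules_per_token:.1f} J/token")
```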


12. What Should AI Infrastructure Investors Watch Next?


  1. Does Groq IP show up in NVIDIA's 2026 Rubin (R100) architecture? If it does, this pivot isn't temporary—it's permanent.

  2. Do vendors start publishing honest Batch~1 specs instead of hiding behind batching?

  3. Do GPU-backed debt structures face covenant triggers as collateral erodes?

  4. Does the neocloud sector see distressed asset sales as training demand commoditizes?

  5. Does OpenAI's $1.4T infrastructure bet get restructured as the inference pivot accelerates?


Key Takeaway: The AI infrastructure financing trap occurs when 5-year debt terms collide with 18-month silicon cycles. GPU rental rates are down 50-70%, collateral values are falling faster than loan payments, and NVIDIA's Groq deal signals the GPU monoculture is ending. If your loan terms exceed the silicon's realistic useful life, you have a stranded asset problem.


13. Frequently Asked Questions


What is the NVIDIA Groq deal?

NVIDIA paid approximately $20 billion for a non-exclusive license to Groq's inference technology and hired key leadership, including founder Jonathan Ross. This is not a traditional acquisition—Groq remains an independent company, and GroqCloud continues to operate separately.


Why is NVIDIA licensing Groq technology?

NVIDIA is licensing Groq's LPU technology because GPUs face fundamental limitations for interactive inference workloads. The licensing structure reduces antitrust risk compared to a full acquisition while allowing NVIDIA to integrate Groq's deterministic execution model into future architectures, such as Rubin (R100).


What's the difference between AI inference and training?

Training is the AI model-building process—expensive, periodic, and centralized. Inference is running the model to serve users—continuous, everywhere, and increasingly the larger market. By 2030, inference is projected to represent the majority of AI compute spending.


What is Groq's LPU (Language Processing Unit)?

Groq's LPU is an SRAM-based AI chip designed specifically for inference. Unlike GPUs, which use external high-bandwidth memory (HBM), Groq uses on-chip SRAM to store model weights, eliminating memory bottlenecks and delivering consistent, predictable latency with zero jitter.


How does Groq compare to NVIDIA GPUs for inference?

Groq excels at interactive, low-batch inference with consistent sub-second response times (TTFT ~0.22s, 300-750+ tokens/second). GPUs perform better for batch processing and long-context workloads. See the LPU vs GPU Benchmarks 2025 table above for a detailed comparison.


What is the agentic multiplier in AI infrastructure?

The agentic multiplier is the compounding effect of latency in multi-step AI workflows. AI agents require 5-10 inference calls per request (plan, retrieve, reason, tool call, verify, rewrite). A 2-second delay per call means 20 seconds total wait time—unusable for real products. This multiplies the importance of low-latency inference for anyone building agent-based applications.


What is stranded asset risk for AI infrastructure?

Stranded asset risk refers to companies that borrowed heavily to build GPU-dense data centers, finding their assets depreciate faster than their debt schedules. NVIDIA releases new architectures every 18-24 months, but many operators use 5-6-year depreciation schedules. If the market shifts toward inference-optimized silicon, training-focused GPU clusters could become stranded assets.


Who is most exposed to the inference pivot?

OpenAI ($1.4T infrastructure commitment, 30GW capacity), Oracle ($248B in lease obligations), CoreWeave ($14B GPU-backed debt), and hyperscalers with $500B in future lease obligations. Fermi America's recent 46% stock crash after losing an anchor tenant illustrates the risk.


How fast are AI inference costs dropping?

According to Stanford's AI Index 2025, inference costs dropped 280-fold between November 2022 and October 2024. This rapid deflation in inference economics means the infrastructure you're financing today will generate significantly less revenue per token by the time you've paid it off.


Is this an AI infrastructure bubble?

This can be a financing bubble even if AI inference is genuinely valuable. The issue is the mismatch between debt structures (5-6 year terms) and asset depreciation (18-24 month silicon cycles). When GPU rental rates drop 50-70% and collateral values fall faster than loan schedules, margin calls follow.


What should infrastructure operators do now?

Ask three questions: (1) What happens to collateral value if NVIDIA's roadmap shifts? (2) Do debt covenants allow collateral swaps? (3) Is debt-to-depreciation alignment realistic? Build a modular data center design that can swap compute classes without rebuilding the shell.


Where is AI infrastructure investment heading?

Money is moving toward: (1) Edge and distributed inference, (2) Telcos with existing power and fiber infrastructure, (3) Defense and secure sovereign compute facilities, and (4) Modular infrastructure that can swap between GPU, ASIC, and LPU architectures. The winning posture is optionality, not commitment to a single architecture.


Who is Tony Grayson?

Tony Grayson is President & General Manager of Northstar Enterprise + Defense, a company building modular, AI-optimized data centers. He previously served as SVP of Physical Infrastructure at Oracle (managing a $1.3B budget and 1,000+ person team) and held senior executive roles at AWS and Meta. Tony is a former U.S. Navy nuclear submarine commander (USS Providence SSN-719) and recipient of the Vice Admiral James Bond Stockdale Award for Inspirational Leadership.


What is Tony Grayson's background?

Tony Grayson bridges two worlds: nuclear submarine operations and hyperscale technology infrastructure. He's a Naval Academy graduate who commanded the USS Providence before transitioning to Silicon Valley, where he led physical infrastructure at Oracle, AWS, and Meta. This combination of military systems thinking and Big Tech operations experience informs his analysis of AI infrastructure economics.


What does Tony Grayson write about?

Tony writes about AI infrastructure economics, data center strategy, and the intersection of military leadership principles with technology operations. His analysis focuses on inference economics, stranded asset risk, power constraints, and the shift from training to inference workloads.


What is The Control Room blog?

The Control Room is Tony Grayson's blog at tonygraysonvet.com, where he publishes analysis on AI infrastructure, data center economics, and technology strategy. The name reflects his background as a submarine commander—the control room is where operators make critical decisions with incomplete information under time pressure. This dynamic mirrors the challenges facing AI infrastructure investors today.


14. Conclusion

The point of this wasn't to show you that NVIDIA can't do inference.


The point was to show that inference will have its own platform battle, with its own winners, hardware shapes, and economic cliffs.


At GTC 2025, Jensen Huang said, "inference is going to be one of the most important workloads in the next decade." The Groq deal shows he meant it.


NVIDIA acted like a company that sees the next big challenge coming and refuses to be left out.


For operators who built for one world and are now living in another: the window to reposition is open, but it won't stay open forever. Modular infrastructure that can pivot between compute classes isn't a luxury; it's survival.


The money is moving to inference, and the infrastructure winners will be measured in tokens, not teraflops.


The operators who make it will be those who built for flexibility, not those who put everything on one architecture and hoped nothing would change.


Watch the 'Inference Gap' in action. This demo showcases the Groq LPU delivering 800+ tokens per second with deterministic, sub-second latency—proving why standard GPU architectures are hitting a wall in the face of the 'Agentic Multiplier.' Witness the future of interactive AI inference economics.

———



___________________________

Tony Grayson is a recognized Top 10 Data Center Influencer, a successful entrepreneur, and the President & General Manager of Northstar Enterprise + Defense.


A former U.S. Navy Submarine Commander and recipient of the prestigious VADM Stockdale Award, Tony is a leading authority on the convergence of nuclear energy, AI infrastructure, and national defense. His career is defined by building at scale: he led global infrastructure strategy as a Senior Vice President for AWS, Meta, and Oracle before founding and selling a top-10 modular data center company.


Today, he leads strategy and execution for critical defense programs and AI infrastructure, building AI factories and cloud regions that survive contact with reality.


Read more at: tonygraysonvet.com


Sources:

Deal & Company Details:

  • CNBC: NVIDIA-Groq licensing + talent deal, ~$20B estimated (December 24, 2025)

  • Groq/PR Newswire: Saudi Arabia LEAP 2025: $1.5B Groq infrastructure commitment (February 2025)

  • TechCrunch: Groq funding $640M at ~$2.8B (August 2024)

  • TechCrunch: Groq $6.9B valuation (July 2025)


Financial Exposure Data:

  • Axios: OpenAI $1.4T infrastructure commitment, 30GW capacity (October 28, 2025)

  • Fortune/HSBC: OpenAI $207B funding shortfall projection (November 2025)

  • Yahoo Finance: Oracle $248B lease obligations, ~40% stock decline (December 2025)

  • CNBC: Blue Owl pulls out of Oracle $10B Michigan data center (December 17, 2025)

  • Fortune: CoreWeave $14B total debt, $55.6B revenue backlog (Q3 2025)

  • Bloomberg: Fermi 46% stock crash after tenant termination (December 12, 2025)

  • Bloomberg: ~$500B future lease obligations across major cloud players

  • Bank of America: ~$100B+ bonds issued by four major AI infrastructure players in 2025

  • CNBC: Microsoft CEO Nadella on spacing out chip purchases (November 2025)

  • Financial Times: GPU-collateralized lending structures and SPV financing


Training Cost & Efficiency:

  • Stanford AI Index 2025: inference costs down ~280-fold, November 2022 to October 2024


NVIDIA Strategy:

  • NVIDIA GTC 2025: Jensen Huang on inference as "extreme computing," Blackwell/Dynamo (March 2025)

  • NVIDIA Rubin (R100) architecture roadmap for 2026


Groq Technical & Competitive:

  • Groq published benchmarks: TTFT, tokens/second (verify conditions at your workload)

  • VentureBeat/Forbes: GlobalFoundries manufacturing, HBM/CoWoS constraint avoidance

  • Reuters: Groq competitive positioning vs Cerebras

  • GroqCloud: Remote MCP support, OpenAI-compatible API

