Training vs. Inference: The $300B AI Shift Everyone is Missing
- Tony Grayson
- Dec 13
By Tony Grayson, Tech Executive (ex-SVP Oracle, AWS, Meta) & Former Nuclear Submarine Commander

If you are only watching NVIDIA’s stock price, you are missing the actual story of the AI revolution.
The story of Training vs. Inference.
Right now, the market is obsessed with the "AI Bubble." Analysts are hyper-focused on the massive capital expenditures (CapEx) required to build foundational models like GPT-5 or Claude 3.5. They see billions being poured into GPUs and massive data centers in remote locations, and they worry about margins.
They are right to worry, but for the wrong reason. They think the "Model" is the product. It isn't.
The real questions should focus on the ROI of Training vs. Inference.
The Model is a commodity. The future value, and the sustainable revenue, lie in the Infrastructure of Inference.
Here is why the "Smartest Model" is a losing game, why open source is the real assassin, and why the infrastructure that powers the application of AI is the only safe bet.
The Great Commoditization: A Race to Zero
Marketing teams at the big labs want you to believe their Large Language Model (LLM) is a unique, defensible "brain" that will command high margins forever. The data proves otherwise.
The reality is brutal: Intelligence is being commoditized faster than any technology in history.
Just look at the "LLMflation" trend. The cost of inference is dropping by nearly 10x every year. In 2023, GPT-4 level intelligence was a premium service. By late 2025, models like DeepSeek-V3 and Llama 4 are delivering comparable performance at a fraction of the cost—sometimes as low as $0.27 per million tokens.
There is no "moat" in intelligence when a competitor can knock you off the leaderboard in a single financial quarter. DeepSeek proved this in January 2025 when they released a frontier-class model with reported training costs of just $5.6 million, less than 5% of what US competitors spent.
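To see how quickly a roughly 10x-per-year price decline compounds, here is a minimal sketch; the $30-per-million-token starting point is an assumption for GPT-4-era pricing, and the smooth exponential decay is a simplification of what is really a series of discrete price cuts:

```python
# Minimal sketch: implied per-token price under a ~10x/year "LLMflation" decline.
# The $30/M-token 2023 starting price is an assumption; real pricing moves in steps.

start_price = 30.00   # assumed 2023 price, $ per million tokens
decline_factor = 10   # price divides by ~10 each year (the trend cited above)

for offset in range(4):  # 2023 through 2026
    price = start_price / (decline_factor ** offset)
    print(f"{2023 + offset}: ${price:,.2f} per million tokens")

# $30.00 -> $3.00 -> $0.30 -> $0.03: the 2025 step lands right around the
# ~$0.27/M-token figure quoted above for open-weight models.
```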
The Open Source Disruption
The biggest threat to the "closed model" business isn't another closed model; it's open source.
This is another facet of the Training vs. Inference divide.
Most serious companies aren't just renting an API wrapper from OpenAI anymore. They are taking open models, hosting them privately, and connecting them to their own proprietary data using the Model Context Protocol (MCP).
MCP allows an enterprise to use a "commodity" model (such as Llama) and make it smart by securely connecting it to its own private documents and databases. This shifts the entire economic value from the model to the enterprise data. If the model is cheap (or free), you're only paying for compute to run it.
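As a concrete illustration, here is a minimal sketch of an MCP server that exposes private enterprise data as a tool, using the FastMCP helper from the official MCP Python SDK; the server name, the search_contracts function, and the in-memory document store are hypothetical placeholders for whatever proprietary system an enterprise would actually wire in:

```python
# Minimal sketch of an MCP server exposing private enterprise data as a tool.
# Uses the FastMCP helper from the official MCP Python SDK (pip install mcp).
# The contract store and search logic below are hypothetical placeholders.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-contracts")  # hypothetical server name

# Stand-in for a proprietary data source; a real deployment would query a
# document store, SQL database, or internal API instead.
CONTRACTS = {
    "acme-2024": "Master services agreement with ACME Corp, renewal due 2026-03-01.",
    "globex-2023": "Data processing addendum with Globex, EU data residency required.",
}

@mcp.tool()
def search_contracts(query: str) -> str:
    """Return contract summaries whose text mentions the query string."""
    hits = [text for text in CONTRACTS.values() if query.lower() in text.lower()]
    return "\n".join(hits) or "No matching contracts found."

if __name__ == "__main__":
    # Runs over stdio, so an MCP-capable client (and the commodity model behind it)
    # can call search_contracts without the data ever leaving the enterprise.
    mcp.run()
```

The point of the sketch is the economics: the model doing the reasoning is interchangeable, while the tool and the data behind it are where the differentiation lives.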
The Economics of "Learning" vs. "Doing": Training vs. Inference
This brings us to the infrastructure crisis.
A good analogy for Training vs. Inference:
AI Training is like medical school. It takes years, costs a fortune (CapEx), and happens once.
AI Inference is the doctor practicing medicine. It happens billions of times a day, forever (OpEx).
As models become commoditized, the volume of Inference explodes. Every time an employee queries an internal Llama model via MCP or an agentic AI executes a trade, it constitutes inference.
According to recent projections, the AI inference market is expected to reach $250–$350 billion by 2030, growing at nearly 20% annually. A model is trained once, but it is queried infinitely.
This is the $300B Shift. We are moving from a "Training Economy" (building the brain) to an "Inference Economy" (using the brain).
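A back-of-the-envelope model makes the CapEx/OpEx split concrete. Every figure below is an illustrative assumption (the training cost mirrors the DeepSeek figure cited above, the token price is the commodity rate quoted earlier, and the query volumes are invented for the example):

```python
# Back-of-the-envelope sketch of the "Training Economy" vs. "Inference Economy" split.
# All figures are illustrative assumptions, not vendor pricing.

training_cost = 5_600_000          # assumed one-time training run (CapEx-like), $
queries_per_day = 50_000_000       # assumed daily inference requests
tokens_per_query = 1_500           # assumed prompt + completion tokens per request
price_per_million_tokens = 0.27    # the commodity-model price cited above, $/M tokens
years = 3

daily_inference = queries_per_day * tokens_per_query / 1_000_000 * price_per_million_tokens
lifetime_inference = daily_inference * 365 * years

print(f"One-time training cost:  ${training_cost:,.0f}")
print(f"Daily inference cost:    ${daily_inference:,.0f}")
print(f"{years}-year inference cost:   ${lifetime_inference:,.0f}")
print(f"Inference share of total: "
      f"{lifetime_inference / (lifetime_inference + training_cost):.0%}")

# Under these assumptions inference ends up around 80% of lifetime spend,
# consistent with the 80-90% range discussed in the FAQ below.
```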
The Infrastructure Mismatch: Why "Bit Barns" Won't Work
This is where the opportunity lies for investors.
We spent the last decade building massive, centralized "bit barns" in rural locations (like Iowa or Nevada) to support Training. These facilities rely on cheap power and air cooling. They are fine for training a model over six months because nobody cares about latency.
Inference is different.
If a bank in New York is running a private DeepSeek model to analyze market data in real-time, it cannot wait for light to travel to rural Oregon and back.
It needs Low Latency: It must be close to the user (The Edge).
It needs High Density: It involves hot, power-hungry chips running 24/7.
It needs Reality: It must operate in power-constrained grids such as NY/NJ, London, and Ashburn.
The current infrastructure cannot handle this. We need a new architecture: Distributed, High-Density Data Centers that bring the compute to the customer.
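The latency constraint is straight physics. Here is a minimal sketch of the speed-of-light floor, assuming signals travel at roughly two-thirds of c in fiber and using rough straight-line distances (real routes are longer, and switching, queuing, and model execution time all stack on top of this floor):

```python
# Minimal sketch: speed-of-light floor on round-trip latency for inference.
# Assumes ~200,000 km/s signal speed in fiber (~2/3 c) and rough straight-line
# distances; both are simplifying assumptions.

FIBER_KM_PER_MS = 200.0  # ~200,000 km/s expressed per millisecond

routes = {
    "NYC -> rural Oregon data center": 3_900,  # assumed one-way distance, km
    "NYC -> metro-edge site (NJ)": 30,         # assumed one-way distance, km
}

for name, one_way_km in routes.items():
    round_trip_ms = 2 * one_way_km / FIBER_KM_PER_MS
    print(f"{name}: >= {round_trip_ms:.1f} ms round trip (propagation only)")

# Roughly 39 ms to cross the country and back versus well under 1 ms to a
# nearby edge site, before a single token is generated.
```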
The Investor Outlook: Bet on the Road, Not the Car
The "Model War" is a race to the bottom. Margins for model creators will be crushed by competition and open source.
But the Infrastructure required to run those commoditized models? That is the toll road. This is why the Training vs. Inference distinction matters so much.
Whether a company uses OpenAI, Claude, or a private Llama instance, they all need the same thing: secure, power-dense, low-latency space to run that inference. The value accrues to the physical assets, specifically the power, the cooling, and the specialized real estate, not the algorithm.
The smart money is moving away from the "Hyperscale Training" hype and toward the Inference Utility. This is why we are building the next generation of infrastructure, because no matter which model wins the war, they all need a place to live.
Read More from The Control Room
The Reality of Small Modular Reactors: Read how we plan to power these high-density inference sites in grid-constrained markets.
Is the AI Bubble Real?: Why the "Bubble" is only in the software layer, while the physical layer is just getting started.
Video: The Data Center Real Estate Boom
Frequently Asked Questions: AI Training vs. Inference Infrastructure
What is the difference between AI training and inference?
AI training is a one-time, intensive process of building a model by feeding it massive datasets—like sending a model to university. AI inference is the ongoing operational work of running data through the trained model to generate predictions and outputs. According to NVIDIA, training is essentially a one-time CapEx-like cost, while inference is continuous OpEx. For most companies, inference accounts for 80-90% of total AI lifetime costs because every prompt generates tokens that incur computational costs.
How big is the AI inference market in 2025?
The global AI inference market is valued at approximately $106 billion in 2025 and is projected to reach $255 billion by 2030, with a 19.2% CAGR. This growth is driven by the surge in generative AI and large language models that require real-time inference. According to Grand View Research, North America holds the largest market share at 38%, while Asia Pacific is the fastest-growing region due to investments in sovereign AI initiatives and hyperscale data centers.
Why do AI models appear to be commoditized?
The intelligence gap between proprietary and open-source models is vanishing rapidly. In 2023, GPT-4 was significantly ahead; today, open-source models like DeepSeek-V3 and Llama 3.1 can match proprietary models on 90% of benchmarks for a fraction of the cost. According to CNBC, DeepSeek-R1 was trained for just $5.6 million—a fraction of the $500+ million spent on comparable proprietary models. When competitors can replicate capabilities for pennies on the dollar, high SaaS margins become unsustainable.
What is the Model Context Protocol (MCP) and why does it matter?
MCP is an open standard that enables AI models to securely connect to internal data sources like Slack, Google Drive, and SQL databases. It allows enterprises to use low-cost "commodity" open-source models and make them "smart" by giving them access to proprietary data. As Fortune notes, this shifts the value proposition from the model itself (which is commoditizing) to the unique enterprise data and applications built on top of it.
How does open source AI impact data center investing?
Open source AI is highly bullish for infrastructure investment. Lower software costs lead to dramatically higher usage volumes. According to Stanford's 2025 AI Index Report cited by NVIDIA, inference costs dropped 280-fold between November 2022 and October 2024. When inference becomes cheaper, usage explodes—which means more applications running 24/7 inference workloads. As Karmel Capital analysis shows, the lower the software cost, the higher the demand for the underlying hardware and data center infrastructure.
Is the AI bubble real?
The "Training Bubble" may be real; we are potentially overbuilding massive centralized clusters to train frontier models that become rapidly depreciating assets as open-source alternatives emerge. However, according to MarketsandMarkets, the "Inference Market" is just beginning its growth phase, projected to reach $255 billion by 2030. The bubble narrative misses this critical distinction: training is a one-time cost while inference is continuous and scales with every user interaction.
Why can't inference run in the same data centers used for training?
Physics and latency constraints separate training and inference infrastructure. Training centers are "Bit Barns" built in rural areas where power is cheap but network latency is high. According to Netrality's edge computing guide, coast-to-coast latency runs 20-25 milliseconds, while edge locations can deliver 5-7ms. Equinix research shows inference centers need proximity to users—edge locations delivering sub-10ms latency for real-time applications like autonomous vehicles and defense systems.
What is the realistic timeline for SMRs to power data centers?
Despite marketing promises of 2028-2030, realistic timelines for commercial SMR deployment at hyperscale are mid-2030s (2035+). According to WWT's SMR analysis, key bottlenecks include HALEU fuel supply constraints, first-of-a-kind construction challenges, and NRC approval processes. Google's Kairos deal targets the first reactor by 2030 with full 500MW deployment by 2035. Defense and sovereign clouds may adopt micro-reactors sooner, given national security priorities and distinct regulatory pathways.
How do AI inference costs compare to training costs?
Inference often dwarfs training costs over a model's lifetime. According to AI compute analysis, GPT-4's inference bill is projected at $2.3 billion in 2024—approximately 15x its training cost. A chatbot with just 1,000 daily users can generate $13K-$40K monthly API bills. Gartner warns that companies scaling AI face cost estimation errors of 500-1000%. While per-unit inference costs are dropping 100x every two years, total inference spending is rising due to 31x growth in AI workload volume.
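The monthly-bill range above is easy to sanity-check with a simple token budget; the per-user usage and the blended token price below are illustrative assumptions, not quotes from any provider:

```python
# Sketch: rough monthly API bill for a small chatbot (all inputs are assumptions).

daily_users = 1_000          # the user count cited above
chats_per_user = 10          # assumed conversations per user per day
tokens_per_chat = 4_000      # assumed prompt + completion tokens per conversation
price_per_million = 15.00    # assumed blended $/M tokens for a premium hosted model

monthly_tokens = daily_users * chats_per_user * tokens_per_chat * 30
monthly_bill = monthly_tokens / 1_000_000 * price_per_million
print(f"~{monthly_tokens / 1e9:.1f}B tokens/month -> ${monthly_bill:,.0f}/month")
# ~1.2B tokens/month -> ~$18,000/month, inside the $13K-$40K range quoted above.
```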
What is edge AI, and why is it critical for inference?
Edge AI moves processing closer to where data originates, reducing latency from milliseconds to microseconds. According to industry research, by the early 2030s, 74% of enterprise data will be created and processed at the edge. CBRE reports that edge data centers (5-10MW regional hubs) are redefining the "edge" as a new tier of infrastructure delivering sub-10ms latency. Major retailers have deployed edge AI at 1,000+ stores, cutting inference latency from hundreds of milliseconds to under 15ms while reducing cloud bandwidth costs by 82%.
How are hyperscalers responding to AI infrastructure demands?
Tech giants are making massive infrastructure bets: Meta plans $72 billion on AI infrastructure in 2025, Microsoft $80 billion, Amazon $100 billion, and Alphabet $75 billion. According to Data Center Frontier, they're pursuing hybrid architectures—centralized campuses for training and distributed edge networks for inference. Google signed the first corporate SMR deal with Kairos Power for 500MW. Amazon invested $500 million in X-energy for 5GW of SMR capacity by 2039. JLL reports that power availability has become "the new real estate" with 74% of data center capacity under construction already pre-leased.
What does 'follow the megawatts' mean for AI infrastructure analysis?
Following megawatts reveals actual AI spending patterns that software metrics miss. AI data centers require 200+ megawatts versus 30-50MW for traditional facilities. According to IEA projections, global data center electricity consumption will reach 945 TWh by 2030—equivalent to Japan's entire consumption. U.S. data center power demand will surge from 4% to 9-12% of total consumption. When open-source AI shows $0 on SaaS charts but drives massive power consumption growth, tracking megawatt demand exposes where capital actually flows in the AI economy.
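To put that projection in "follow the megawatts" terms, here is a quick conversion sketch; the 945 TWh figure is the projection cited above, and the 200 MW campus size is the AI-scale figure mentioned earlier, used here purely for scale:

```python
# Sketch: converting the 2030 consumption projection into continuous power terms.
# 945 TWh and 200 MW come from the figures cited above; the rest is arithmetic.

projected_twh_2030 = 945   # global data center consumption, TWh/year
hours_per_year = 8_760
ai_campus_mw = 200         # assumed size of a single AI-scale campus, MW

avg_gw = projected_twh_2030 * 1_000 / hours_per_year   # TWh/year -> average GW
equivalent_campuses = avg_gw * 1_000 / ai_campus_mw

print(f"Average continuous draw: ~{avg_gw:.0f} GW")
print(f"Equivalent to ~{equivalent_campuses:,.0f} AI campuses running 24/7 at {ai_campus_mw} MW")
```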
____________________________________
Tony Grayson is a recognized Top 10 Data Center Influencer, a successful entrepreneur, and the President & General Manager of Northstar Enterprise + Defense.
A former U.S. Navy Submarine Commander and recipient of the prestigious VADM Stockdale Award, Tony is a leading authority on the convergence of nuclear energy, AI infrastructure, and national defense. His career is defined by building at scale: he led global infrastructure strategy as a Senior Vice President for AWS, Meta, and Oracle before founding and selling a top-10 modular data center company.
Today, he leads strategy and execution for critical defense programs and AI infrastructure, building AI factories and cloud regions that survive contact with reality.
