For the past two years, the AI infrastructure race has centered largely on one thing: access to more GPUs.
But as hyperscalers, enterprises, and public-sector organizations move AI workloads into production, attention is shifting toward a harder question: how efficiently those systems operate.
At NC Tech’s Tech Fest this week in Durham, North Carolina, speakers from enterprise IT and infrastructure organizations discussed mounting pressure on power grids, growing cluster complexity, and the operational strain created by large-scale AI deployments.
“The amount of energy that we have on the grid versus the computational demand from these providers isn’t matching up,” said Vijay Ramanujam, chief information officer at the North Carolina Department of Health and Human Services. “Everybody’s asking how we can rewire the infrastructure in a way that makes it more efficient.”
The comments reflect a broader shift across the AI infrastructure market as operators confront the physical limits of large GPU clusters.
Bigger Clusters, Bigger Problems
Training and inference systems now span tens of thousands of GPUs, turning power delivery, cooling, networking, and workload coordination into major operational challenges.
Industry conversations that once focused almost entirely on GPU shortages now increasingly include utilization rates, cluster efficiency, and scheduling software.
The reason is simple: adding more GPUs does not automatically deliver proportional performance gains.
As AI clusters expand, communication overhead, workload imbalance, and networking latency can sharply reduce effective utilization across systems.
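To make that concrete, here is a minimal sketch assuming a simple weak-scaling model: each GPU keeps a fixed per-step compute load, and synchronous collectives add a communication cost that grows roughly with the log of cluster size. The constants and function names are hypothetical illustrations, not figures from the article or any real deployment:

```python
import math

# Illustrative only: a toy weak-scaling model where each GPU keeps the same
# per-step compute load and pays a collective-communication cost that grows
# roughly with log2(cluster size). All constants are hypothetical.

def weak_scaling_efficiency(num_gpus: int,
                            compute_secs_per_step: float = 1.0,
                            comm_secs_per_hop: float = 0.02) -> float:
    """Fraction of each step spent on useful compute rather than waiting
    on communication; 1.0 would mean perfectly linear scaling."""
    comm_secs = comm_secs_per_hop * math.log2(max(num_gpus, 2))
    return compute_secs_per_step / (compute_secs_per_step + comm_secs)

for n in (8, 512, 4096, 32768):
    print(f"{n:>6} GPUs -> {weak_scaling_efficiency(n):.0%} effective utilization")
```

Even with generous assumptions, the toy model shows effective utilization sliding from the mid-90s percent on a small cluster to the high-70s at tens of thousands of GPUs, which is the dynamic operators describe.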
Ramanujam said many organizations still rely on brute-force hardware expansion rather than improving how workloads move through GPU clusters.
“Only a few frontier labs have the expertise and the time to reinvent the wheel to make things more efficient,” he said.
Beyond FLOPS
The growing emphasis on efficiency is also reshaping how some operators evaluate AI infrastructure economics.
Rather than focusing solely on GPU counts or theoretical compute performance, organizations are increasingly measuring how much usable AI output systems can generate relative to power consumption.
“We are not just benchmarking by FLOPS anymore,” Ramanujam said. “People are asking how many tokens you can output per watt.”
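The metric itself is simple arithmetic. The sketch below shows one way to compute it; the throughput and power figures are hypothetical placeholders, not benchmarks cited at the event:

```python
# Illustrative only: the kind of tokens-per-watt arithmetic Ramanujam
# describes. The numbers below are hypothetical placeholders.

def tokens_per_watt(tokens_per_second: float, avg_power_watts: float) -> float:
    """Tokens generated per second, per watt of average power draw.

    Equivalently, tokens per joule: (tokens/s) / W = tokens / (W*s).
    """
    return tokens_per_second / avg_power_watts

# Example: an inference node serving 12,000 tokens/s at 6.5 kW average draw.
rate = tokens_per_watt(tokens_per_second=12_000, avg_power_watts=6_500)
print(f"{rate:.2f} tokens per joule "
      f"({rate * 3600:.0f} tokens per watt-hour)")
```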
That shift reflects broader concern about power availability as AI demand climbs and operators struggle to secure additional capacity.
Efficiency Moves Up the Stack
Software optimization and workload orchestration are becoming larger parts of AI infrastructure planning as operators look for ways to improve performance without continually expanding physical infrastructure.
Ramanujam said larger deployments increasingly expose inefficiencies tied to communication overhead, GPU utilization, networking latency, and power consumption.
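One way to picture what orchestration software does is as a packing problem: fitting jobs onto nodes so fewer GPUs sit idle. The toy first-fit scheduler below is a hypothetical illustration of that idea, not any vendor's algorithm; real orchestrators also weigh interconnect topology, data locality, and power budgets:

```python
# Illustrative only: a toy first-fit scheduler showing the kind of workload
# packing orchestration layers do to raise GPU utilization. Job sizes and
# node capacities are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    gpus_free: int
    jobs: list = field(default_factory=list)

def first_fit(nodes, jobs):
    """Place each job on the first node with enough free GPUs."""
    pending = []
    for job_name, gpus_needed in jobs:
        for node in nodes:
            if node.gpus_free >= gpus_needed:
                node.gpus_free -= gpus_needed
                node.jobs.append(job_name)
                break
        else:
            pending.append(job_name)  # no node can host the job right now
    return pending

nodes = [Node("node-a", gpus_free=8), Node("node-b", gpus_free=8)]
jobs = [("train-1", 6), ("infer-1", 2), ("train-2", 4), ("infer-2", 3)]
waiting = first_fit(nodes, jobs)
for n in nodes:
    print(n.name, "running", n.jobs, "free GPUs:", n.gpus_free)
print("queued:", waiting)
```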
The result is an AI infrastructure market beginning to focus less on raw GPU accumulation and more on how efficiently organizations convert power and hardware into usable AI output.
