What is the AI compute crunch—and how will it affect chatbots?
May 1, 2026
Rate limits on Claude and other tools could hint at a deeper squeeze on the chips, power and data centers needed to run advanced AI. Researcher Lennart Heim explains
By Deni Ellis Béchard edited by Eric Sullivan
Inside the data center of French company OVHcloud in Roubaix, northern France, on April 3, 2025. Credit: Sameer Al-Doumy/AFP via Getty Images
In late March some of the heaviest users of Anthropic’s Claude large language models began posting screenshots of a strange new scarcity: usage allotments meant to last five hours were running out in 20 minutes. Complaints spread across Reddit, GitHub and X. Anthropic told subscribers that sessions would burn through usage limits faster during peak hours, and it blocked some third-party tools, including OpenClaw, from drawing on its flat-rate subscription allowances. Several weeks earlier Boris Cherny, who leads Claude Code, had said that a default setting governing how much the model “thinks” before answering had been lowered.
Users immediately questioned why a paid AI tool was suddenly giving them less. Had the AI boom begun to outrun the machinery needed to sustain it?
The pressure is not limited to Anthropic. OpenAI has begun shuttering Sora, its video-generation platform, as weekly users of its coding assistant Codex have surged to four million. Investors and developers are now talking about a “compute crunch”: the possibility that demand for AI is growing faster than companies can build and power the data centers needed to meet it.
The stakes go beyond frustrated developers. If AI becomes the everyday interface for coding, science, learning, medicine, customer service, defense planning and office work, then access to compute becomes access to economic speed. And the limits are already showing up in the products people use.
The numbers are already steep. In a July 2025 white paper, Anthropic projected that the U.S. AI sector will need at least 50 gigawatts of electric capacity by 2028 to maintain global AI leadership—roughly the output of 50 large nuclear reactors. The International Energy Agency projects that global data-center electricity use is on track to double by 2030.
Compute is not new. Every chat with Claude or GPT runs on the same underlying machinery that calculates spreadsheet totals and renders video games—silicon wafers etched with billions of microscopic switches, organized into specialized processors. Training a frontier model can require tens of thousands of these processors running for weeks or months. Once the model is trained, using it also consumes compute each time someone asks a question. That demand now reaches across the supply chain. On January 15 Taiwan Semiconductor Manufacturing Company (TSMC), which fabricates most of the world’s advanced AI chips, announced it would spend up to $56 billion this year alone to expand capacity. Customers are still asking for more.
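To make “tens of thousands of these processors running for weeks or months” concrete, here is a minimal back-of-envelope sketch in Python. It relies on the widely used approximation that training compute is about six floating-point operations per model parameter per training token; the model size, data volume, chip throughput, utilization rate and cluster size below are all assumed values chosen for illustration, not figures from the article.

```python
# Back-of-envelope estimate of how long a frontier training run ties up a
# large GPU cluster. Every number below is an illustrative assumption,
# not a figure reported in the article.

PARAMS = 1e12          # assumed model size: 1 trillion parameters
TOKENS = 1e13          # assumed training data: 10 trillion tokens
FLOPS_PER_GPU = 1e15   # assumed peak throughput per accelerator (FLOP/s)
UTILIZATION = 0.4      # assumed fraction of peak throughput achieved in practice
NUM_GPUS = 20_000      # assumed cluster size ("tens of thousands of processors")

# Widely used rule of thumb: training compute ~ 6 * parameters * tokens.
total_flops = 6 * PARAMS * TOKENS

effective_rate = NUM_GPUS * FLOPS_PER_GPU * UTILIZATION  # FLOP/s actually delivered
weeks = total_flops / effective_rate / (86_400 * 7)      # seconds -> weeks

print(f"{total_flops:.1e} FLOPs -> roughly {weeks:.0f} weeks of cluster time")
```

Under these assumptions the run works out to roughly 12 weeks on 20,000 chips, which is why a single training project can monopolize a data center for a season.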
AI policy expert Lennart Heim is a useful guide to this machinery. He formerly led compute research at the RAND Center on AI, Security, and Technology and cofounded Epoch AI, which tracks the resources behind frontier AI models. His beat is where a cloud dashboard becomes a construction project—where digital demand collides with factories, transformers, chips and cables.
[An edited transcript of the telephone interview follows.]
Developers are saying the rate limits and blocked third-party tools look like a compute crunch. What does a compute shortage actually mean?
When we say “compute,” we mean computing power. For AI, training compute scales with model size: bigger neural networks need more data, and more data needs more processing power. What was underreported for years is that the same relationship holds for deployment. Running the model for users—inference—is incredibly compute-intensive because bigger models need more computing power to serve. So if more people use AI with more tokens and more intensity, you need more compute. If 10 times more people use AI 10 times more heavily, you need close to 100 times more compute.
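Heim’s multiplication can be written out in a few lines of Python. The numbers here are arbitrary and purely illustrative; the per-token serving cost assumes roughly two floating-point operations per model parameter per generated token, a common rule of thumb.

```python
# Heim's point in code: serving (inference) compute scales with the product
# of how many people use AI and how heavily each of them uses it.
# All inputs are arbitrary, illustrative assumptions.

def serving_flops(users: int, tokens_per_user: int, flops_per_token: float) -> float:
    """Total inference FLOPs needed to serve every user's tokens."""
    return users * tokens_per_user * flops_per_token

FLOPS_PER_TOKEN = 2e12  # assumed: ~2 FLOPs per parameter for a 1-trillion-parameter model

today = serving_flops(users=1_000_000, tokens_per_user=10_000,
                      flops_per_token=FLOPS_PER_TOKEN)
later = serving_flops(users=10_000_000, tokens_per_user=100_000,
                      flops_per_token=FLOPS_PER_TOKEN)

print(f"10x the users * 10x the usage -> {later / today:.0f}x the compute")  # -> 100x
```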
Why does a flat-rate subscription break down for AI in a way it didn’t for earlier Internet services?
The Internet runs on flat-rate subscriptions: you pay $20 a month and get effectively unlimited use. That works when the marginal cost per user is low—a Google Workspace power user doesn’t cost Google much more than a light user. With AI, it breaks. Using AI 10 times more heavily costs the provider roughly 10 times more money. Paying per token means you literally pay for your