Baseten Grew 30x in One Year — Why Inference Cloud Is the Pick-and-Shovel Business of the AI Era

Source: No Priors | Published: 2026-05-01T15:07:25Z

Over 95% of tokens on Baseten come from custom model inference — almost no one runs vanilla open-source weights — and 99% of the enterprise market hasn't even deployed AI yet.


AI inference compute might be the last true "picks and shovels" business of this era. Baseten is an AI inference cloud company that grew 30x over the past year, with revenue projected to exceed $1 billion this year. Its founder Tuhin Srivastava recently shared his take on what's happening in this market — how tight supply really is, how open-source models are reshaping the landscape, and why inference might be the "endgame market" of the AI era.


The application layer won't get eaten by foundation model companies — but the moat is in workflows

On whether there's still room for an independent application layer, Tuhin's view is clear: yes, but not because models aren't capable enough — it's because user signals and workflows can't be easily captured by frontier model companies.

He pointed to Abridge, a medical AI company building ambient note-taking tools for doctors, deeply integrated into hospital EMR systems. The real value isn't transcription itself — it's clinicians' edits to notes and the multi-step follow-up workflows triggered inside the EMR after notes are generated. These workflow signals are completely inaccessible to frontier model companies. Companies that own these signals can use them for post-training, building long-term, domain-specific agent models.

Customer service follows the same logic. A single ticket often requires 10 to 20 actions to resolve — that multi-step process is the moat.

99% of the enterprise market hasn't even started

Here's a surprising number: measured by inference volume, 99% of the market is still driven by early-stage AI-native companies. Most enterprises haven't meaningfully begun AI adoption.

Tuhin says he was asked the same question two years ago, and the answer has been "insanely consistent" — large-scale enterprise adoption is still ahead. But things are shifting: enterprises are moving from "whether to use AI tools" to "using closed-source model APIs," and next comes the custom model phase. This entire transition is only on step two.

95% of inference tokens run on custom models

Over 95% of tokens on Baseten come from custom model inference — virtually no one is running vanilla open-source weights. Customers are either post-training on their own data to improve quality or applying compilation optimizations for performance.

The logic is straightforward: companies at meaningful scale have already proven product-market fit, possess unique user signals, and know what to optimize. Tuhin's advice is equally direct: don't post-train before you have product-market fit. Prove with the best closed-source models that you have something worth optimizing, then consider customization.

"No post-training pre-product-market fit."

Chinese open-source models: an indirect subsidy for the U.S.

On security concerns around Chinese models, Tuhin takes a pragmatic stance. He says he's never seen real evidence of backdoors or embedded agendas — early on, some models were found to have biases, but the community flagged them quickly. In network-isolated inference environments, data doesn't magically cross network boundaries.

What worries him more is the flip side: if China has five labs consistently producing high-quality open-source models and the U.S. can't stand up even one, that's the real problem. He quoted someone else's line — "We should forget this is a Chinese model, pretend Meta released it, and build on that assumption."

An interesting economic angle: the Chinese government is effectively subsidizing the R&D behind these open-source models, and the dividends of that investment are being captured by U.S. companies adopting them. This amounts to the Chinese government indirectly subsidizing AI adoption at American enterprises.

What DeepSeek's cost advantage actually means

Tuhin offered a concrete comparison: running DeepSeek in production costs roughly 20% of what it costs to run OpenAI or Anthropic models, with comparable or better latency and potentially higher reliability. If the U.S. can't access this level of intelligence in this form, innovation velocity takes a real hit — because cheaper intelligence directly translates to intelligence being embedded in more places.
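To make that arithmetic concrete, here is a minimal sketch with entirely hypothetical prices and revenue figures (none of these are published rates from Baseten, OpenAI, Anthropic, or DeepSeek), showing how a roughly 5x drop in per-token cost changes which features clear a simple unit-economics bar:

```python
# Hypothetical numbers only: a ~5x cheaper token changes which features are
# economically viable, which is the "cheaper intelligence gets embedded in
# more places" argument in miniature.

FRONTIER_PRICE = 10.00                 # assumed $/1M tokens, closed-source API
OPEN_PRICE = FRONTIER_PRICE * 0.2      # the "roughly 20% of the cost" claim

features = {
    # feature: (tokens per user per month, revenue per user per month)
    "summarize_ticket":    (200_000,   3.00),
    "draft_full_reply":    (1_500_000, 6.00),
    "agentic_follow_ups":  (6_000_000, 15.00),
}

for name, (tokens, revenue) in features.items():
    for label, price in [("frontier", FRONTIER_PRICE), ("open/custom", OPEN_PRICE)]:
        cost = tokens / 1_000_000 * price
        verdict = "viable" if cost < revenue else "underwater"
        print(f"{name:20s} {label:12s} cost=${cost:6.2f} revenue=${revenue:6.2f} -> {verdict}")
```

With the made-up numbers above, only the cheapest feature is viable at frontier pricing, while all three clear the bar at the lower price point; that is the mechanism behind "cheaper intelligence ends up in more places."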

That said, he was clear: the absolute frontier still belongs to closed-source models — Anthropic, OpenAI, and Google remain out front.

The compute shortage is worse than you've heard

Baseten currently runs 90 clusters across 18 different clouds, with utilization consistently in the mid-90s — "uncomfortably high," in Tuhin's words. The company holds a standing all-hands meeting dedicated to a single question: how to manage compute capacity against current demand.

Even so, growth remains supply-constrained. Worse, plenty of compute providers in the market are "a bit fraudulent" — they've never operated a data center and don't understand inference SLAs. There are maybe a dozen truly reliable clouds, and only three or four that qualify as "gold tier."

Getting 1,000 B200s right now requires a minimum contract of three to five years, plus 20% to 30% of the total contract value upfront. This means acquiring compute has become a capital-intensive problem — you need not just the demand to absorb capacity, but also a low cost of capital. This directly shapes Baseten's thinking on IPO timing.
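As a back-of-the-envelope illustration of why this pushes toward a low cost of capital: the three-to-five-year term and 20% to 30% prepay come from the interview, but the hourly rate below is an assumed placeholder, not a real quote.

```python
# Back-of-the-envelope: cash needed up front to reserve 1,000 GPUs.
# Term length (3-5 years) and prepay share (20-30%) are from the interview;
# the $/GPU-hour figure is a hypothetical placeholder, not a quoted price.

GPUS = 1_000
ASSUMED_RATE = 5.00            # hypothetical $/GPU-hour for a B200-class part
HOURS_PER_YEAR = 24 * 365

for years in (3, 5):
    tcv = GPUS * ASSUMED_RATE * HOURS_PER_YEAR * years   # total contract value
    for prepay in (0.20, 0.30):
        print(f"{years}-year term, {prepay:.0%} upfront: "
              f"TCV ${tcv / 1e6:,.0f}M, cash due now ${tcv * prepay / 1e6:,.0f}M")
```

Even at a modest assumed rate, the upfront check lands in the tens of millions of dollars, which is why Tuhin frames compute acquisition as a capital problem rather than a procurement problem.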

Post-training and inference are two sides of the same coin

A few months ago, Baseten acquired Parsed, a post-training research team that was already a Baseten customer — post-training on Baseten, then running inference there. The acquisition logic: post-training and inference are far more tightly coupled than people realize.

A concrete example: quantization strategy depends on how a model was trained, and the training approach affects inference performance. These problems are intertwined. The core loop Tuhin wants to build is: inference generates data, data feeds evaluation, evaluation drives post-training, post-training improves inference — and repeat.
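A toy, self-contained sketch of that loop follows; the function names and the scalar "accept rate" stand-in are placeholders, not real Baseten or Parsed APIs, and the numbers exist only to show how each stage feeds the next.

```python
# Toy sketch of the loop: inference generates data, data feeds evals,
# evals drive post-training, post-training improves inference, repeat.
# Everything here is a placeholder; no real Baseten/Parsed API is implied.
import random

random.seed(0)

def run_inference(accept_rate, n_requests=1_000):
    """Serve traffic; each trace records whether the user had to edit the output."""
    return [{"edited": random.random() > accept_rate} for _ in range(n_requests)]

def build_evals(traces):
    """User edits become labeled examples of what the model got wrong."""
    return [t for t in traces if t["edited"]]

def post_train(accept_rate, eval_set):
    """Stand-in for post-training: more labeled failures, larger quality bump."""
    return min(1.0, accept_rate + 0.05 * len(eval_set) / 1_000)

accept_rate = 0.60   # share of outputs accepted without edits
for step in range(5):
    traces = run_inference(accept_rate)
    evals = build_evals(traces)
    accept_rate = post_train(accept_rate, evals)
    print(f"iteration {step}: {len(evals)} corrections, accept rate now {accept_rate:.2f}")
```

The point of the loop is that the inference side and the training side share the same artifacts: the traces that post-training learns from are only available to whoever is serving the traffic.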

Nvidia's moat isn't the chip — it's the supply chain and ecosystem

On the question of a multi-chip future, Tuhin admits he'd like to see diversification — just as he'd like to see multiple models. Inference-specific chips (like decode-only chips) make perfect logical sense.

But he believes people drastically underestimate Nvidia's advantages in supply chain and developer ecosystem. For an infrastructure company, the most important capability right now is speed — and Nvidia lets you move fastest. Other chip makers face a structural dilemma: if you lock 90% of your capacity to a single large customer, a broader ecosystem around your hardware can never form. That customer even has an incentive to take 95% of capacity, ensuring everything is custom-built for them alone, permanently shutting everyone else out.

The Jevons paradox is playing out in real time in the inference market

Does falling inference cost erode demand? Quite the opposite. Developer behavior is crystal clear: make inference cheaper, and they stuff more intelligence into their products. Agent runtimes get longer, more actions get executed, quality targets go up.
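The arithmetic of that behavior, with made-up numbers, looks like this: a price drop only shrinks total spend if usage stays put, and the observed behavior is that usage grows faster than price falls.

```python
# Jevons-style arithmetic with made-up numbers: the per-token price falls,
# but tokens consumed per task grow faster (longer agent runs, more actions),
# so spend per task rises anyway.

price_before, price_after = 10.00, 2.00        # hypothetical $/1M tokens
tokens_before, tokens_after = 0.5e6, 5.0e6     # hypothetical tokens per resolved task

spend_before = tokens_before / 1e6 * price_before   # $5.00
spend_after  = tokens_after  / 1e6 * price_after    # $10.00

print(f"price fell {price_before / price_after:.0f}x, usage grew "
      f"{tokens_after / tokens_before:.0f}x, spend per task: "
      f"${spend_before:.2f} -> ${spend_after:.2f}")
```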

"More intelligence just means better user experience. Better answers, better experiences, more dollars, more revenue."

No customer has ever shown a "good enough" attitude toward answer quality. Tuhin believes inference is "the last market" — even after AGI is achieved, all that remains is inference.

No hero culture — find people who can own entire problems

As Baseten transitioned from an ultra-flat engineering culture to bringing in leadership, the core lesson Tuhin learned was: if you find yourself micromanaging, if you feel you have to be involved in everything, the problem isn't that you're too important — it's that you haven't found the right people.

His hiring criteria are specific: first-principles thinking, collaboration over individual heroism, low ego. If you need a manager to manage you, this probably isn't the right place. Clear standards cut both ways: the right people recognize them immediately, and so do the wrong ones. Baseten has never lost a single one of its top 30 customers, and core talent attrition is equally low.

Ops culture is the price of admission for infrastructure companies

Tuhin's co-founder Amir's seven-year-old, hearing his dad's pager go off, asked: "Is that a P0?"

That detail says it all. Inference can't go down, and this culture very quickly filters out people who can't handle that reality. The Baseten team once sat in a meeting with a group of AWS executives, and within 45 minutes their pagers went off multiple times. Ops culture isn't optional; like an immune system, it repels those who can't adapt.
