Hassabis Says AGI Is Just One or Two Big Ideas Away — and Reinforcement Learning Is Making a Comeback
Source: Y Combinator | Published: 2026-04-29T14:00:34Z
Demis Hassabis believes current architectures will almost certainly be part of AGI, but continual learning, long-horizon reasoning, and memory remain unsolved — with a 50% chance a fundamentally new breakthrough is needed.
Demis Hassabis did two things as a teenager: designed a hit video game (Theme Park) and decided to bet the rest of his life on artificial intelligence. When he co-founded DeepMind in 2010, the prevailing view among investors and academics was "AI was tried in the '90s — it doesn't work." Sixteen years later, his lab produced AlphaGo and AlphaFold, the latter earning him the Nobel Prize in Chemistry. Now he runs Google DeepMind, oversees the Gemini model family, and the goal hasn't changed — artificial general intelligence.
In this conversation at Y Combinator, Hassabis offered remarkably specific takes on the path to AGI, the deficiencies of current models, and how founders should think about "AGI arriving in the middle of your startup journey."
Current architectures are part of AGI, but one or two puzzle pieces are still missing
Hassabis believes large-scale pretraining, RLHF, and chain-of-thought are "almost certainly" components of AGI's final architecture — there's no scenario where these turn out to be dead ends a few years from now. But he identified three unsolved problems: continual learning, long-horizon reasoning, and memory.
His probability estimate: a 50% chance that existing techniques can close the gap through scale and incremental innovation, and a 50% chance that we still need one or two fundamentally new ideas. "No more than two."
The context window is "working memory," and we're brute-forcing everything into it
Hassabis frames current models' memory problem through cognitive neuroscience — his PhD research focused on how the hippocampus consolidates new memories. Human working memory holds roughly seven digits. Today's models have context windows of a million or even ten million tokens. The problem is that we're cramming everything into that window — irrelevant, incorrect, and redundant information alike.
"It's a pretty crude approach. And if you're processing real-time video, a million tokens only gets you about 20 minutes. If you want a system to understand a month or two of your life, it's nowhere near enough."
DeepMind's earliest Atari program, DQN, borrowed "experience replay" from neuroscience — similar to how the brain replays important episodes during REM sleep to consolidate learning. Hassabis believes there's enormous room for innovation in memory. Even when storage isn't the bottleneck, retrieving the one memory actually relevant to the current decision is far from trivial.
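The replay mechanism DQN introduced can be sketched in a few lines. This is an illustrative reconstruction of the published idea, not DeepMind's code; the class and parameter names are our own:

```python
import random
from collections import deque

class ReplayBuffer:
    """DQN-style experience replay: store transitions, then sample them
    out of order so learning isn't dominated by the most recent frames."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)  # oldest memories are evicted first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform sampling breaks the temporal correlation between consecutive
        # transitions -- the "replay" borrowed from how the brain consolidates
        # episodes during sleep.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Toy usage: fill the buffer with dummy transitions, then draw a minibatch.
buf = ReplayBuffer(capacity=1000)
for t in range(100):
    buf.add(state=t, action=t % 4, reward=1.0, next_state=t + 1, done=False)

batch = buf.sample(8)
print(len(buf), len(batch))  # 100 8
```

Note that sampling here is uniform and retrieval is random — exactly the gap Hassabis highlights: picking the one memory relevant to the current decision is the hard, unsolved part.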
AlphaGo-era reinforcement learning is making a comeback, and it's underrated
From Atari to Go to StarCraft, DeepMind has always built agent systems. Hassabis pointed out that the "thinking modes" and chain-of-thought reasoning in today's leading models are essentially a continuation of methodologies pioneered by AlphaGo. He revealed that the team is revisiting some of their earlier ideas — including Monte Carlo tree search — and experimenting with applying them at larger scale and in more general settings.
His prediction: a significant portion of core AI progress over the next few years will come from combining these reinforcement learning and search methods with foundation models.
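Hassabis didn't specify how the combination will work, but the search side he's referring to is well documented. The AlphaGo family selects which branch to explore with the PUCT rule, which blends a move's observed value with a policy network's prior — a sketch of the published rule, not current DeepMind code:

```python
import math

def puct(q_value, visits, parent_visits, prior, c_puct=1.5):
    """AlphaGo-style PUCT score for one candidate move: exploit moves that
    have scored well so far (Q) while exploring moves the policy network's
    prior rates as promising. Illustrative constants; c_puct is a tunable."""
    q = q_value / visits if visits else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return q + u

# After 10 simulations at a node, an unvisited move with a strong prior
# can outrank a visited move with a mediocre average value.
explored = puct(q_value=4.0, visits=8, parent_visits=10, prior=0.2)
novel = puct(q_value=0.0, visits=0, parent_visits=10, prior=0.7)
print(novel > explored)  # True
```

Swapping a foundation model in as the source of the prior is one natural reading of "applying them at larger scale and in more general settings."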
Distillation hasn't hit information-theoretic limits — small models can still get smarter
Asked about the intelligence ceiling of small models, Hassabis was blunt: nobody currently knows where the theoretical limit of information density lies. His rule of thumb: within six months to a year of a frontier model's release, equivalent capability shows up in tiny models that run on edge devices.
There's strong commercial pressure behind this — Google has over a dozen products with more than a billion users each that need Gemini capabilities, and Search, YouTube, and Maps all demand ultra-low latency and cost at inference time. That pressure drives relentless model compression. Hassabis argues that small models' speed advantage can compensate for a 5–10% capability gap, especially in scenarios requiring fast iteration (like coding), where speed matters more than raw power.
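The compression technique in question, distillation, trains a small student to match a large teacher's full output distribution rather than hard labels. A minimal sketch of the standard (Hinton-style) objective — not Gemini's actual pipeline — in plain Python:

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.
    The soft targets carry more information per example than a hard label,
    which is why small students can recover so much teacher capability."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient magnitudes comparable to a hard-label loss.
    return temperature ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [3.0, 1.0, 0.2]   # toy logits over three tokens
student = [2.5, 1.4, 0.1]
print(round(distillation_loss(teacher, student), 4))
```

The loss is zero only when the student reproduces the teacher's distribution exactly — and, as Hassabis notes, nobody knows how small a model can get before that becomes information-theoretically impossible.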
Agents are just getting started — nobody's built the real killer app yet
Hassabis is cautious on agents: the direction is right, but we're still in the experimental phase. He offered a concrete benchmark — no "vibe-coded" game has topped the app store charts yet.
"I can now prototype Theme Park in half an hour — something that took me six months when I was 17. That's stunning. But if you spent an entire summer seriously building, you should be able to make something truly remarkable. So why hasn't anyone shipped an indie game that sells 10 million copies? Something is still missing at the tooling or workflow level."
He thinks one of the missing pieces is continual learning — current agents can't adapt to the specific environment they're operating in, which leaves them underpowered when facing complete, end-to-end tasks. For true "hand it off and forget about it" capability, the system must keep learning as it executes.
Models still exhibit "jagged intelligence" in reasoning
Hassabis likes testing Gemini's reasoning with chess, because correctness is objectively verifiable. He's observed a telltale failure mode: the model identifies a move as a blunder in its chain of thought, fails to find anything better, and then plays the blunder anyway.
"In a precise reasoning system, you shouldn't see that."
This is why models can solve IMO gold-medal problems yet stumble on elementary school math. Hassabis suspects the issue lies in the system's lack of introspection over its own reasoning process — fixing it might require only one or two key improvements, but those improvements haven't arrived yet.
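The missing introspection step is easy to state as a consistency check over the system's own analysis. This is purely illustrative — a description of the failure mode, not of Gemini's internals — and the move names are made up:

```python
def contradicts_own_analysis(evaluations: dict[str, float], chosen_move: str,
                             blunder_threshold: float = -2.0) -> bool:
    """True when the choice contradicts the system's own chain of thought:
    the chosen move is rated a blunder even though the analysis itself
    rated some alternative higher. Scores are in pawns, as chess engines use."""
    chosen_score = evaluations[chosen_move]
    best_score = max(evaluations.values())
    return chosen_score <= blunder_threshold and best_score > chosen_score

# The failure Hassabis describes: the analysis calls Qxb7 a blunder,
# quieter alternatives score better, yet Qxb7 gets played anyway.
evals = {"Qxb7": -3.5, "Nf3": 0.3, "h3": 0.1}
print(contradicts_own_analysis(evals, "Qxb7"))  # True
```

A "precise reasoning system," in Hassabis's phrase, would run exactly this kind of check over its own conclusions before acting.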
Multimodality is Gemini's underappreciated edge
Hassabis argues that training Gemini as multimodal from the start made early development harder but is now paying off. Key application areas include robotics (Gemini Robotics is built on the multimodal foundation model), autonomous driving (Waymo), and personal assistants that understand the physical world.
His strategic conviction: future AI devices need to understand the physical environment around you and intuitive physics — something purely text-based models cannot do, and where Gemini holds a distinct advantage.
A virtual cell is roughly a decade away — the bottleneck is data, not algorithms
Hassabis extends the AlphaFold vision to something far grander: building a complete virtual simulation of a cell. DeepMind's science team is starting with the cell nucleus — because it's relatively self-contained, with inputs and outputs that can be reasonably approximated.
But the critical bottleneck isn't the model; it's data. No existing imaging technology can provide nanometer-resolution dynamic observation without killing the cell. If that hardware problem is solved, cell simulation becomes a (massively complex) vision problem. An alternative path is building better learned simulators for dynamical systems. Hassabis estimates a full virtual cell is roughly ten years out.
The recipe for "AlphaFold-style breakthroughs": vast search space + clear objective function + sufficient data
Asked which scientific domains are ripe for the next AlphaFold, Hassabis laid out three conditions:
1. A vast combinatorial search space — so large that no brute-force algorithm can exhaust it. The number of possible Go positions and the number of protein conformations each dwarf the number of atoms in the universe.
2. A clear objective function — protein folding minimizes free energy; Go has winning and losing.
3. Sufficient data, or a simulator capable of generating high-quality synthetic data.
Drug discovery fits this framework — there must exist some compound that cures a given disease with no side effects. The problem is simply finding it efficiently.
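The three conditions map directly onto the skeleton of any guided-search algorithm. The toy below is only meant to show that structure — AlphaFold is a learned model, not a hill climber, and the objective here is invented for illustration:

```python
import random

def guided_search(objective, neighbors, start, steps=1000, seed=0):
    """Skeleton of the recipe: a space too large to enumerate (condition 1),
    a scalar objective to climb (condition 2), and feedback from evaluations
    standing in for data/a simulator (condition 3)."""
    rng = random.Random(seed)
    best, best_score = start, objective(start)
    for _ in range(steps):
        candidate = rng.choice(neighbors(best))
        score = objective(candidate)
        if score > best_score:          # keep only improvements
            best, best_score = candidate, score
    return best, best_score

# Toy objective with a single optimum at 42; neighbors are one step away.
best, score = guided_search(objective=lambda x: -(x - 42) ** 2,
                            neighbors=lambda x: [x - 1, x + 1],
                            start=0)
print(best, score)  # 42 0
```

Drug discovery fits the same skeleton: the compound space is condition 1, efficacy-minus-toxicity is condition 2, and the hard part — as with the virtual cell — is condition 3.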
True scientific discovery requires going beyond pattern matching, but it isn't magic
Hassabis acknowledged that current systems can't achieve genuine scientific discovery. He used a precise analogy: AlphaGo can play a brilliant move like "Move 37," but can it invent the game of Go?
Imagine giving a system a high-level description — "a game whose rules can be learned in five minutes, whose depth can never be exhausted, that's aesthetically satisfying, and that can be played in an afternoon" — and expecting it to return Go. Today's systems can't do that.
He proposed an "Einstein test": train a system on physics knowledge available in 1901 and see if it can independently derive Einstein's 1905 annus mirabilis results, including special relativity. Passing that test would mean the system can produce genuinely novel insights. He believes this requires some form of analogical reasoning that current models don't possess — or that we haven't yet figured out how to elicit.
Founders must factor AGI's arrival into their business plans
Hassabis identified what he considers the most defensible startup direction: AI crossed with another deep-tech domain, especially ones involving the world of atoms — materials, pharmaceuticals, hard science. These won't be disrupted by the next foundation model update.
But he added a temporal dimension that few people think seriously about: if your AGI timeline is around 2030, as his is, and deep tech typically requires a decade-long cycle, then AGI will arrive in the middle of your startup journey. You need to think ahead — can your product leverage AGI? How will AGI systems use what you're building?
He predicts the future architecture will feature general-purpose models (Gemini, Claude) serving as orchestration hubs that call specialized systems like AlphaFold as tools — rather than packing every capability into one giant brain. Stuffing protein data into Gemini is not only unnecessary but would degrade its language abilities. For founders building vertical AI systems, this insight points to a clear path for survival.
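The orchestration pattern he describes reduces to a router: a general model dispatches to specialist systems and falls back to itself otherwise. The sketch below is entirely hypothetical — the registry, names, and stand-in functions are ours; no real AlphaFold or Gemini API is being called:

```python
from typing import Callable

# Hypothetical registry mapping task types to specialist systems.
SPECIALISTS: dict[str, Callable[[str], str]] = {
    "protein_structure": lambda seq: f"fold({seq})",      # stand-in for AlphaFold
    "chess_analysis":    lambda fen: f"evaluate({fen})",  # stand-in for an engine
}

def orchestrate(task_type: str, payload: str) -> str:
    """The architecture Hassabis predicts: a general model routes requests
    to specialized tools instead of absorbing every capability itself."""
    tool = SPECIALISTS.get(task_type)
    if tool is None:
        return f"general_model({payload})"  # fall back to the generalist
    return tool(payload)

print(orchestrate("protein_structure", "SEQ"))  # fold(SEQ)
print(orchestrate("poetry", "haiku"))           # general_model(haiku)
```

For a vertical AI startup, the implication is that the durable position is on the right-hand side of that registry: being the specialist worth calling.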