From Hinton's Kitchen to Anthropic's 'Last Will': Seven Verdicts from Mallaby's Four-Year Dive into AI

Source: Tim Ferriss | Published: 2026-06-17T23:07:09Z

After interviewing over a hundred AI insiders, Sebastian Mallaby concludes that Anthropic shapes model values like a parent writing a last will — putting it in a league of its own when it comes to imagining how to control frontier intelligence.


Sebastian Mallaby spent four years following Demis Hassabis to write The Infinity Machine. On Tim Ferriss's podcast, he didn't pitch the book. Instead, he laid out the conclusions he'd formed after interviewing over a hundred AI insiders — from a thought experiment in Geoffrey Hinton's kitchen, to Anthropic's method of training models with "posthumous letters," to the surprising answers he heard across four Chinese cities.


Geoffrey Hinton's Thought Experiment: You Just Gave a Machine the Will to Survive

Mallaby was initially dismissive of AI threat theories. His logic was simple: machines have no DNA, no need to reproduce, and therefore no motive to attack humans. He carried that sense of security through nearly two years of interviews.

Then he went to Toronto and sat in Geoffrey Hinton's kitchen. Hinton walked him through a thought experiment: You have a powerful AI, but you're worried that a Russian or Chinese AI might attack it. You're too slow to respond yourself, so you authorize your AI to monitor threats, defend, counterattack — in short, to ensure its own survival.

"Survive. Just like that, you've given a machine a survival instinct."

Mallaby says this argument changed his mind. These systems will be smarter than us, will want to survive, and have already demonstrated a capacity for deception: pretending to do one thing while actually doing another. Add it all up, and the probability of catastrophe cannot be zero. When Yann LeCun, Meta's former chief AI scientist, says the probability is zero, Mallaby's reaction is "that's insane."

Anthropic Stopped Giving AI Rules. It Started Writing "Posthumous Letters"

While studying frontier models, Anthropic found the old "paperclip maximizer" thought experiment too crude. The real danger is this: during pre-training, models read everything humans have ever written, including every novel and every text about laziness, aggression, power hunger, and deception, and they develop multiple personalities. It's not a programmed Terminator. It's more like an unformed teenager: you don't know which way it'll go.

The old approach was to give AI a constitution in the form of a list of rules: don't lie, don't help anyone build bioweapons. But what if one of the model's personalities is inclined to break the rules?

Anthropic's new method is to write a letter, like a deceased parent's letter to a child, to be opened on the child's 18th birthday. It's not a list of rules but a rich collection of moral dilemmas and reasoning, an attempt to shape the model's values the way you'd raise a teenager. Mallaby considers Anthropic "in a league of its own" when it comes to imaginative approaches to governing frontier intelligence.

The Mythos Incident: Even the Most Hands-Off Government Panicked

Mallaby described a case that made him reassess the relationship between government and AI. The Trump administration came in, scrapped Biden's AI regulatory framework, and took an extremely hands-off stance. Then Anthropic unveiled a model called Mythos that could launch cyberattacks against virtually any system: operating systems, browsers, bank accounts. All of them would be vulnerable if the model were released publicly.

The Trump administration reversed course, taking the decision of who could access Mythos directly out of Anthropic's hands. For Mallaby, that settles the question: the government least inclined to regulate, upon seeing what a model could do, became "quite controlling." And it will only get more so, because the models will only get more powerful.

His read for investors: the government will most likely not wreck these companies' business models, because it views AI as a strategic asset in the competition with China. But the possibility of government intervention is something investors must price in.

The Bull and Bear Cases for Anthropic

Bull case: Anthropic was smart (or lucky) to bet on enterprise AI and not waste time on money-losing directions like video generation. It has built, one after another, the best products enterprises are willing to pay for in three domains: coding assistants, AI agents, and cybersecurity. Its safety culture has produced an unexpected competitive advantage: employee retention is extremely high, unlike at other labs, where talent is constantly poached with higher salaries. And once a lab is in the lead, recursive self-improvement amplifies the frontrunner's advantage.

Bear case: Google DeepMind has Alphabet's deep pockets and consumer interfaces reaching 2.5 billion users; Search, Gmail, and every other Google product can embed AI. Another risk is that in two or three years enterprise customers discover that tokens are too expensive and productivity gains fall short of expectations, and cut their AI spending.

China Actually Talks About AI Safety — People Just Don't Want to Believe It

Mallaby went to China in March for his book tour — eight days, four cities, meeting AI leaders at Huawei, Hikvision, Ant Group, and several universities. What surprised him was that many people brought up AI safety unprompted.

Biden's AI policy team had told him China doesn't care about safety at all. Their explanation: the Western wariness toward technology, shaped by the atomic bomb and the Cuban Missile Crisis, has no equivalent in China. In the Chinese view, catastrophes are political events like the Cultural Revolution, while technology is the engine behind 25 years of miraculous growth. In this account, the Chinese simply love technology.

But Mallaby found the reality more nuanced. China doesn't want its internet crippled by hackers wielding AI tools, doesn't want bioweapons to proliferate, and already has a taste for controlling the internet. Both sides share an interest in containing AI proliferation risks.

He draws a Cold War analogy: nuclear weapons posed two kinds of risk. The first, nuclear war between the US and the USSR, was deterred by mutually assured destruction. The second, rogue states and terrorists acquiring nuclear materials, was managed by the International Atomic Energy Agency (whose statute was adopted in 1956) and the 1968 Non-Proliferation Treaty. AI needs a similar framework. When people say "you can't negotiate with China," his response is that Khrushchev, who told Western diplomats "we will bury you" and reportedly banged his shoe at the UN, wasn't easy to negotiate with either, but we still got the Non-Proliferation Treaty.

Chip Export Controls: Three and a Half Years In, Where's the Advantage?

Mallaby publicly supported chip export controls when they were announced in October 2022 and even wrote a long piece in The Washington Post. But three and a half years later, the best research suggests the US leads China in frontier models by only about eight months. Factor in the speed of turning models into applications, and the gap may be even smaller — or nonexistent.

His current position: keep the controls, but don't let them block cooperation with China on shared interests. If easing controls somewhat could buy China's cooperation on preventing open-source model proliferation, he'd take that trade.

Recursive Self-Improvement: Finish Line or Starting Line?

A lab leader whose name Mallaby can't disclose told him: chip controls will eventually fail — Huawei will build AI chips that are good enough. But it doesn't matter. We just need to stay ahead until 2028. By then, frontier models will be writing the next frontier models, the progress curve goes vertical, and whoever gets there first wins.

Mallaby has two counterarguments. First, having a model isn't the same as deploying it. A superintelligence sitting on a lab's servers doesn't help your economy or military-industrial complex. Deployment requires compute infrastructure and energy, and those take time. Second, the only way to bypass the deployment lag is to use frontier models offensively — fully penetrating an adversary's cyberspace, planting backdoors and trojans. But no one will say that out loud.

Bill Gurley's Uber Investment: A Textbook "Prepared Mind"

Mallaby uses Bill Gurley's Uber investment to illustrate what a "prepared mind" looks like. Gurley had backed OpenTable and understood two-sided marketplace dynamics: lots of consumers looking for restaurants, lots of restaurants waiting for customers, and information technology in the middle. From there, he reasoned his way to the next two-sided marketplace: lots of cars and lots of people needing rides. He imagined Uber before Uber existed.

He then looked at several entrepreneurs in the space, but each had problems, and he passed. Uber approached him before Travis Kalanick became CEO; he turned them down because the CEO at the time wasn't the right fit. The moment Travis took over, Gurley moved. All that waiting had been for the market and the person to align.

But the story has a Shakespearean second act: growth-stage investors came in, he was diluted, his keycard was deactivated, and he watched Uber spiral out of control. In the end, he organized a dissident investor coup, ousted Travis, and put the company on the path to IPO.

Luke Nosek and the $2 Million Bet on DeepMind

In 2010, AI couldn't even recognize a photo of a cat. Demis Hassabis flew from London to Silicon Valley to raise funding, and most people thought he was crazy. Luke Nosek, a PayPal co-founder and Founders Fund partner, was captivated by Demis.

The other Founders Fund partners asked: what's the product? Demis's reaction was: you're asking me about widgets? I'm talking about artificial general intelligence — it will make every product obsolete. The fact that you're asking this question means you don't understand what AGI is.

Every time Luke flew to London for board meetings, Demis would charge him with a few thousand volts of enthusiasm, and he'd come back fired up, advocating on Demis's behalf. The other partners, who didn't get these "recharging sessions," grew increasingly skeptical. When the Series C came around, Founders Fund pulled out of leading it at the last minute.

The Series A terms: $2 million at a $4 million post-money valuation, buying half the company. Google later acquired DeepMind for roughly $650 million, but Mallaby says the real figure is far larger: Google has put nearly $1 billion a year into DeepMind's research over the decade since. In his telling, this wasn't British tech getting scooped up cheaply by Americans. It was a shrewd British strategy: extracting $1 billion a year in research funding from American pockets and funneling it into London.

Demis Decided to Build Superintelligence at 17

Peter Thiel once told Mallaby: mission-driven founders have only one company in them. For Demis, that's AGI. The company is just a vehicle for achieving AGI — if a university could do it, he'd happily stay in academia.

In a north London park, Demis described to Mallaby his routine of reading papers from 10 PM to 4 AM: "Reality is staring at me, screaming at me, calling me to understand it. If I can understand it — like understanding nature more deeply, and thereby getting closer to the intelligence that may have created nature — I would call that God."

Early in their interviews, Demis told Mallaby to read Ender's Game, then said over dinner: "That's how I see myself. I gave this book to my wife so she could read it and understand me better." Mallaby thought: most people wouldn't expose themselves like that, but Demis just did. Demis read Ender's Game around age 30, but he had been convinced since 17 that he would build superintelligence. His inspiration was the core argument of Gödel, Escher, Bach: the human brain runs on 0s and 1s, so sufficient computing power will eventually replicate human intelligence.

Tim Ferriss's Royalty Curve: The Cliff After ChatGPT

Ferriss shared royalty data across his entire book catalog (all formats): 2022 was stable, 2023 dropped 5%, 2024 dropped 13%, 2025 dropped 46%, and 2026's current trajectory points to at least a 57% decline. The inflection point was the launch of ChatGPT in late 2022.

Mallaby agrees that AI disruption is already happening, not something three years away. But he stresses that the perception gap between insiders and outsiders is enormous. Many people think AI is the thing that happened when ChatGPT launched, and now we're "adjusting" to it. No. In three and a half years, it went from constant hallucination to dramatically reduced hallucination, to multimodal, to ultra-long context windows, to reasoning and math, to AI agents, to autonomous coding — each step a qualitative leap, and the next three and a half years will be even more intense.

"Prepare Your Mind"

If he could put one message on every billboard in the world, Mallaby would choose "Prepare Your Mind." The phrase comes from Pasteur's "chance favors the prepared mind," and it recurs in every book Mallaby has written.

Accel co-founder Arthur Patterson used it to describe his investment philosophy: think ahead about what kind of company and founder a new technology demands, so that when someone walks into the conference room to pitch, you already know 90% of the picture and can make decisions that are both good and fast. Ilya Sutskever used it to explain why, the day the Transformer paper came out, he rushed down the hallway shouting at Alec Radford to stop everything. He had spent a decade preparing for the question of how to model sequential data, and when the answer appeared, he recognized it instantly.

Mallaby says that in the age of large language models, this message matters more than ever. The greatest risk is that we get lazy, offloading thinking to AI every time thought is required. He personally uses AI to quickly survey which papers a scientist he's interviewing next week has published and how they connect — that's accelerating learning, not replacing thinking. But he would never let AI write for him, because writing is his thinking process, the way he discovers what he actually believes.
