AI Self-Improvement

AlphaEvolve, Absolute Zero, and The Darwin Gödel Machine

Hi AI enthusiasts,

Alright, let's dive into something that's simultaneously mind-bending and incredibly exciting, but also scary: self-improving AI.

It makes me sit up and pay attention. Let’s get it.

Google DeepMind’s Milestone Contribution

Demis Hassabis, the Co-founder and CEO of DeepMind, has been talking about the combinatorial power of mixing AI techniques – think language models getting cozy with evolutionary programming.

[Image: stage portrait of a speaker in a charcoal suit with a headset microphone, lit warmly against a dark background.]

He points out that we've seen this in "well-described domains." Chess, Go, other games – places with clear rules where an AI can play against itself, or a clone, practically forever.

But the real world? It’s much messier. How do you get an AI to self-improve on a gnarly math problem? Or untangle a complex legal case?

AlphaEvolve, the brainchild of Hassabis's team at Google DeepMind, marks a milestone: through it, Gemini actually helped optimize its own training process. We're talking about an AI agent that evolves algorithms for math and practical computing by mashing up the wild creativity of large language models (LLMs) with automated evaluators.

So, instead of trying to stamp out LLM "hallucinations" – which the whole industry seems obsessed with – DeepMind is harvesting that creativity.

Turns out, hallucinations are more important than we thought.

And the automated evaluators? Crucial! No more manually checking; the system tests for actual improvements automatically.

The AlphaEvolve paper is mesmerising. They describe how it's already boosted efficiency in Google's data centers, tweaked chip design, and refined AI training – including the training for AlphaEvolve itself.

See that? A recurring, semi-automated AI improvement loop is already happening.
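
The core of that loop is surprisingly simple: a generator proposes candidate programs, an automated evaluator scores them, and only verified improvements survive. Here's a minimal, purely illustrative Python sketch of that propose-evaluate-keep cycle (the "program" is just one tunable number; AlphaEvolve uses Gemini to rewrite real code, and its evaluators run real benchmarks):

```python
import random

def propose_variant(program, rng):
    """Stand-in for an LLM proposing a mutated candidate program.
    Here we just perturb a numeric parameter."""
    return program + rng.gauss(0, 0.5)

def evaluate(program):
    """Automated evaluator: a verifiable score, no human in the loop.
    Toy objective: how close the 'program' gets to an optimum at 3.0."""
    return -abs(program - 3.0)

def evolve(generations=200, population=8, seed=0):
    rng = random.Random(seed)
    best = 0.0  # initial 'program' (a single tunable value in this toy)
    for _ in range(generations):
        candidates = [propose_variant(best, rng) for _ in range(population)]
        champion = max(candidates, key=evaluate)
        if evaluate(champion) > evaluate(best):
            best = champion  # keep only verified improvements
    return best

print(evolve())  # converges toward the optimum at 3.0
```

Notice that the evaluator never needs to know *how* a candidate was produced – wild, even "hallucinated" proposals are fine, because bad ones are simply filtered out.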

And the mic drop moment? AlphaEvolve took on the Strassen algorithm for 4x4 complex-valued matrix multiplication – a problem that’s stumped folks for over 50 years – and cut the scalar multiplications from 49 to 48.

That ~2% reduction from AlphaEvolve saves, by back-of-the-envelope estimates, roughly 7 × 10^24 calculation steps – about 1.4 × 10^13 joules of energy, enough to power around 400 U.S. households for a year, or to cover a full day's energy use for ~130,000 U.S. households. Huge implications!
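
If you've never seen why shaving multiplications matters, here's the classic 2×2 Strassen trick the 49-multiplication 4×4 scheme is built from – a small Python check, not AlphaEvolve's actual 48-multiplication algorithm (which works over complex-valued 4×4 matrices):

```python
def strassen_2x2(A, B):
    """Strassen's algorithm: multiply two 2x2 matrices with 7 scalar
    multiplications instead of the naive 8."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return ((m1 + m4 - m5 + m7, m3 + m5),
            (m2 + m4, m1 - m2 + m3 + m6))

print(strassen_2x2(((1, 2), (3, 4)), ((5, 6), (7, 8))))  # ((19, 22), (43, 50))

# Applying Strassen recursively to a 4x4 matrix split into 2x2 blocks costs
# 7 * 7 = 49 scalar multiplications; AlphaEvolve's new scheme needs only 48:
print(round((49 - 48) / 49 * 100, 1))  # 2.0 (percent saved)
```

Because matrix multiplication sits inside nearly every deep-learning workload, even a single saved multiplication compounds at planetary scale.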

AlphaEvolve is a DeepMind masterpiece.

Plus, AlphaEvolve slashed the time to upgrade its own core software from months of human work to just days of AI automation. This speeds up progress, frees engineers for bigger challenges, and points to more self-sufficient, user-friendly AI.

And it's just V1. 👀

Absolute Zero 🚩

Now, keep Absolute Zero in mind – another paper on self-improvement. It flips both supervised learning (a human defines the goal and supplies labeled examples) and reinforcement learning with verifiable rewards (a human defines the goal, the AI's answers are automatically checked) on their heads.

[Figure: three-panel cartoon along a "Less Human Supervision" arrow – Supervised Learning (human directly controls the robot), Reinforcement Learning with Verifiable Rewards (human sets the goal, robot runs autonomously), and Absolute Zero (no human; the robots teach themselves).]

Absolute Zero proposes AI devising its own goals via LLMs, then using verifiable rewards for automated course-correction and training.
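
Structurally, that means one model plays "proposer" (inventing tasks whose answers can be machine-checked) while another plays "solver". Here's a toy Python sketch of the loop, with both roles as trivial stand-ins (Absolute Zero proposes and executes real coding tasks; a perfect one-line solver here just shows where the verifiable reward comes from):

```python
import random

def propose_task(rng):
    """Stand-in for the 'proposer': invents a task with a machine-checkable
    answer. Here: a random arithmetic expression, executed for ground truth."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    op = rng.choice(["+", "-", "*"])
    expr = f"{a} {op} {b}"
    return expr, eval(expr)  # executing the task yields the verifiable answer

def solve(expr):
    """Stand-in for the 'solver' model attempting the task.
    (A real solver would be a trained model, not eval.)"""
    return eval(expr)

def self_play(rounds=100, seed=0):
    rng = random.Random(seed)
    reward = 0
    for _ in range(rounds):
        expr, truth = propose_task(rng)
        reward += int(solve(expr) == truth)  # verifiable reward, no human labels
    return reward / rounds

print(self_play())  # 1.0 (our toy solver is perfect; a trained one starts lower)
```

The key design point: no human ever writes a goal or a label – the proposer's executed output *is* the ground truth the solver is scored against.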

This echoes Leopold Aschenbrenner’s "Intelligence Explosion" essay, predicting an inflection point around 2027 where AI becomes recursively, automatically more intelligent.

[Chart: "Scenario: Intelligence Explosion" – a log plot of effective compute (normalized to GPT-4) from 2018 to 2030, equating GPT-2 with a preschooler, GPT-3 with an elementary schooler, and GPT-4 with a smart high-schooler, with "Automated AI Research" arriving around 2026 and fanning into an uncertain, rapidly ascending band toward superintelligence.]

Sakana AI’s Darwin Gödel Machine - Further Accelerating AI Self-Improvement?

Then there's Sakana AI's Darwin Gödel Machine (DGM): AI that improves by rewriting its own code.

Inspired by evolution (going full circle with Demis Hassabis’ initial comment), DGM maintains an expanding lineage of agent variants, enabling open-ended exploration. Sakana AI believes continuous self-improvement is vital for stronger AI.

[Figure: SWE-bench score vs. DGM iteration (0 to 80). The archive average climbs from ~0.20 to ~0.27 while the best agent jumps from 0.20 to a peak of 0.50, with call-outs marking self-discovered improvements: patch validation and retry, more granular file editing and viewing, auto-summarization at the context limit, and patch generation with ranking.]

On SWE-bench (a test for solving software engineering tasks like bug fixing), DGM jacked its performance up from 20.0% to a whopping 50.0% – automatically. On Polyglot (a multilingual benchmark), it boosted its success from 14.2% to 30.7%, leaving standard hand-designed agents in the dust.
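
The mechanism behind those numbers is an evolutionary archive: keep *every* agent variant (not just the current best), branch from any of them, and let an automated benchmark decide what survives. Here's a toy Python sketch of that archive loop; the feature names and scoring are invented for illustration, not Sakana AI's implementation (DGM rewrites real agent source code and scores it on SWE-bench):

```python
import random

USEFUL = {"patch_validation", "granular_editing", "auto_summarize",
          "patch_ranking", "history_aware"}

def benchmark(agent):
    """Stand-in automated benchmark (think SWE-bench pass rate).
    Toy scoring: fraction of useful 'features' the agent has acquired."""
    return len(set(agent) & USEFUL) / len(USEFUL)

def self_modify(agent, rng):
    """Stand-in for the agent rewriting its own code: it tries adding one
    random candidate change, which may be useful or a dead end."""
    candidates = sorted(USEFUL) + ["dead_end_a", "dead_end_b"]
    return agent | {rng.choice(candidates)}

def darwin_godel_loop(iterations=60, seed=0):
    rng = random.Random(seed)
    archive = [frozenset()]  # the expanding archive of agent variants
    for _ in range(iterations):
        parent = rng.choice(archive)  # open-ended: any ancestor can branch
        child = frozenset(self_modify(set(parent), rng))
        if child not in archive:
            archive.append(child)     # keep variants, not just the best
    return max(benchmark(a) for a in archive)

print(darwin_godel_loop())  # best score found across the archive
```

Keeping the whole archive, rather than greedily keeping only the top scorer, is what enables the open-ended exploration Sakana AI emphasizes – a mediocre ancestor can still spawn the eventual champion.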

The results can’t be ignored.

[Figure: "DGM Archive Tree" – a rooted tree of ~80 agent variants across six generations, nodes shaded by score, with the lineage leading to the final best agent (node 56) highlighted.]

Currently, one run to generate an improved agent costs around $22,000. But as I've written before, expect these costs to drop rapidly thanks to deflationary cycles in tech. (→ https://mail.generativeai.net/p/the-ai-dividend )

Looking ahead, AI agents will likely run their own A/B tests, discover new skills like smarter file editing or stricter patch checks, and port across models with less human intervention, ultimately boosting performance. This will scale intelligence, but it's not without its critics.

Not everyone is cheering, though. Dario Amodei, CEO of Anthropic, has been openly discussing the potential dangers of advanced AI.

[Screenshot: CNN Business, May 29, 2025, by Clare Duffy – "Why this leading AI CEO is warning the tech could cause mass unemployment," with a lower-third quoting that half of entry-level jobs could disappear and unemployment could hit 10–20% because of AI.]

From my perspective, the transition into the AI era is already a bit bumpy, and will continue to be.

However, the upside potential is enormous, and I believe there's a strong chance (80% or more) that we're heading toward an open, beneficial version of this future rather than a dystopian one.

What do you think? Let me know!

I hope you enjoyed it.

Happy weekend!
Martin 🙇

Our webpage

You might like our last episodes: