I used to like the data-wall argument.
Not because I wanted it to be true. Because it made the future feel less slippery.
The argument was clean: the internet only has so much good human text. Epoch AI estimated roughly 300 trillion usable tokens, with the wall arriving somewhere between 2026 and 2032. At some point, models would run out of road.
I could understand that. I could plan around it.
But I think I was holding on to the wrong wall.
The public internet is finite, even if it is expanding fast. Learning material is not.
What changed my mind was synthetic data, though I hate the phrase. It sounds like fake knowledge. Sometimes it is.
But the good version is much less mystical: make practice problems, grade the answers, keep the examples that teach, throw away the rest. Scale becomes less about scraping and more about curriculum.
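That loop can be sketched in a few lines. This is a toy illustration, not anything from a real training pipeline: the arithmetic task stands in for model-generated practice problems, the exact-check grader stands in for a real verifier, and `KEEP_THRESHOLD` is an invented cutoff.

```python
import random

KEEP_THRESHOLD = 1.0  # hypothetical cutoff: keep only examples the grader fully accepts

def generate_problem(rng):
    # Toy stand-in for a model proposing a practice problem and an answer.
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    # Occasionally propose a wrong answer, as a generator would.
    return {"question": f"{a} + {b} = ?", "proposed_answer": a + b + rng.choice([0, 0, 0, 1])}

def grade(example):
    # Toy stand-in for a verifier: arithmetic can be checked exactly.
    a, b = (int(x) for x in example["question"].split(" = ")[0].split(" + "))
    return 1.0 if example["proposed_answer"] == a + b else 0.0

def build_curriculum(n, seed=0):
    # Make practice problems, grade the answers, keep what teaches, discard the rest.
    rng = random.Random(seed)
    examples = (generate_problem(rng) for _ in range(n))
    return [ex for ex in examples if grade(ex) >= KEEP_THRESHOLD]

kept = build_curriculum(1000)
print(f"kept {len(kept)} of 1000 generated examples")
```

The whole argument hinges on the `grade` step: when the verifier is exact, as here, the loop manufactures useful practice indefinitely; when it is a weak heuristic, the kept set quietly fills with junk.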
That is why Nick Thompson's clip with Sam Altman landed for me:
the surprising part is not that human data stops mattering. It is that models may be able to manufacture more of the right kind of practice, if the feedback loop is good enough.
This issue is about judgment. That is exactly why Cube's Agentic Analytics Summit fits.
AI can answer data questions fast. The danger is messy definitions: three teams, three versions of churn, five dashboards that disagree. Cube's summit is about the layer underneath agentic analytics: governed metrics, semantic context, and answers you can trace.
Sessions include Joe Reis, plus practitioners from Brex and Jobber.
Human data still matters. It is the anchor. Without it, models drift into nonsense. But it is not the ceiling.
That realization bothered me more than I expected.
I like constraints. They make the world feel legible. A hard data limit was comforting because it gave me a place to stop thinking.
The real bottleneck has moved from "do we have enough material?" to "can we tell what is actually good?"
I see the same pattern with agents.
I used to assume serious agent work would need clean integrations: APIs, MCP servers, no clicking through software like a person.
Then I started using GPT-5.5 to operate real software through the GUI.
It clicks, types, waits, checks the screen. Sometimes it gets confused and I stop it (today: about 1 in 50 times, and improving). Sometimes it is too slow. Sometimes I do not trust it.

My rough mental model: AI capability does not stop at the old data wall. It bends into synthetic practice, harder evaluation, and more responsibility for the human supervising it. At some point we'll have 1,000-IQ models, perhaps a million. (What will they be capable of then?)
But it works often enough that my habits are changing. That is the uncomfortable part.
Most companies do not have five tools. They have forty. Some have bad APIs, no API access, or ancient dashboards nobody wants to touch.
GUI agents make those tools reachable now. Not cleanly. Not perfectly. But reachable.
Learning: Use APIs and MCP servers for frequent, risky, or structured work. Do not click through a CRM 500 times if a real integration exists.
But for the messy edge of work, let the agent use the interface. Watch closely, build trust slowly, keep logs, set limits, and make it ask before irreversible actions.
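The "keep logs, set limits, ask before irreversible actions" rule is just a dispatch gate in code. A minimal sketch, assuming a hypothetical agent that emits named actions (the action names and the `confirm` callback are my inventions, not any real agent API):

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("agent")

# Hypothetical set of irreversible actions; a real deployment would define its own.
IRREVERSIBLE = {"send_email", "delete_record", "submit_payment"}

def supervised_dispatch(action, confirm):
    """Log every requested action; route irreversible ones through a human confirm callback."""
    log.info("agent requested: %s", action)
    if action in IRREVERSIBLE and not confirm(action):
        log.info("blocked: %s", action)
        return "blocked"
    log.info("executed: %s", action)
    return "executed"

# Usage: reversible clicks go through; irreversible actions wait for a human yes.
print(supervised_dispatch("click_button", confirm=lambda a: False))
print(supervised_dispatch("send_email", confirm=lambda a: False))
```

The point is that supervision lives in one choke point: every action is logged, and trust is widened by shrinking the `IRREVERSIBLE` set over time, not by turning the gate off.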
The data wall did not vanish. It became a judgment problem.
The integration wall did not vanish either. It became a supervision problem.
But that is also the real work now.
The new wall is not whether we can generate more. It is whether we can tell what is worth keeping.
If this helped, forward it to someone still waiting for the data wall.
— Martin
You're receiving this because you subscribed to Generative AI: Short & Sweet.

