If you haven’t noticed, a quiet shift is happening:

You no longer need to be a deep technical expert to build useful software.

You need a clear goal, simple structured questions (and sentences), and the willingness to understand the answer just enough to keep going.

That is new.

The newest frontier models are no longer just autocomplete tools. GPT-5.5 with xhigh reasoning in tools like Codex or Cursor is now a serious development agent. Claude Code with Opus 4.7 is also very strong. But right now, if I had to choose one model for hard coding work, I would start with GPT-5.5.

There are already rumors about GPT-5.6, but as of May 27, 2026, it has not been officially announced. The direction is clear: these agents are getting very good, very fast.

And this is not just my personal impression. Developers are talking about it everywhere. More importantly, I see non-technical clients using it too.

They are building.

Not because they suddenly learned software engineering in depth, but because they learned to trust the AI and ask it simple questions:

  • What should this app do?

  • What data does it need?

  • What should happen when something goes wrong?

  • Can you inspect the codebase first?

  • Can you make a plan, implement it, test it, and tell me what changed?

That is already enough to begin.

A new benchmark called DeepSWE makes this visible. It measures frontier coding agents on original, long-horizon engineering work, not only small GitHub issue fixes.

On its leaderboard, GPT-5.5 with xhigh reasoning reaches 70%, ahead of GPT-5.4 and Claude Opus 4.7.

But the ranking is not the most important part.

The important part is the shape of the work.

DeepSWE tasks use shorter, more natural prompts than many older benchmarks, but require much larger code changes: hundreds of lines, multiple files, real exploration.

That is closer to how people actually prompt agents:

“Build this.”
“Fix this workflow.”
“Connect these pieces.”
“Make it robust.”
“Test it.”

This is why the skill is shifting from “can you write code?” to “can you describe the system you want clearly enough?”

The same applies to automation.

In my own work, I am increasingly automating one operational workflow per day: invoices matched to the right transactions, tax packages assembled from scattered documents, finance evidence prepared before it goes to the tax advisor.

Soon, a lot of this will not need manual back-and-forth at all.

TRY THIS: an example automation you can start with today (you’ll be amazed):

Every morning, scan my invoices folder and bank transactions. Match invoices to transactions by amount, date, and merchant. Create three lists: matched, missing invoice, and needs review. Copy upload-ready evidence into one folder. Do not upload or send anything. Summarize counts and top issues.
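If you are curious what an agent might write for the matching step, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Invoice` and `Transaction` shapes, the 5-day date window, and the simple substring rule for merchants. It only sorts records into the three lists and counts them. It reads no files and uploads or sends nothing.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Invoice:
    # Hypothetical record shape; real invoices will carry more fields.
    invoice_id: str
    merchant: str
    amount_cents: int
    issued: date


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record shape for one bank transaction.
    txn_id: str
    merchant: str
    amount_cents: int
    booked: date


def _same_merchant(a: str, b: str) -> bool:
    # Assumed rule: case-insensitive substring match in either direction.
    a, b = a.strip().lower(), b.strip().lower()
    return bool(a) and bool(b) and (a in b or b in a)


def match_invoices(invoices, transactions, max_days=5):
    """Sort transactions into matched, missing invoice, and needs review."""
    matched, missing_invoice, needs_review = [], [], []
    used = set()  # invoice ids already paired with a transaction
    for txn in transactions:
        # Candidates agree on the exact amount and fall inside the date window.
        candidates = [
            inv for inv in invoices
            if inv.invoice_id not in used
            and inv.amount_cents == txn.amount_cents
            and abs((txn.booked - inv.issued).days) <= max_days
        ]
        strong = [inv for inv in candidates
                  if _same_merchant(inv.merchant, txn.merchant)]
        if len(strong) == 1:
            used.add(strong[0].invoice_id)
            matched.append((txn, strong[0]))
        elif candidates:
            # Amount and date fit, but the merchant is unclear or ambiguous.
            needs_review.append((txn, candidates))
        else:
            missing_invoice.append(txn)
    return matched, missing_invoice, needs_review


def summarize(matched, missing_invoice, needs_review):
    """Return the counts for the daily summary."""
    return {
        "matched": len(matched),
        "missing_invoice": len(missing_invoice),
        "needs_review": len(needs_review),
    }
```

The design choice worth noticing is the third list: anything that fits on amount and date but not clearly on merchant goes to a human instead of being guessed.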

That is the practical unlock:

If a workflow can be described, checked, and repeated, it can probably be automated.

So the next useful skill is not “learn every programming language.” The goalpost of what you have to learn moves. You also no longer need to understand in depth how your GPU works; if it stops working, you get a new one. The skill is learning how to think in structured requests:

Goal: What should exist when this is done?
Inputs: What files, data, tools, or accounts are involved?
Rules: What must never happen?
Output: What should the agent produce?
Verification: How do we know it worked?
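The five fields above can also live in a small reusable template. This is a sketch under my own assumptions, not a format any agent requires: the `AgentRequest` class, its field names, and the plain-text layout are all hypothetical. The one useful behavior is that it refuses to produce a prompt while a field is still empty.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRequest:
    # Hypothetical container mirroring Goal, Inputs, Rules, Output, Verification.
    goal: str = ""
    inputs: list[str] = field(default_factory=list)
    rules: list[str] = field(default_factory=list)
    output: str = ""
    verification: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty."""
        names = ["goal", "inputs", "rules", "output", "verification"]
        return [name for name in names if not getattr(self, name)]

    def to_prompt(self) -> str:
        """Render the request as plain text, or fail if a field is empty."""
        missing = self.missing_fields()
        if missing:
            raise ValueError("incomplete request, missing: " + ", ".join(missing))

        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        return "\n".join([
            f"Goal: {self.goal}",
            "Inputs:",
            bullets(self.inputs),
            "Rules:",
            bullets(self.rules),
            f"Output: {self.output}",
            "Verification:",
            bullets(self.verification),
        ])


request = AgentRequest(
    goal="Match invoices to bank transactions every morning.",
    inputs=["invoices folder", "bank transactions export"],
    rules=["Do not upload or send anything."],
    output="Three lists: matched, missing invoice, needs review.",
    verification=["Summarize counts and top issues."],
)
prompt = request.to_prompt()
```

Printing `prompt` gives you text you can paste into any agent as-is.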

That is enough to start building apps, internal tools, dashboards, automations, agents, and business workflows.

Yours

— Martin

You're receiving this because you subscribed to Generative AI: Short & Sweet.