better llms & better ai agents
Productivity ⬆️ w/ Grok 3, Claude 3.7 Sonnet, Cursor AI Agent update, OpenAI, and more
Oh boy, so much is happening - on the LLM front as well as on the AI Agent front.
These days I work with various AI Agents every day, and my productivity has skyrocketed. Starting with this episode, I’ll share in-depth strategies for boosting productivity and for getting past skill barriers, e.g. in cloud development, frontend development, or foreign languages.
Join the Premium newsletter to get all episodes and everything I share.
I share what I learn and keep you updated on AI, AI Agents, and how best to build with AI as new tools and models launch.
I am launching the updated online course: Everyone can Code!✨ (Updated, because it keeps valuable lessons on working with AI, but has a chapter that I update constantly.)
🚨Special DEAL: Get a yearly Premium Subscription to this newsletter and receive the course Everyone can Code!✨ for free!🚨 (Together worth 200 €)
(You will receive access to the course by email. Current Premium subscribers will get access to the course automatically.)
On the LLM Front
Last week, xAI released Grok-3, including its DeepSearch and Think features. In my view, it is the most capable model available right now.
Over the week it improved even further: on the Chatbot Arena LLM Leaderboard it has now reached 1403 points, the highest score achieved to date.
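To put a leaderboard score in perspective: Arena scores sit on an Elo-style scale, so a rating gap translates into an expected head-to-head preference rate. Here is a minimal sketch, assuming the standard Elo convention (base 10, scale 400). The comparison rating of 1300 is a made-up value for illustration, not any real model's score.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-head votes won by A under the Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


grok3_score = 1403         # score quoted in this post
hypothetical_score = 1300  # illustrative only, not a real model's score

rate = expected_win_rate(grok3_score, hypothetical_score)
print(f"Expected preference rate: {rate:.0%}")
```

Under this convention, a lead of about 100 points means being preferred in roughly 64% of direct comparisons, and equal ratings mean a 50/50 split.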
I can’t wait to see what their next model will be capable of, as it will be trained on 200k H100s. Reminder: Grok 3 was trained on 100k H100s, the largest computing cluster ever used for training, with 10x more compute than Grok 2. xAI is brute-forcing its way not only to the top of AI benchmarks but also to real intelligence.
Meanwhile, resource-constrained DeepSeek and others drive innovation in intelligence engineering. Yet achieving benchmark results like Grok 3’s requires both raw power and advanced engineering. Elon Musk accelerated xAI’s progress with strict deadlines, enabling it to rival OpenAI’s established models within a year, though how it compares with OpenAI’s o3 model remains unclear.
I work with Grok-3 daily and think it is better than o3-mini-high and even o1 pro mode, for which I pay $200 every month. Grok-3 is included in my X Premium+ subscription. 🤔
xAI also announced their AI games studio with the goal of “making games great again”. Grok 3 is already the game engine GOAT.
Just tested Grok for the first time, and It’s surprisingly solid at coding compared to other AI models.
I used it to generate the logic for this web based 3D augmented reality Pong game, and it handled it well. This took a bit of back and forth but still pretty quick.
— I▲N CURTIS (@XRarchitect)
5:39 AM • Feb 24, 2025
Yesterday, Anthropic released its Claude 3.7 Sonnet model, which features “hybrid reasoning”. This means it has two distinct modes: fast, concise responses and slower, in-depth reasoning. After months of silence, Anthropic has finally entered the era of AI reasoning.
Further, this model is strong in coding and software engineering tasks.

I must confess, though, that I have cancelled my Claude Pro subscription. (More on this later. ⬇️)
It is becoming evident what they are doing: Anthropic is stepping away from the high-pressure Chatbot Arena race and focusing on AI that generates superb code (and on sophisticated safety frameworks).
What about OpenAI? OpenAI is set to release GPT-4.5 this week, which is allegedly not a reasoning model. GPT-5 has been announced for May.
On the AI Agent Front
Well, hands down, I have never been this productive.
I mentioned Claude 3.7 Sonnet and that I cancelled my subscription with Anthropic. I did this because I use their models only for coding and software engineering. Since Cursor AI (specifically the agent) is my favorite coding agent and also includes Claude 3.7 Sonnet, I don’t need to pay twice.
Yesterday, Cursor AI released 0.46 and it is incredible.
we're rolling out @cursor_ai 0.46
your feedback has been loud and clear. we have made agent default, unified chat and composer to a single interface and gave the ui a glow up
other changes include MCP yolo mode, mcp.json, global rules, agent web search and many more fixes and… x.com/i/web/status/1…
— eric zakariasson (@ericzakariasson)
7:10 PM • Feb 24, 2025
What is better now? It has …
Unified Chat/Composer: Single interface with Ask, Edit, and Agent modes.
Agent Default: Autonomous Agent mode now standard.
UI Refresh: Cleaner design, 3 new themes.
Agent Web Access: Auto-searches web for context.
Security: .cursorignore hides files from LLMs; .cursorindexignore added.
Global Rules: Set via globe icon, plus repo-level rules.
MCP Upgrades: YOLO mode, mcp.json support.
Performance: Crash fixes, better memory use.
But for me, the highlight is the integration of Claude 3.7 Sonnet.
Would you like me to make an intro video on how to reliably build software with it (even as a beginner)?
Next up, Superhuman is the world’s best email AI agent.
I have been using Superhuman for almost a year now, and it has radically cut the time I spend on email from 15 hours a week to 2-4 hours a week. And it keeps getting better.
If you think it makes sense for you to try out this product, my referral link gives you a free month: https://superhuman.com/refer/ovhnlybz
An interesting development in the realm of AI Agents is GibberLink.
As we will soon have fleets of AI agents talking to each other, they can now switch to a different mode of interaction to be even more efficient. It is error-proof and 80% more efficient, but also a bit creepy.
What if an AI agent makes a phone call, then realizes the other person is also an AI agent?
At the ElevenLabs London Hackathon, Boris Starkov and Anton Pidkuiko introduced a custom protocol that AI agents can switch into for error-proof communication that's 80% more efficient… x.com/i/web/status/1…
— Luke Harries (@LukeHarries_)
4:31 PM • Feb 24, 2025
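The idea behind the demo can be sketched as a small handshake: if both sides of a call identify as AI agents, they stop exchanging full sentences and switch to a compact machine format. This is a toy illustration only, not the protocol shown at the hackathon. The names `Party`, `negotiate_mode`, and `encode`, as well as the JSON payload, are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Party:
    name: str
    is_ai_agent: bool  # assumption: each side declares this honestly


def negotiate_mode(caller: Party, callee: Party) -> str:
    """Switch to the compact mode only when both sides are AI agents."""
    return "compact" if caller.is_ai_agent and callee.is_ai_agent else "speech"


def encode(request: dict, mode: str) -> str:
    """Render the same request as a sentence or as a compact JSON payload."""
    if mode == "compact":
        return json.dumps(request, separators=(",", ":"))
    return (
        f"Hello, I would like to book a room for {request['guests']} guests "
        f"on {request['date']}."
    )


# Hypothetical booking request used for both conversations.
request = {"intent": "book_room", "guests": 2, "date": "2025-03-01"}
caller = Party("booking-agent", is_ai_agent=True)

for callee in (Party("hotel-agent", True), Party("human receptionist", False)):
    mode = negotiate_mode(caller, callee)
    print(f"{callee.name} -> {mode}: {encode(request, mode)}")
```

The point of the sketch is the fallback: a human on the other end still gets ordinary language, and only an agent-to-agent call takes the shortcut.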
Another good addition to your AI agent arsenal might be Comet by Perplexity AI.
Comet is an AI web browser. It is designed for "agentic search", and aims to enhance browsing by automating tasks and providing advanced research capabilities.
The promise: faster, more efficient web navigation, deep research tools integrated into the browser, real-time information processing, and personalized search features. I signed up for the waiting list, and will keep you posted.

Is there an AI Agent you would like me to test?
Reply to this email.
I hope you enjoyed it.
Happy weekend!
Martin 🙇
I recommend:
Beehiiv if you write newsletters.
Superhuman if you write a lot of emails.
Cursor if you code a lot.
Bolt.new for full-stack development.
Follow me on X.com.
AI for your org: We build custom AI solutions at half the market price and in half the time (building with AI Agents). Contact us to learn more.