Better LLMs & Better AI Agents

Productivity ⬆️ w/ Grok 3, Claude 3.7 Sonnet, Cursor AI Agent update, OpenAI, and more

Oh boy, so much is happening - on the LLM front as well as on the AI Agent front.

Nowadays, I work daily with various AI Agents, and my productivity has skyrocketed. Starting with this episode, and in the episodes that follow, I’ll share in-depth strategies to boost productivity and to get past skill barriers, e.g. cloud development, frontend development, or language barriers.

Join the Premium newsletter to get all episodes and everything I share.

I share my learnings with you and keep you updated on AI, AI Agents, and how to build best with AI as new tools and models launch.

I am launching the updated online course: Everyone can Code! (Updated, because it keeps valuable lessons on working with AI, but has a chapter that I update constantly.)

🚨 Special DEAL: Get a yearly Premium Subscription to this newsletter and receive the course Everyone can Code! for free! 🚨 (Together worth 200 €)

(You will receive access to the course by email. Current Premium subscribers will get access to the course automatically.)

On the LLM Front

Last week, xAI released Grok-3, including its DeepSearch and Think features. In my view, it is the most intelligent AI available right now.

Over the week it improved even further: in the Chatbot Arena LLM Leaderboard it has now reached 1403 points, the highest score ever achieved.

I can’t wait to see what xAI’s next model will be capable of, as it will be trained on 200k H100s. Reminder: Grok 3 was trained on 100k H100s, the largest computing cluster ever used for training, and 10x more than Grok 2. xAI brute-forces its way not only to the top of AI benchmarks but also to real intelligence.

Meanwhile, resource-constrained DeepSeek and others drive innovation in intelligence engineering. Yet achieving benchmark results like Grok 3’s requires both raw power and advanced engineering. Elon Musk accelerated xAI’s progress with strict deadlines, enabling it to rival OpenAI’s established models within a year, though how it compares with OpenAI’s o3 model remains uncertain.

I work with Grok-3 daily and think it is better than o3-mini-high and even o1 pro mode, for which I am paying $200 every month. Grok-3 is included in my X Premium+ subscription. 🤔

xAI also announced its AI games studio with the goal of “making games great again”. Grok 3 is already the game engine GOAT.

Yesterday, Anthropic released its Claude 3.7 Sonnet model, including “hybrid reasoning”. This means it has 2 distinct modes: fast, concise responses and slower, in-depth reasoning. After months of silence, Anthropic has finally entered the era of AI reasoning.

Further, this model is strong in coding and software engineering tasks.

Figure: bar chart “Software engineering: SWE-bench Verified” (accuracy). Claude 3.7 Sonnet: 62.3% (70.3% with a custom scaffold); Claude 3.5 Sonnet (new): 49.0%; OpenAI o1: 48.9%; OpenAI o3-mini (high): 49.3%; DeepSeek R1: 49.2%.

I must confess, though, that I have cancelled my Claude Pro subscription. (More on this later. ⬇️)

It becomes evident what they are doing. Anthropic is removing itself from the super high-pressure Chatbot Arena and focusing on AI that can generate superb code (and on sophisticated safety frameworks).

What about OpenAI? OpenAI is set to release GPT-4.5 this week, which is allegedly not a reasoning model. GPT-5 has been announced for May.

On the AI Agent Front

Well, hands down, I have never been this productive.

I mentioned Claude 3.7 Sonnet, and that I cancelled my subscription with Anthropic. I did this because I use their models only for coding and software engineering purposes. As Cursor AI (specifically the agent) is my favorite coding agent and also includes Claude 3.7 Sonnet, I don’t need to pay twice.

Yesterday, Cursor AI released 0.46 and it is incredible.

What is better now? It has …

  1. Unified Chat/Composer: Single interface with Ask, Edit, and Agent modes.

  2. Agent Default: Autonomous Agent mode now standard.

  3. UI Refresh: Cleaner design, 3 new themes.

  4. Agent Web Access: Auto-searches web for context.

  5. Security: .cursorignore hides files from LLMs; .cursorindexignore added.

  6. Global Rules: Set via globe icon, plus repo-level rules.

  7. MCP Upgrades: YOLO mode, mcp.json support.

  8. Performance: Crash fixes, better memory use.

But for me, the highlight is the integration of Claude 3.7 Sonnet.

Would you like me to make an intro video on how to reliably build software with it (even as a beginner)?

Next up, Superhuman is the world’s best email AI agent.

I have been using Superhuman for almost a year now, and it has radically cut the time I spend on emails from 15 hours a week to 2-4 hours a week. And it gets better and better.

If you think it makes sense for you to try out this product, my referral link gives you a free month: https://superhuman.com/refer/ovhnlybz 

An interesting development in the realm of AI Agents is GibberLink.

As we will have fleets of AI agents talking to each other, they can now switch to a different, more efficient way of interacting. It is error-proof and 80% more efficient, but also a bit creepy.

Another good addition to your AI agentic arm might be Comet by Perplexity AI.

Comet is an AI web browser. It is designed for "agentic search", and aims to enhance browsing by automating tasks and providing advanced research capabilities.

The promise: faster, more efficient web navigation, deep research tools integrated into the browser, real-time information processing, and personalized search features. I signed up for the waiting list, and will keep you posted.

Screenshot: the Comet waitlist page, with the tagline “A browser for agentic search by Perplexity”, an email field, and a “Join Waitlist” button.

Is there an AI Agent you would like me to test?

Reply to this email.

I hope you enjoyed it.

Happy weekend!
Martin 🙇
