OpenAI's Operator is its 1st true AI agent

… and the world record on inference


Hey folks, this time:
1️⃣ World record on text generation with large AI models
2️⃣ AGI by 2025? AI Reasoning Breakthrough (TTT) & OpenAI’s Next Moves, ..

In particular, in part 2️⃣:

  • What is TTT?

  • What is ARC?

  • What is LoRA?

  • What is OpenAI’s new app use function and Operator?

AI will not replace the work of 50% of people, it will do 50% of the work for 100% of people.

Jensen Huang, Founder & CEO, NVIDIA

Your Digital Twin, Proxy

  • Your personal digital clone for low value tasks

  • Gets smarter as you give it commands to learn

  • The first truly general AI Agent

(✨ If you don’t want ads like these, Premium is the solution. It is like you are buying me a Starbucks Iced Honey Apple Almondmilk Flat White a month.)


SambaNova Holds Speed World Record on Llama 3.1 405B

200+ tokens (220 words) per second from Meta’s large model with 405B parameters. That is almost a page per second. Why does it matter? 👇

Image: bar chart comparing inference performance on large language models across platforms (Azure, Amazon, DeepInfra, Databricks, Replicate, OctoAI, Groq, Cerebras, SambaNova Systems). Most platforms sit in the 10–30 range, Groq and Cerebras are unspecified, and SambaNova Systems stands out at around 200.

Meet the Reconfigurable Dataflow Unit (RDU)—a groundbreaking AI processor.

It has a unique architecture:

  • On-chip SRAM for rapid data access

  • High Bandwidth Memory (HBM) for massive throughput

  • DDR DRAM for extensive storage

This three-tier memory system is a game-changer.

It runs Llama 3.1 405B effortlessly.

The RDU excels in speed, efficiency, precision, and performance.

It outperforms traditional GPU systems by:

↳ Reducing redundant memory access

↳ Enhancing data movement

↳ Most importantly, by achieving high inference speeds

SambaNova’s RDU is not just about speed.

It's about redefining AI capabilities.

The faster you can run inference, the more functionality and features you can put into your AI service: search, voice, app access, and more.
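If you want to sanity-check a throughput number like this yourself, here is a minimal sketch, assuming an OpenAI-compatible chat endpoint (SambaNova Cloud exposes one, but the base URL, model id, and the SAMBANOVA_API_KEY environment variable below are assumptions to verify against the provider’s docs). It approximates tokens per second by counting streamed chunks.

```python
# Rough throughput check against an OpenAI-compatible endpoint.
# NOTE: base_url, model id, and env var are assumptions; verify them
# in the provider's documentation before running.
import os
import time

from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",      # assumed endpoint
    api_key=os.environ["SAMBANOVA_API_KEY"],     # assumed env var
)

start = time.perf_counter()
chunk_count = 0

stream = client.chat.completions.create(
    model="Meta-Llama-3.1-405B-Instruct",        # assumed model id
    messages=[{"role": "user",
               "content": "Summarize the history of GPUs in ~300 words."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunk_count += 1                          # roughly one token per streamed chunk

elapsed = time.perf_counter() - start
print(f"~{chunk_count / elapsed:.0f} tokens/second (rough estimate)")
```

The exact number will vary with provider and load; the point is that you can measure it yourself in a few lines.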

See my DEMO on how to experiment with it:

I’ve covered SambaNova before, plus other AI tech features and personal stories that don’t make it into the newsletter. If you’re interested, follow me on X for more updates! → https://x.com/musiol_martin

AGI by 2025? AI Reasoning Breakthrough (TTT) & OpenAI’s Next Moves; Others Are Likely to Follow

As I mentioned in the last episode, the timeline for reaching AGI (Artificial General Intelligence) is getting shorter. Here’s where things stand:

  • Sam Altman (OpenAI): Predicts AGI by 2025

  • Dario Amodei (Anthropic): Predicts AGI around 2026/27

FYI: OpenAI’s team is getting back to peak performance.

Meanwhile, something interesting dropped — a new paper titled “The Surprising Effectiveness of Test-Time Training for Abstract Reasoning” (TTT).

(Source)

It’s a big deal because it introduces a novel method to significantly improve AI’s reasoning skills.

What is Test-Time Training (TTT)?

It’s a method that enhances an AI model’s performance on challenging tasks in real time. Here’s why it’s groundbreaking:

  • Performance Boost: TTT achieves up to 6x better accuracy on the ARC dataset (a tough benchmark), reaching 61.9% on the public validation set.

What is ARC?
The ARC dataset (Abstraction and Reasoning Corpus) is a collection of challenging puzzle-like tasks designed to test an AI’s ability to reason abstractly. Each task shows only a handful of example input-output grids, and the model has to infer the underlying pattern and apply it to a new input, much like a human would, without explicit rules or large training sets. It’s considered one of the toughest benchmarks because it tests general intelligence, not just pattern recognition.
  • How TTT works:

    • In-Context Learning: training on each task’s in-context examples at test time outperforms traditional end-to-end methods.

    • Data Augmentation: it relies heavily on geometric transformations of those examples to generate fine-tuning data on the fly.

  • Training Innovations:

    • Per-task LoRA Updates: a custom, lightweight update for each task performs better than a single shared update (see the sketch after the LoRA note below).

    • Self-Consistency: it applies multiple transformations, predicts on each, and uses a voting mechanism to refine the final answer.

What is LoRA?
LoRA (Low-Rank Adaptation) is a technique for fine-tuning large AI models efficiently. Instead of retraining the entire model, it adds small low-rank weight matrices that adapt the model to new tasks while the core weights stay frozen. This cuts memory and compute requirements, making fine-tuning faster and cheaper.
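To make “per-task LoRA updates” concrete, here is a minimal sketch, assuming Hugging Face transformers and peft: for each task, attach a fresh LoRA adapter to a small base model (gpt2 here, purely for illustration), fine-tune it on that task’s few demonstration pairs at test time, then predict on the held-out test input. The hyperparameters and prompt format are assumptions, not the paper’s exact setup.

```python
# Sketch: one fresh LoRA adapter per task, trained at test time on that
# task's demonstration pairs. Illustrative only, not the paper's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model  # pip install transformers peft

BASE = "gpt2"  # tiny stand-in model, purely for illustration
tok = AutoTokenizer.from_pretrained(BASE)

def solve_task(demo_pairs, test_input, steps=20):
    """demo_pairs: list of (input_text, output_text) for ONE task."""
    model = AutoModelForCausalLM.from_pretrained(BASE)   # fresh base per task
    cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"],
                     task_type="CAUSAL_LM")
    task_model = get_peft_model(model, cfg)               # per-task LoRA adapter
    opt = torch.optim.AdamW(
        (p for p in task_model.parameters() if p.requires_grad), lr=1e-4)

    # Test-time training on this task's demonstration pairs only.
    texts = [f"Input:\n{i}\nOutput:\n{o}" for i, o in demo_pairs]
    for _ in range(steps):
        for t in texts:
            batch = tok(t, return_tensors="pt")
            loss = task_model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            opt.step()
            opt.zero_grad()

    # Predict on the held-out test input with the freshly adapted model.
    prompt = tok(f"Input:\n{test_input}\nOutput:\n", return_tensors="pt")
    out = task_model.generate(**prompt, max_new_tokens=32)
    return tok.decode(out[0][prompt["input_ids"].shape[1]:],
                      skip_special_tokens=True)
```

In the paper’s setup the adapter is trained on augmented versions of the demonstrations rather than the raw pairs, which is where the geometric transformations below come in.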

Why TTT is a Game-Changer:

  • Minimal Data Requirements: It uses just geometric transformations for fine-tuning, so there is no need for massive new datasets.

  • Model Agnostic: Works well across different models and model sizes.

  • Integration: It combines well with other methods for top results on complex tasks.

What are geometric transformations? They are simple tweaks like flipping, rotating, or resizing an image. In AI, these changes help the model see variations in the data, making it better at recognizing patterns.
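Here is a small sketch of what those transformations, and the voting step from the self-consistency point above, might look like on an ARC-style grid. The example grid and the placeholder predict() function are assumptions for illustration; the paper’s actual augmentation set and voting scheme are richer.

```python
# Sketch: geometric augmentations of an ARC-style grid plus majority voting.
# predict() is a stand-in for a real (test-time-trained) model call.
import numpy as np
from collections import Counter

grid = np.array([[0, 1, 0],
                 [1, 1, 0],
                 [0, 0, 2]])

# Each entry: (transform, inverse) so predictions can be mapped back.
transforms = [
    (lambda g: g,              lambda g: g),               # identity
    (lambda g: np.rot90(g, 1), lambda g: np.rot90(g, -1)), # 90° rotation
    (lambda g: np.rot90(g, 2), lambda g: np.rot90(g, -2)), # 180° rotation
    (lambda g: np.fliplr(g),   lambda g: np.fliplr(g)),    # horizontal flip
    (lambda g: np.flipud(g),   lambda g: np.flipud(g)),    # vertical flip
]

def predict(g):
    """Placeholder for the model's prediction on one view of the task."""
    return g  # pretend the task is "copy the input"

# Self-consistency: predict on each transformed view, map back, majority-vote.
votes = Counter()
for fwd, inv in transforms:
    pred = inv(predict(fwd(grid)))
    votes[pred.tobytes()] += 1            # hash the grid so it can be counted

best_bytes, count = votes.most_common(1)[0]
final = np.frombuffer(best_bytes, dtype=grid.dtype).reshape(grid.shape)
print(f"Chosen by {count}/{len(transforms)} votes:\n{final}")
```

Because each prediction is mapped back to the original orientation before voting, consistent answers reinforce each other while orientation-specific mistakes get outvoted.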

OpenAI’s Roadmap to AGI

OpenAI has shared a five-level roadmap leading to AGI.

Image 2: OpenAI’s five-level roadmap toward AGI, highlighting upcoming projects such as the Operator AI agent.

And speaking of upgrades, the ChatGPT app has fantastic features.

It’s not just a chatbot anymore — it can use your apps, too. Here’s what’s new:

  • Enhanced Search: It can now browse the web. See the episode on it. 

  • Voice Mode: You can speak with it in a realistic voice. → Episode.

  • Code Interpreter (Canvas): A built-in environment for coding.

  • 🆕 App and Terminal Access: It can interact with your apps and even your Terminal, giving it control over your computer.

I will make a hands-on demo/training video for it that you can access as a premium subscriber. In the meantime, here is the OpenAI announcement and demo.

This leads to OpenAI’s next big project: Operator, the first real AI agent from OpenAI.

What Do We Know About “Operator”?

Launching in early 2025, Operator is OpenAI’s first truly autonomous AI agent, capable of tasks like writing code and booking travel. It aligns with Sam Altman’s vision of agentic systems as the next major leap in AI.

But keep in mind: OpenAI isn’t the only player here—Anthropic and Google are developing similar capabilities.

Learn AI in 5 Minutes a Day

AI Tool Report is one of the fastest-growing and most respected newsletters in the world, with over 550,000 readers from companies like OpenAI, Nvidia, Meta, Microsoft, and more.

Our research team spends hundreds of hours a week summarizing the latest news, and finding you the best opportunities to save time and earn more using AI.

(✨ If you don’t want ads like these, Premium is the solution. It is like you are buying me a Starbucks Iced Honey Apple Almondmilk Flat White a month.)


Demo: Build a secure, offline AI tool in under 35 minutes.

In this mini-series, I show you how to create an offline AI tool that analyzes bank statements, using my frameworks. You’ll learn how to build distributable tools that leverage open-source language models (OS LMs).

No coding expertise is needed—just a clear plan and the ability to ask the right questions to guide your AI through the process. (repost)
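Ahead of the videos, here is a minimal sketch of the core idea: sending one bank-statement line to a locally running open-source model and getting a category back. It assumes Ollama is installed with a model already pulled; the model name, prompt, and category list are illustrative, not the exact setup from the videos.

```python
# Sketch: categorize a bank-statement line with a local open-source model.
# Assumes Ollama is running and a model has been pulled, e.g. `ollama pull llama3.1`.
# Model name and categories are illustrative assumptions.
import ollama  # pip install ollama

CATEGORIES = ["groceries", "rent", "transport", "salary", "other"]

def categorize(statement_line: str) -> str:
    prompt = (
        "Classify this bank-statement line into exactly one category "
        f"from {CATEGORIES}. Reply with the category only.\n\n"
        f"Line: {statement_line}"
    )
    response = ollama.chat(
        model="llama3.1",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"].strip().lower()

print(categorize("2024-11-02  REWE SAGT DANKE  -54.23 EUR"))
```

Everything stays on your machine, which is the whole point of building the tool offline.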

Four short videos

1/4:

See the other videos of the demo

That’s a wrap! I hope you enjoyed it.

Martin

Our webpage

You might like our last episodes: