- Generative AI - Short & Sweet
- Posts
- o3 and last tech updates of 2024
o3 and last tech updates of 2024
2025, big things will be built... also by you 🫵
Quick holiday check-in!
While you hopefully are enjoying your well-deserved time with your loved ones, I wanted to share the last AI highlights of 2024.
Before we do so, I am grateful you follow my AI/ tech journey. THANK YOU!
Next year, big things are planned.
We will build edge AI applications for you to follow (for instance, with the Jetson Orin Nano Super)
For Christmas, I got a PlayStation VR2 headset (🤩), and we will build VR applications.
And, of course, we will double down on AI Agents. AI Agents in many vertical use cases, i.e. customer service.
→ (I am also playing with the idea of building an AI Agent in the crypto space, but it is pretty complex: LMK if you have great use cases or want to team up)
Find below:
- o3 → the new-gen model by OpenAI (is it AGI?),
- DOWNLOAD your customizable AI Agent script ⬇️⬇️
- Unitree’s 4-legged robot is exceptionally agile, fast, and strong.
Writer RAG tool: build production-ready RAG apps in minutes
RAG in just a few lines of code? We’ve launched a predefined RAG tool on our developer platform, making it easy to bring your data into a Knowledge Graph and interact with it with AI. With a single API call, writer LLMs will intelligently call the RAG tool to chat with your data.
Integrated into Writer’s full-stack platform, it eliminates the need for complex vendor RAG setups, making it quick to build scalable, highly accurate AI workflows just by passing a graph ID of your data as a parameter to your RAG tool.
(✨ If you don’t want ads like these, Premium is the solution. It is like you are buying me a Starbucks Iced Honey Apple Almondmilk Flat White a month.)
o3 - the AGI-ish model - has superhuman results across benchmarks
o3 is the succeeding model of o1 (they skipped o2, because of potential confusions with o2 the company).
It uses what OpenAI calls a "private chain of thought," where it pauses to internally deliberate before generating responses, aiming to simulate human reasoning processes more effectively.
How good is it?
o3 excels in coding, math, and intelligence.
It scores 71.7% on SWE-Bench (real GitHub issues) and 2727 Elo on Codeforces, ranking with top, top, top coders.
In math and science, it achieves 96.7% on AIME, 87.7% on GPQA, and 25.2% on EpochAI, far surpassing others.
On ARC-AGI (THE AGI BENCHMARK - tests AI’s ability to learn and generalize like humans), it scores 87.5%, outperforming humans in learning from minimal examples.
And even experts from other fields are stunned.
See what Derya Unutmaz (MD), expert immunologist, has to say:
Today, I’m sharing another insanely good o1-Pro scientific insight! This one is particularly special to me to a point of making me emotional, its that profound🥹
I asked o1-Pro to critically evaluate a review my students & I had written about a specific subset of immune cells called MAIT cells and their role in cancer. The result? I’m simply shocked beyond belief at o1-Pro’s critiques! 😱They were more insightful than my own—and this is a topic where I’m one of the few top experts in the world, having made some of the key discoveries!
As I read through its feedback, I found myself staring at my computer screen, fixated, overwhelmed by a mixture of emotions: disbelief, awe, joy, and a profound sense of humility.☺️Every single point it made, every question it asked—everything was unbelievably insightful!
The depth of its analysis is truly hard to comprehend. Even though we believed we had written a great review on the topic, which was accepted with only minor critiques, I was deeply humbled, thinking, “I should have addressed and included all these insights in the review.” Ouch! The only solace is that it didn’t find any errors.
[… more detail in post …]
o3 is currently too expensive to use broadly (thousands per problem), but we all know the AI (and compute) costs are dropping faster than my productivity when I “just check” social media.
Scale of systems like o3 comes faster than anyone can imagine.
Build Your Custom AI Agent: Download The AI Agent Script
Last week, I gave a workshop in Seoul at the AI Summit on how to build AI agents, along with delivering a keynote on the same topic.
What people loved most was understanding how to build an AI agent with just one script.
To get started, you need to first understand the key elements of an AI agent.
Download the script here, add your OpenAI API key (or any AI model you prefer), and enter the respective API key of the tool you’re using. By default, it’s set to the Weather API.
Now, get started and start experimenting! 🔥
Unitree B2-W (Robot) is ready to ship! A Robot Revolution
Consumer AI robot is here
Unitree B2-W is READY to ship!
- Price: $150,000
- Speed: 20 km/h
- Endurance: 50 km with a 40 kg load
- Power: Can pull up to 100 kgThis is a revolution. The future is NOW.
— el.cine (@EHuanglu)
8:43 AM • Dec 23, 2024
Run CTV Ads on Roku This Q5
“Q5” is a key post-holiday shopping period
Reach shoppers where they’re streaming – on Roku
You can run self-serve CTV ads for just $500
(✨ If you don’t want ads like these, Premium is the solution. It is like you are buying me a Starbucks Iced Honey Apple Almondmilk Flat White a month.)
Happy holidays and see you in 2025,
Martin 🙇
I recommend:
Beehiiv if you write newsletters.
Superhuman if you write a lot of emails.
Cursor if you code a lot.
Bolt.new for full-stack development.
Follow me on X.com.
AI for your org: We build custom AI solutions for half the market price and time (building with AI Agents). Contact us to know more.
You might like our last episodes:
The image displays a bar chart titled “Research Math (EpochAI Frontier Math)” that compares accuracy levels. The x-axis has two categories: “previous SoTA” (state-of-the-art) and “o3.” The y-axis represents accuracy, ranging from 0 to 100.
• The “previous SoTA” bar is low, reaching approximately 2.0% accuracy.
• The “o3” bar is significantly higher, reaching 25.2% accuracy.