9 Areas Where Humans Still Outperform AI
... and a big update from Mistral
AI hasn’t fully taken over yet. Humans still excel in certain areas, of course.
These 9 benchmarks cover vital skills and show how AI measures up against humans.
(Don’t miss out on the quick AI highlights at the bottom.)
✅ Before we start: we launched a premium version of the newsletter. Subscribing gives you full access to all content, exclusive demos, and an ad-free experience. I plan to host AMAs and develop more subscriber-requested demos.
9 areas where humans still have an edge compared to AI
It might feel like humans are losing their edge against AI systems that keep getting better. What value can humans still capture compared with AI systems?
It turns out there are several fields.
In my research, I came across these 9 datasets/evaluations, which show that humans still have a clear edge over AI systems, at least for some time.
What is WorkArena++?
These are 682 tasks that simulate workflows typical for knowledge workers, testing planning, problem-solving, reasoning, info retrieval, and context understanding. Humans outperform AI thanks to more robust reasoning and contextual grasp.
To have really powerful AI agents, we want them to excel on this benchmark. In 2025, we will see great progress here, to the point where the average human might no longer have a competitive edge.
What is Simple-bench?
Multiple-choice tasks (200+ questions) test spatio-temporal reasoning and social intelligence. High schoolers currently outperform state-of-the-art models.
Humans will maintain a competitive edge for the foreseeable future.
What is ARC-AGI?
Assesses AI's ability to learn new skills and solve open-ended problems via patterns and abstract reasoning. Humans excel due to better generalization and abstract thinking.
A simple concept—covered in a past episode. Humans will likely outperform computers for another 2–3 years.
What is MiniWob?
Web-based tasks test reinforcement learning agents in navigation and interaction. Humans currently lead due to better understanding and adaptability. However, with AI gaining access to web pages via visual, textual, and API channels, the margin is narrowing quickly. By 2025, AI will match or surpass humans in these tasks, and AI is already taking over here.
What is WebArena?
Evaluates complex web tasks like info retrieval and form filling. The gap between average human capabilities and AI is shrinking rapidly, similar to MiniWob. While opportunities remain for now, AI will soon close the gap entirely.
What is Putnam Bench?
Tests theorem-proving algorithms with problems from the Putnam Mathematical Competition. While the average human doesn’t have an edge, human experts (PhDs) excel. Interestingly, the AI-human baseline is often mislabeled. Starting next year, AI collaborators will reach parity with human PhDs, significantly accelerating scientific progress.
What is NOCHA?
Evaluates object classification and hierarchical annotation. Humans still outperform AI due to sharper visual perception and contextual understanding. Visual AI has evolved gradually over decades—from early convolutional neural networks to current LLM integrations. For at least the next year, AI won’t surpass the average human in these tasks.
What is GAIA?
Tests generalization across tasks and environments, especially for Internet research. Humans currently excel with natural adaptability. However, AI agents are likely to surpass the average human within 2–3 years. Progress depends not only on smarter AI but also on larger context windows, better comprehension, and improvements in model architecture.
What is Lab-Bench?
Focuses on biology-related lab tasks like experimental design and data analysis. Humans excel with expertise and intuition, but the role is shifting. In the coming years, scientists—biologists, chemists, and physicists—will evolve into research project managers, supported by teams of AI agents handling routine tasks.
Updates from Mistral - Pixtral Large (open weights) & Le Chat
Pixtral Large: 124B Parameters of Power
This model is crushing benchmarks.
Top scores on MathVista, DocVQA, and VQAv2
Maintains the strong text skills of Mistral Large 2
Built with a 123B decoder + 1B vision encoder
128K token limit for long documents
Want it? It’s free to download on Hugging Face.
Le Chat Also Just Leveled Up
It now does:
Web search with sources cited for fact-checking
Canvas for brainstorming: Edit, export, create seamlessly
Vision upgrades: Reads images & documents
Flux Pro for stunning image generation
Speculative editing: Predicts & refines text faster than you
And yes, it’s still free. → Le Chat
NVIDIA ALCHEMI Accelerates Sustainable Materials Discovery for EV Batteries and Solar Panels
Ian Buck announces NVIDIA Alchemi, an AI-driven digital chemistry service to discover new compounds for medicine, chemistry and materials design, speeding up the discovery process by 100x
— Tsarathustra (@tsarnick)
8:24 PM • Nov 18, 2024
That’s a wrap! I hope you enjoyed it.
Martin