Generative AI - Short & Sweet
Google's open-source contribution underwhelms, Magic AI builds a super software engineer, V-JEPA, Groq and more
This week, we look at some GenAI basics, answer one of your questions, and review the week’s essential news.
What’s Behind the Computing Increase? + Groq’s Fastest AI
Magic AI Inc. is Building a Superhuman Software Engineer
Yann LeCun from Meta is Working on His AGI Version, and V-JEPA is the Next Step
Gemma, Google’s Open-Source Model Family, is Unfortunately Underwhelming
Question Feature: Prompt Engineering for AI Video Generation?
🙇 I know I promised a video version of the newsletter, but unfortunately, I can’t make that happen. The one thing I have to learn this year is focus. (This video by Jony Ive inspired me while I doom-scrolled. 😭) I apologize.
Reading time: 6 min
📗 Understanding GenAI - What’s Behind the Computing Increase? / Groq’s Fastest AI
(Source)
AI has seen remarkable growth due to several factors: increased computing power, greater data availability, more investment and talent, along with reduced data storage costs.
In "Generative AI - Navigating the Course to the AGI Future," I explore these factors, focusing on the exponential growth in computing capability, measured in floating-point operations per second (FLOPS), driven by hardware and software advancements.
Moore's law and its exponential growth.
Nvidia's ascent to a $2 trillion market valuation underscores the rapid progress in this domain. 🚀
On the hardware front, assembling numerous computing chips in a specific arrangement creates processing units such as CPUs and GPUs, as well as IPUs, TPUs, and NPUs. My book delves into their differences, functionalities, and future directions.
A major trend is the customization of processing units for specific organizational needs, exemplified by Apple's M series.
Today, Groq delivers the world's fastest AI inference.
This trend is accelerating. (The derivative of the derivative is positive. 🙃 )
Groq, an AI hardware startup, is making waves with its LPU (language processing unit) architecture, which enables ultra-fast LLM responses.
Groq's public benchmarks show astonishing speeds of roughly 500 tokens/sec, about ten times the ~50 tokens/sec of GPT-3.5 on equivalent tasks.
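To make the throughput difference concrete, here is a small back-of-the-envelope sketch using the two figures from the text (the 1,000-token response length is my own illustrative assumption, not a benchmark):

```python
# Toy latency comparison using the throughput numbers quoted above
# (illustrative arithmetic, not a real benchmark).
def seconds_to_generate(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream a response at a given decode throughput."""
    return num_tokens / tokens_per_sec

response_tokens = 1000  # assumed length of a long-ish answer

groq_lpu = seconds_to_generate(response_tokens, 500)  # ~2 s
gpt35 = seconds_to_generate(response_tokens, 50)      # ~20 s

print(f"Groq LPU: {groq_lpu:.0f} s, GPT-3.5: {gpt35:.0f} s, "
      f"speedup: {gpt35 / groq_lpu:.0f}x")
```

At these rates a long answer drops from a noticeable wait to effectively instant, which is what makes the speedup feel qualitative rather than incremental.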
Groq's secret sauce is a compiler-first design that strips out complexity.
It is safe to say that LLMs will soon be capable of responding to highly complex queries in a fraction of a second, scanning terabytes of complex data relationships to provide immediate answers.
We are approaching a pivotal moment that surpasses Moore's law. We're not merely continuing its trajectory; we're on the verge of breakthroughs that will fundamentally transform our understanding of what is possible through innovation.
Read also about all things cloud computing, quantum computing, neuromorphic computing (so interesting 🧨 ), superconductors as potential next breakthroughs, and how software advances support AI breakthroughs. BUY NOW.
Also, try out Groq here, using open-source LLMs like Mixtral or Llama.
🌱 What’s New in AI?
🧑🚀 Magic AI Inc. is Building a Superhuman Software Engineer + 4x Gemini Context Window
(Source)
Magic has announced a breakthrough in active AI reasoning comparable to OpenAI's Q* capabilities.
Their system reportedly processes up to 3.5 million words, roughly quadrupling Google Gemini 1.5's 1M-token context window!
This is crucial for AI software development, which requires understanding an entire codebase, a task where current RAG architectures fall short because they must chunk information and retrieve only a few chunks, breaking sequential meaning.
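The chunking problem is easy to demonstrate. The sketch below uses a minimal fixed-size splitter of the kind RAG pipelines typically start from (the splitter and the toy codebase are my own illustrations; real splitters add overlap and respect syntax):

```python
# Minimal fixed-size chunker, the kind RAG pipelines typically use
# (a sketch; production splitters overlap chunks and respect code syntax).
def chunk(text: str, size: int) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

# Toy "codebase": a function definition, lots of unrelated lines,
# then a call site far away from the definition.
codebase = "def price(x): return x * 1.2\n" + "# ...\n" * 50 + "total = price(40)\n"
chunks = chunk(codebase, 80)

# No single chunk contains both the definition and the call site, so a
# retriever returning one chunk loses the connection between them.
has_both = any("def price" in c and "price(40)" in c for c in chunks)
print(has_both)  # False
```

A model with a multi-million-word context window can simply see the whole codebase at once, sidestepping this retrieval problem entirely.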
Despite the potential, larger context windows, like Claude's 200k, show limitations, as illustrated below.
LLM accuracy based on context window length. (Take a moment to digest it.)
Optimizing long context windows for professional use is an ongoing challenge.
Nonetheless, Magic's innovation has garnered significant attention and funding, $117M, with notable investors like Nat Friedman (former GitHub CEO).
Groq is a grinder of a company (Chamath mentioned four major pivots), and I'll be watching it this year.
🌍️ Yann LeCun from Meta is Working on His AGI Version, and V-JEPA is the Next Step
(Source)
Meta has publicly released the video joint embedding predictive architecture (V-JEPA) model, the next step toward advanced machine intelligence (AMI) and a more comprehensive understanding of the world.
What distinguishes AMI from AGI?
AMI represents the current state of AI, which specializes in specific tasks, while AGI requires broad, sophisticated cognitive capabilities and remains a future objective. The vision is to evolve AMI toward AGI.
V-JEPA is engineered to predict missing areas in spatially and temporally masked videos within a learned representational space.
First, an AI needs to learn to fill gaps before it generates freely.
The objective is to identify semantic representations within video data.
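The core JEPA idea can be sketched in a few lines: predict the *embeddings* of masked patches from the visible ones, never the pixels themselves. The toy below is my own illustration (random, untrained weights and invented shapes), not Meta's V-JEPA code; it only shows what the objective measures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy JEPA-style objective (illustrative only, not Meta's implementation):
# encode visible and masked patches, then ask a predictor to match the
# embeddings of the masked patches -- prediction happens in latent space.
def encode(patches: np.ndarray, W: np.ndarray) -> np.ndarray:
    return np.tanh(patches @ W)  # stand-in encoder

patches = rng.normal(size=(16, 32))     # 16 video patches, 32 dims each
mask = np.arange(16) % 2 == 0           # hide every other patch

W_enc = rng.normal(size=(32, 8)) * 0.1  # shared encoder weights
context = encode(patches[~mask], W_enc) # embeddings of visible patches
targets = encode(patches[mask], W_enc)  # embeddings to be predicted

# An (untrained) predictor maps the pooled context to each masked embedding.
W_pred = rng.normal(size=(8, 8)) * 0.1
pred = np.tile(context.mean(axis=0) @ W_pred, (targets.shape[0], 1))

# V-JEPA trains with an L1-style regression loss in representation space.
l1_loss = np.abs(pred - targets).mean()
print(f"latent prediction loss: {l1_loss:.3f}")
```

Because the loss lives in representation space, the model never wastes capacity predicting irrelevant pixel detail, which is the bet behind the whole JEPA family.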
The JEPA models by Meta might be the breakthrough we need to achieve AGI. (I'll give it a 76% chance.) Learn more about it in my book.
👪️ Gemma, Google’s Open-Source Model Family, is Unfortunately Underwhelming
(Source)
Meta's extensive open-source strategy has compelled AI leaders like Google to release open-source models.
Google has now semi-voluntarily released the Gemma model family, for which they deserve credit.
They launched it with significant fanfare, showcasing impressive benchmark results that outshine Llama-2.
Yet, in practical applications, Gemma performs disappointingly, as evidenced by Matthew Berman's test video.
👑 Q+A
I receive several questions weekly. A frequently asked one stands out, so I'm featuring it here.
How does prompt engineering play a role in creating complex narratives or scenes with Sora (AI for video generation)?
Prompt engineering is crucial, especially for models with a higher chaos factor. Sora, for instance, incorporates all the physics and associations of a scene and can be perceived as more chaotic (with little determinism).
Having a prompting framework helps a lot. For video generation, the prompting pattern V-I-S-I-O-N is a good starting point. It goes like this:
Vignette (Describe a Scene):
Example: Dawn at a medieval marketplace, stalls bustling with activity under the early light.
Interactions (Specify Dynamics):
Example: Merchants energetically sell wares to villagers, children dart around, and performers entertain.
Sentiment (Convey Emotion):
Example: A lively, communal buzz fills the air, signaling a day of prosperity.
Inciting Event (Introduce Change):
Example: A carriage rushes in, causing commotion as people and goods scatter.
Objective (State Purpose):
Example: The tale of a stranger begins whose arrival alters the town's course.
Nuance (Add Detail):
Example: A passing carriage's emblem and a hooded figure's gaze with an artisan convey a hidden past.
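The six V-I-S-I-O-N components above can be assembled into one prompt string. This sketch uses the example text from the framework; the dictionary structure and the simple concatenation order are my own assumptions, not a prescribed format:

```python
# Assembling a V-I-S-I-O-N prompt for video generation
# (component texts from the framework above; assembly is a sketch).
vision = {
    "Vignette": "Dawn at a medieval marketplace, stalls bustling under the early light.",
    "Interactions": "Merchants energetically sell wares, children dart around, performers entertain.",
    "Sentiment": "A lively, communal buzz fills the air, signaling a day of prosperity.",
    "Inciting Event": "A carriage rushes in, scattering people and goods.",
    "Objective": "The tale of a stranger begins whose arrival alters the town's course.",
    "Nuance": "The carriage's emblem and a hooded figure's gaze hint at a hidden past.",
}

prompt = " ".join(vision.values())
print(prompt)
```

Keeping the components in a dictionary makes it easy to swap out a single element, say the inciting event, and regenerate the scene while everything else stays fixed.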
Do you have a question for me? Write me!
I am in London this week at the Generative AI for Marketing Summit. I brought my mother along. (Find her in the picture below.)
Would you like to learn about the Top X (10?) GenAI Tools for Marketers?
Was the show a delight, or just alright? (You can comment after voting)
Thank you so much for reading.
Martin