My effective, simple RAG - no cost access

underrated o3-mini-high, and massive video gen progress

I've started to enjoy vibe coding + live streaming it. Lately, I built an open-source RAG from scratch. Live. Streaming on X.

Read today as well:

  • o3-mini-high is underrated - lowest hallucination + real-time conversation

  • So much progress on the video gen front

Hire an AI BDR & Get Qualified Meetings On Autopilot

Outbound requires hours of manual work.

Hire Ava, who automates your entire outbound demand generation process, including:

  • Intent-Driven Lead Discovery Across Dozens of Sources

  • High Quality Emails with Human-Level Personalization

  • Follow-Up Management

  • Email Deliverability Management

(If you don’t want ads like these, Premium is the solution. It is like buying me a Starbucks Iced Honey Apple Almondmilk Flat White once a month.)

99% of all AI solutions are running on RAGs - I built my own from scratch

What is RAG (Retrieval-Augmented Generation)? It combines a document retriever with an LLM: the user's query triggers retrieval of relevant indexed documents (semantic search), which the LLM then uses to generate fact-based responses.

The data preparation steps of a RAG:

  1. Text extraction from various document types like PDF, DOCX, video transcripts, etc.

  2. Chunking of the extracted text (chunks should have some overlap)

  3. Index all documents (the ones you want to query against) with vector embeddings
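
To make the chunking step concrete, here is a minimal sketch of a sliding-window chunker with overlap (the function name and sizes are illustrative, not necessarily what musiol-rag uses):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks (character-based here; token-based works the same way)."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# Each chunk shares `overlap` characters with its neighbor, so a sentence that is
# cut at a chunk boundary still appears with context in the following chunk.
```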

The query-time steps of a RAG are:

  1. Embed the (user) query into the same vector space as the document chunks

  2. Retrieve the most relevant document chunks via a similarity search (→ cosine similarity)

  3. Combine the query and retrieved chunks as context → Prompt Engineering

  4. Generate the answer using an LLM conditioned on that context
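
Here is a minimal sketch of those four steps, assuming a sentence-transformers embedding model and normalized vectors (so the dot product equals cosine similarity). The LLM call itself is left out, and musiol-rag may use different components:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Assume these chunks came out of the data preparation steps above.
chunks = [
    "RAG retrieves relevant documents before the LLM generates an answer.",
    "Cosine similarity measures the angle between two embedding vectors.",
]
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

def build_prompt(query: str, top_k: int = 2) -> str:
    # 1. Embed the query into the same vector space as the chunks
    query_vector = embedder.encode([query], normalize_embeddings=True)[0]
    # 2. Similarity search: with normalized vectors, dot product == cosine similarity
    scores = chunk_vectors @ query_vector
    top_idx = np.argsort(scores)[::-1][:top_k]
    # 3. Combine query and retrieved chunks as context (prompt engineering)
    context = "\n\n".join(chunks[i] for i in top_idx)
    return f"Answer using ONLY the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

# 4. Send the prompt to an LLM of your choice to generate the final answer.
print(build_prompt("What does RAG do before generating?"))
```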

Why embed into a vector space? Because then we only have to calculate the distance between chunk vectors and the query vector (algorithmically cheap) instead of comparing text word for word: 1. map chunks to vectors (fast), 2. calculate distances (fast), 3. map the closest vectors back to their chunks (fast) → 100k pages can be scanned in near real time.
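
To illustrate why this is fast: scoring 100k chunk embeddings against one query is a single matrix-vector product (the vectors below are random placeholders):

```python
import numpy as np

dim = 384                                     # a typical embedding dimension
chunk_vectors = np.random.rand(100_000, dim)  # stand-in for 100k embedded chunks
chunk_vectors /= np.linalg.norm(chunk_vectors, axis=1, keepdims=True)

query_vector = np.random.rand(dim)
query_vector /= np.linalg.norm(query_vector)

# One matrix-vector product yields the cosine similarity of the query against
# all 100k chunks at once; this takes milliseconds on a laptop.
scores = chunk_vectors @ query_vector
top5 = np.argsort(scores)[::-1][:5]           # indices of the 5 closest chunks
```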

So, I built it from scratch, and I plan to implement a couple more things to make it a top-notch RAG system that is easy to adopt.

This is the REPOSITORY (I named it musiol-rag with my surname haha.. hopefully you don’t find it too narcissistic, but every other meaningful name was taken).

It is a cost-free RAG for everyone to use, under the permissive MIT license.

If you have feedback/ improvements for it, please open an issue on GitHub.

I streamed it earlier as well, which I plan to do more often now. Live-streaming my coding journey.

ONE MORE THING: I share my learnings with you and keep you updated on AI/ how to build best with AI as new tools and models launch.

I am launching the updated online course: Everyone can Code! (Updated, because it keeps the valuable lessons on working with AI but now has a chapter that I update constantly.)

🚨Special DEAL: Get a yearly Premium Subscription to this newsletter and receive the course Everyone can Code! for free!🚨 (Together worth 200 €)

(You will receive the course access by email. Current Premium subscribers will get access to the course automatically.)

o3-mini-high is underrated - lowest hallucination + real-time conversation

I work and speak daily with OpenAI’s o3-mini-high model. Why? Because its answers are premium.

It can perform internet searches (and, outside the EU, even Deep Research). It is a strong reasoning model that you can even talk to in real time. Yes, a reasoning model that you can converse with in real time without delay.

o3-mini-high only has delays when it needs to research something online. But that is understandable; we cannot expect miracles.

THE BEST THING:

o3-mini-high has a hallucination rate of 0.8 percent according to the HHEM leaderboard on Hugging Face; it is the first AI model to go under one percent!

(But Google also just released Gemini 2.0 Flash and Gemini 2.0 Pro, which are below 1% as well. They are not in the chart.)

**Chart:** Bar chart titled *"Hallucination Rate for Top 25 LLMs"* (Vectara, last updated January 31st, 2025). The x-axis shows the hallucination rate (0% to 5%); the y-axis lists the models from lowest to highest rate, color-coded from blue (lower) to red (higher).

| Model | Hallucination rate |
| --- | --- |
| OpenAI-o3-mini-high-reasoning | 0.8% |
| Zhipu AI GLM-4-9B-Chat | 1.3% |
| Google Gemini-2.0-Flash-Exp | 1.3% |
| OpenAI-o1-mini | 1.4% |
| GPT-4o | 1.5% |
| GPT-4o-mini | 1.7% |
| GPT-4-Turbo | 1.7% |
| Google Gemini-2.0-Flash-Thinking-Exp | 1.8% |
| GPT-4 | 1.8% |
| GPT-3.5-Turbo | 1.9% |
| DeepSeek-V2.5 | 2.4% |
| OpenAI-o1 | 2.4% |
| Microsoft Orca-2-13b | 2.5% |
| Microsoft Phi-3.5-MoE-instruct | 2.5% |
| Intel Neural-Chat-7B-v3-3 | 2.6% |
| Qwen2.5-7B-Instruct | 2.8% |
| AI21 Jamba-1.5-Mini | 2.9% |
| Snowflake-Arctic-Instruct | 3.0% |
| Qwen2.5-32B-Instruct | 3.0% |
| Microsoft Phi-3-mini-128k-instruct | 3.1% |
| Mistral Small3 | 3.1% |
| OpenAI-o1-preview | 3.3% |
| Google Gemini-1.5-Flash-002 | 3.4% |
| 01-AI Yi-1.5-34B-Chat | 3.7% |
| Llama-3.1-405B-Instruct | 3.9% |
| DeepSeek-V3 | 3.9% |

The trend is clear: while we might never get hallucinations fully out of the models, AI hallucination rates are asymptotically approaching zero.

Find the HHEM Leaderboard here. (Bookmark it ideally.)

So much progress on the video gen front

I have access to OpenAI's Sora v2, Kling, and a couple of others.

While I think the results are remarkable, it is still VERY clear that these are AI-generated videos. Sometimes people have three legs, sometimes faces are distorted, and movements are not that natural.

However, there is an incredible amount of progress in the space, which gets me wondering how long it will be until we have near-perfect, realistic videos.

Here are the (only) top 3 video gen launches of this week.

  1. ByteDance’s OmniHuman-1: one image transforms into a full video + sound 🔥

Incredible!

  2. Pika just dropped Pikadditions

  3. Meta released VideoJAM; clearly better than Sora v2 💣

BTW, where is Midjourney's progress?! I need to cancel that subscription. It doesn't feel like they have made much progress in a long time.

I hope you enjoyed it.

Happy weekend!
Martin 🙇

Our webpage