My effective, simple RAG - no cost access

underrated o3-mini-high, and massive video gen progress

I've started to enjoy vibe coding + live streaming it. Lately, I built an open-source RAG from scratch. Live. Streaming on X.

Read today as well:

  • o3-mini-high is underrated - lowest hallucination + real-time conversation

  • So much progress on the video gen front

Hire an AI BDR & Get Qualified Meetings On Autopilot

Outbound requires hours of manual work.

Hire Ava, who automates your entire outbound demand generation process, including:

  • Intent-Driven Lead Discovery Across Dozens of Sources

  • High Quality Emails with Human-Level Personalization

  • Follow-Up Management

  • Email Deliverability Management

(If you don’t want ads like these, Premium is the solution. It is like buying me a Starbucks Iced Honey Apple Almondmilk Flat White once a month.)

99% of all AI solutions are running on RAGs - I built my own from scratch

What is RAG (Retrieval-Augmented Generation)? It combines a document retriever with an LLM: the user's query triggers retrieval of relevant indexed documents (semantic search), which the LLM then uses to generate fact-based responses.

The data preparation steps of a RAG:

  1. Text extraction from various document types like PDF, DOCX, video transcripts, etc.

  2. Chunking of the extracted text (chunks should have some overlap)

  3. Index all documents (the ones you want to query against) with vector embeddings
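
To make the chunking step concrete, here is a minimal sketch of a sliding-window chunker with overlap (the function name and sizes are illustrative, not necessarily what musiol-rag uses):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks (character-based here; token-based works the same way)."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# Each chunk shares `overlap` characters with its neighbor, so a sentence that is
# cut at a chunk boundary still appears with context in the following chunk.
```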

The query-time steps of a RAG are:

  1. Embed the (user) query into the same vector space as the document chunks

  2. Retrieve the most relevant document chunks via a similarity search (→ cosine similarity)

  3. Combine the query and retrieved chunks as context → Prompt Engineering

  4. Generate the answer using an LLM conditioned on that context
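
Here is a minimal sketch of those four steps, assuming a sentence-transformers embedding model and normalized vectors (so the dot product equals cosine similarity). The LLM call itself is left out, and musiol-rag may use different components:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Assume these chunks came out of the data preparation steps above.
chunks = [
    "RAG retrieves relevant documents before the LLM generates an answer.",
    "Cosine similarity measures the angle between two embedding vectors.",
]
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

def build_prompt(query: str, top_k: int = 2) -> str:
    # 1. Embed the query into the same vector space as the chunks
    query_vector = embedder.encode([query], normalize_embeddings=True)[0]
    # 2. Similarity search: with normalized vectors, dot product == cosine similarity
    scores = chunk_vectors @ query_vector
    top_idx = np.argsort(scores)[::-1][:top_k]
    # 3. Combine query and retrieved chunks as context (prompt engineering)
    context = "\n\n".join(chunks[i] for i in top_idx)
    return f"Answer using ONLY the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

# 4. Send the prompt to an LLM of your choice to generate the final answer.
print(build_prompt("What does RAG do before generating?"))
```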

Why embed into a vector space? Because then we only have to calculate the distance between chunk vectors and the query vector (algorithmically cheap) instead of comparing text word for word: 1. map chunks to vectors (fast), 2. calculate distances (fast), 3. map the closest vectors back to their chunks (fast) → 100k pages can be scanned in near real time.
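
To illustrate why this is fast: scoring 100k chunk embeddings against one query is a single matrix-vector product (the vectors below are random placeholders):

```python
import numpy as np

dim = 384                                     # a typical embedding dimension
chunk_vectors = np.random.rand(100_000, dim)  # stand-in for 100k embedded chunks
chunk_vectors /= np.linalg.norm(chunk_vectors, axis=1, keepdims=True)

query_vector = np.random.rand(dim)
query_vector /= np.linalg.norm(query_vector)

# One matrix-vector product yields the cosine similarity of the query against
# all 100k chunks at once; this takes milliseconds on a laptop.
scores = chunk_vectors @ query_vector
top5 = np.argsort(scores)[::-1][:5]           # indices of the 5 closest chunks
```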

So, I built it from scratch, and I plan to implement a couple more things to make it a top-notch RAG system that is easy to adopt.

This is the REPOSITORY (I named it musiol-rag with my surname haha.. hopefully you don’t find it too narcissistic, but every other meaningful name was taken).

It is a cost-free RAG for everyone to use, under the permissive MIT license.

If you have feedback/ improvements for it, please open an issue on GitHub.

I streamed it earlier as well, which I plan to do more often now. Live-streaming my coding journey.

ONE MORE THING: I share my learnings with you and keep you updated on AI/ how to build best with AI as new tools and models launch.

I am launching the updated online course: Everyone can Code! (Updated, because it keeps the valuable lessons on working with AI but now has a chapter that I update constantly.)

🚨Special DEAL: Get a yearly Premium Subscription to this newsletter and receive the course Everyone can Code! for free!🚨 (Together worth 200 €)

(You will receive the course access by email. Current Premium subscribers will get access to the course automatically.)

o3-mini-high is underrated - lowest hallucination + real-time conversation

I work and speak daily with OpenAI’s o3-mini-high model. Why? Because its answers are premium.

It can perform internet searches (and, outside the EU, even Deep Research). It is a strong reasoning model that you can even talk to in real time. Yes, a reasoning model that you can converse with in real time without delay.

o3-mini-high only has delays when it needs to research something online. But that is understandable; we cannot expect miracles.

THE BEST THING:

o3-mini-high has a hallucination rate of 0.8 percent according to the HHEM leaderboard on Hugging Face; it is the first AI model to go under one percent!

(But Google also just released Gemini 2.0 Flash and Gemini 2.0 Pro, which are below 1% as well. They are not in the chart.)

**Chart:** Bar chart titled *"Hallucination Rate for Top 25 LLMs"* (Vectara, last updated January 31st, 2025). The x-axis shows the hallucination rate (0% to 5%); the y-axis lists the models from lowest to highest rate, color-coded from blue (lower) to red (higher).

| Model | Hallucination rate |
| --- | --- |
| OpenAI-o3-mini-high-reasoning | 0.8% |
| Zhipu AI GLM-4-9B-Chat | 1.3% |
| Google Gemini-2.0-Flash-Exp | 1.3% |
| OpenAI-o1-mini | 1.4% |
| GPT-4o | 1.5% |
| GPT-4o-mini | 1.7% |
| GPT-4-Turbo | 1.7% |
| Google Gemini-2.0-Flash-Thinking-Exp | 1.8% |
| GPT-4 | 1.8% |
| GPT-3.5-Turbo | 1.9% |
| DeepSeek-V2.5 | 2.4% |
| OpenAI-o1 | 2.4% |
| Microsoft Orca-2-13b | 2.5% |
| Microsoft Phi-3.5-MoE-instruct | 2.5% |
| Intel Neural-Chat-7B-v3-3 | 2.6% |
| Qwen2.5-7B-Instruct | 2.8% |
| AI21 Jamba-1.5-Mini | 2.9% |
| Snowflake-Arctic-Instruct | 3.0% |
| Qwen2.5-32B-Instruct | 3.0% |
| Microsoft Phi-3-mini-128k-instruct | 3.1% |
| Mistral Small3 | 3.1% |
| OpenAI-o1-preview | 3.3% |
| Google Gemini-1.5-Flash-002 | 3.4% |
| 01-AI Yi-1.5-34B-Chat | 3.7% |
| Llama-3.1-405B-Instruct | 3.9% |
| DeepSeek-V3 | 3.9% |

The trend is clear: while we might never get hallucinations fully out of the models, AI hallucination rates are asymptotically approaching zero.

Find the HHEM Leaderboard here. (Bookmark it ideally.)

So much progress on the video gen front

I have access to OpenAI's Sora v2, Kling, and a couple of others.

While I think the results are remarkable, it is still VERY clear that these are AI-generated videos. Sometimes people have three legs, sometimes faces are distorted, and movements are not that natural.

However, there is an incredible amount of progress in the space, which gets me wondering how long it will be until we have near-perfect, realistic videos.

Here are the (only) top 3 video gen launches of this week.

  1. ByteDance’s OmniHuman-1: one image transforms into a full video + sound 🔥

Incredible!

  2. Pika just dropped Pikadditions

  3. Meta released VideoJAM; clearly better than Sora v2 💣

BTW, where is Midjourney's progress?! I need to cancel that subscription. It doesn't feel like they have made much progress in a long time.

I hope you enjoyed it.

Happy weekend!
Martin 🙇

Our webpage