The Easiest OpenAI Realtime API Integration You'll Ever See [demo]
Plus, Writer's GenAI Platform, Meta's Movie Gen, RAG 2.0.
A year ago, I met May Habib, CEO of Writer, at Europe's first Generative AI conference. At the time, Writer helped you write better, similar to Grammarly.
May and the team worked hard and evolved the product.
Today, Writer is a GenAI platform that integrates AI where it brings value in your organization, all within minutes.
Its four pillars:
Complete control of its own specialized LLMs, called Palmyra.
Your data becomes queryable via their knowledge graph (KG) in minutes. It's like a RAG system 2.0 that only requires access, no data prep needed. The 2.0 part? A KG understands the relationships between entities.
AI guardrails ensure legal compliance and brand rules.
AI Studio, Writer’s “impossible” feature: integrate the AI that provides the most value to your organization into the products you already use, whether that’s Chrome, Figma, Google Docs, or anything else.
I recommend taking a look and getting a head start with Writer. Here’s what they say:
Writer RAG tool: build production-ready RAG apps in minutes
RAG in just a few lines of code? We’ve launched a predefined RAG tool on our developer platform, making it easy to bring your data into a Knowledge Graph and interact with it using AI. With a single API call, Writer LLMs will intelligently call the RAG tool to chat with your data.
Integrated into Writer’s full-stack platform, it eliminates the need for complex vendor RAG setups, making it quick to build scalable, highly accurate AI workflows just by passing a graph ID of your data as a parameter to your RAG tool.
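To make the “single API call” concrete, here is a minimal sketch of what such a request could look like in Python. The endpoint path, model name, and graph-tool schema below are assumptions based on Writer’s developer platform, and the graph ID is a placeholder, so check their docs for the exact payload shape.

```python
import os
import requests

# Hypothetical sketch: chat with your data by passing a Knowledge Graph ID
# as a tool parameter. Endpoint, model name, and tool schema are assumptions;
# verify against Writer's developer docs.
API_KEY = os.environ["WRITER_API_KEY"]
GRAPH_ID = "your-graph-id"  # placeholder: the ID of your Knowledge Graph

payload = {
    "model": "palmyra-x-004",  # assumed model name
    "messages": [
        {"role": "user", "content": "Summarize our Q3 onboarding feedback."}
    ],
    # The RAG tool: the LLM decides when to query the graph to ground its answer.
    "tools": [
        {
            "type": "graph",
            "function": {
                "graph_ids": [GRAPH_ID],
                "description": "Company knowledge base",
                "subqueries": False,
            },
        }
    ],
}

resp = requests.post(
    "https://api.writer.com/v1/chat",  # assumed endpoint
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The point is the shape of the call: your data stays in the Knowledge Graph, and the only RAG-specific plumbing on your side is the graph ID you pass along.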
OpenAI’s Realtime API changes everything: In 5 min I demo how to implement it 📲
What is the Realtime API, you might ask. Well, it’s an API that facilitates a low-latency, multi-modal conversation with an AI.
Low-latency to be able to have a real-time conversation.
Multi-modal because it accepts text as well as speech as input. Soon, it’ll support video streaming (think FaceTime).
It supports function calling, enabling applications to trigger actions or retrieve information dynamically during a conversation, as a RAG system would. It could also call another (low-latency) API, such as a weather service or another AI, to perform an action.
It consolidates the stack: you no longer need to wire together a speech-to-text (STT) model, a communication layer, and a text-to-speech (TTS) model. Just this one API (see the sketch after this list).
Of course, it speaks all kinds of languages.
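To show how little plumbing that takes, here is a minimal sketch of a text-only round trip over the Realtime API’s WebSocket interface, using the `websockets` Python package. The event names follow OpenAI’s published Realtime event schema at launch (session.update, conversation.item.create, response.create); the get_weather tool is a hypothetical illustration of the function calling mentioned above, so treat this as a sketch rather than the demo script itself.

```python
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # Older websockets versions use extra_headers=; newer ones additional_headers=.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Configure the session: text-only here, plus a hypothetical tool
        # to illustrate function calling (the model decides when to call it).
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["text"],
                "instructions": "You are a concise phone assistant.",
                "tools": [{
                    "type": "function",
                    "name": "get_weather",  # hypothetical tool for illustration
                    "description": "Get the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                }],
            },
        }))

        # Add a user message to the conversation, then ask for a response.
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Hi! Can you hear me?"}],
            },
        }))
        await ws.send(json.dumps({"type": "response.create"}))

        # Stream server events until the response is complete.
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```

A real voice demo additionally streams microphone audio with input_audio_buffer.append events and plays back the audio deltas it receives; the text round trip above just shows the event flow.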
My 5-min demo (only 1 script is really needed)
What will the impact of real-time AI APIs be?
Currently, high-quality AI calls cost a lot; I spent $10 in a matter of minutes. But they will get cheaper. Eventually, effectively free.
We also know that people don’t like making calls, especially younger generations.
Every business that can be called will have AI doing the calls: AI can take a call at any point in time, it will be more persuasive and more successful at selling, and it will be cheaper than human colleagues.
By now, we know that this is how (AI) tech evolves.
AIs will call AIs on our behalf to gather information or negotiate prices. They’ll find mutually beneficial ground, without emotions getting involved. The person using the better model and the better prompts may end up in a better position than everyone else.
Not overnight but progressively, the next departments to be automated are:
Customer Service / Call Centers / BPOs
Tech support
Outreach sales
AI executive assistants will emerge, doing all kinds of calls for us.
Healthcare will adopt telemedicine en masse.
It will be a major job disruption, leading to mass job losses, and some job creation. According to my research, around 17 million people currently work in contact centers. Unfortunately, only a few of them will take on the new jobs created in this space. (I discussed this in depth in my book.)
On the upside, the economy will become massively more efficient, while the quality of service improves.
Calls can be made proactively to inform us about better options. The AI will know exactly what we said 18 months ago and what our intrinsic motivation might be.
I would love to hear from you: Which areas of your business would benefit most from AI-driven calls, and would you consider implementing them? Reply to this email.
Meta joins the video generation race
Meta has announced Movie Gen, trained on 6,144 H100 GPUs. It produces high-quality video and can be used for both video generation and editing.
In their 90-page paper, they go into great detail about how they did it.
Next-gen RAG (Retrieval-Augmented Generation)
RAG systems are the most adopted AI-supporting tech in 2024.
They retrieve documents based on vectors representing chunks of your data, and an LLM then generates an answer from the retrieved chunks.
Is taking a chunk and embedding it without any further information the best way to represent it as a vector? No.
This paper found that incorporating context from neighboring documents significantly improves document embeddings for neural search tasks. It outperforms traditional methods, especially in new data scenarios, without requiring complex optimization techniques.
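The underlying idea is easy to illustrate. The sketch below is not the paper’s method, just a simplified version of the principle: instead of embedding each chunk in isolation, embed it together with its neighboring chunks so the vector carries surrounding context. It assumes the sentence-transformers package and an off-the-shelf model; the texts and model name are illustrative.

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")  # any off-the-shelf embedding model

chunks = [
    "Q3 revenue grew 12% year over year.",
    "Growth was driven by the new enterprise tier.",
    "Churn in the SMB segment remained flat.",
]

# Naive approach: embed each chunk in isolation.
naive_vectors = model.encode(chunks)

# Context-aware approach (simplified): embed each chunk together with its
# neighbors, so "Growth was driven by..." also carries what grew and by how much.
contextual_texts = []
for i, chunk in enumerate(chunks):
    neighbors = chunks[max(0, i - 1): i + 2]  # previous, current, next chunk
    contextual_texts.append(" ".join(neighbors))
contextual_vectors = model.encode(contextual_texts)

# At query time you still return the original chunk to the LLM,
# but you search against the context-aware vectors.
print(naive_vectors.shape, contextual_vectors.shape)
```

The trade-off is that each vector now mixes in neighboring content, so retrieval gets more context-aware at the cost of slightly blurrier chunk boundaries.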
I will use it in an upcoming project and share with you how it went. Stay tuned.
Here, I am reading the chapter on AI-driven calls. 🫠
That’s a wrap! I hope you enjoyed it.
Martin
Want to AI-upgrade your customer service? Contact us.
Spread the word! Referral program.
Would you like to sponsor a post?