The Easiest OpenAI Realtime API Integration You'll Ever See [demo]
Plus, Writer's GenAI Platform, Meta's Movie Gen, RAG 2.0.
A year ago, I met May Habib, CEO of Writer, at Europe's first Generative AI conference. At the time, Writer helped you write better, similar to Grammarly.
May and the team worked hard and evolved the product.
Today, Writer is a GenAI platform that integrates AI where it brings value in your organization, all within minutes.
Its four pillars:
Complete control of its own specialized LLMs, called Palmyra.
Your data becomes queryable via their knowledge graph (KG) in minutes. It's like a RAG system 2.0 that only requires access, no data prep needed. The 2.0 part? A KG understands the relationships between entities.
AI guardrails ensure legal compliance and brand rules.
AI Studio, Writer’s “impossible” feature: integrate the AI that provides the most value to your organization into the products you already use, whether that’s Chrome, Figma, Google Docs, or anything else.
I recommend taking a look and getting a head start with Writer. Here’s what they say:
Writer RAG tool: build production-ready RAG apps in minutes
RAG in just a few lines of code? We’ve launched a predefined RAG tool on our developer platform, making it easy to bring your data into a Knowledge Graph and interact with it using AI. With a single API call, Writer LLMs will intelligently call the RAG tool to chat with your data.
Integrated into Writer’s full-stack platform, it eliminates the need for complex vendor RAG setups, making it quick to build scalable, highly accurate AI workflows just by passing a graph ID of your data as a parameter to your RAG tool.
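To make the “single API call” concrete, here is a minimal sketch of what such a request could look like in Python. The endpoint path, model name, and graph-tool schema below are assumptions based on Writer’s developer platform, and the graph ID is a placeholder, so check their docs for the exact payload shape.

```python
import os
import requests

# Hypothetical sketch: chat with your data by passing a Knowledge Graph ID
# as a tool parameter. Endpoint, model name, and tool schema are assumptions;
# verify against Writer's developer docs.
API_KEY = os.environ["WRITER_API_KEY"]
GRAPH_ID = "your-graph-id"  # placeholder: the ID of your Knowledge Graph

payload = {
    "model": "palmyra-x-004",  # assumed model name
    "messages": [
        {"role": "user", "content": "Summarize our Q3 onboarding feedback."}
    ],
    # The RAG tool: the LLM decides when to query the graph to ground its answer.
    "tools": [
        {
            "type": "graph",
            "function": {
                "graph_ids": [GRAPH_ID],
                "description": "Company knowledge base",
                "subqueries": False,
            },
        }
    ],
}

resp = requests.post(
    "https://api.writer.com/v1/chat",  # assumed endpoint
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The point is the shape of the call: your data stays in the Knowledge Graph, and the only RAG-specific plumbing on your side is the graph ID you pass along.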
OpenAI’s Realtime API changes everything: In 5 min I demo how to implement it 📲
What is the Realtime API, you might ask. Well, it’s an API that facilitates a low-latency, multi-modal conversation with an AI.
Low-latency to be able to have a real-time conversation.
Multi-modal because it accepts text as well as speech as input. Soon, it’ll support video streaming (think FaceTime).
It supports function calling, enabling applications to trigger actions or retrieve information dynamically during a conversation, as a RAG system would. It could also call another (low-latency) API, such as a weather service or another AI, to perform an action.
It consolidates the stack: you no longer need to wire together a speech-to-text (STT) model, a communication layer, and a text-to-speech (TTS) model. Just this one API (see the sketch after this list).
Of course, it speaks all kinds of languages.
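To show how little plumbing that takes, here is a minimal sketch of a text-only round trip over the Realtime API’s WebSocket interface, using the `websockets` Python package. The event names follow OpenAI’s published Realtime event schema at launch (session.update, conversation.item.create, response.create); the get_weather tool is a hypothetical illustration of the function calling mentioned above, so treat this as a sketch rather than the demo script itself.

```python
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # Older websockets versions use extra_headers=; newer ones additional_headers=.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Configure the session: text-only here, plus a hypothetical tool
        # to illustrate function calling (the model decides when to call it).
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["text"],
                "instructions": "You are a concise phone assistant.",
                "tools": [{
                    "type": "function",
                    "name": "get_weather",  # hypothetical tool for illustration
                    "description": "Get the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                }],
            },
        }))

        # Add a user message to the conversation, then ask for a response.
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Hi! Can you hear me?"}],
            },
        }))
        await ws.send(json.dumps({"type": "response.create"}))

        # Stream server events until the response is complete.
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```

A real voice demo additionally streams microphone audio with input_audio_buffer.append events and plays back the audio deltas it receives; the text round trip above just shows the event flow.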
My 5-min demo (only 1 script is really needed)
What will the impact of real-time AI APIs be?
Currently, high-quality AI calls cost a lot; I spent $10 in a matter of minutes. But they will get cheaper. Eventually, effectively free.
We also know that people don’t like making calls, especially younger generations.
Every business that can be called will have AI doing the calls: AI can take a call at any point in time, it will be more persuasive and more successful at selling, and it will be cheaper than human colleagues.
By now, we know that this is how (AI) tech evolves.
AIs will call AIs on our behalf to gather information or negotiate prices. They’ll find mutually beneficial ground, without emotions getting involved. The person using the better model and the better prompts may end up in a better position than everyone else.
Not overnight but progressively, the next departments to be automated are:
Customer Service / Call Centers / BPOs
Tech support
Outreach sales
AI executive assistants will emerge, doing all kinds of calls for us.
Healthcare will adopt telemedicine en masse.
It will be a major job disruption, leading to mass job losses, and some job creation. According to my research, around 17 million people currently work in contact centers. Unfortunately, only a few of them will take on the new jobs created in this space. (I discussed this in depth in my book.)
On the upside, the economy will become massively more efficient, while the quality of service improves.
Calls can be made proactively to inform us about better options. The AI will know exactly what we said 18 months ago and what our intrinsic motivation might be.
I would love to hear from you: Which areas of your business would benefit most from AI-driven calls, and would you consider implementing them? Reply to this email.
Meta joins the video generation race
Meta has announced Movie Gen, trained on 6,144 H100 GPUs. It produces high-quality video and can be used for both video generation and editing.
In their 90-page paper, they go into great detail about how they did it.
Next-gen RAG (Retrieval-Augmented Generation)
RAG systems are the most adopted AI-supporting tech in 2024.
They retrieve documents based on vectors representing chunks of your data, and an LLM then generates an answer from the retrieved chunks.
Is taking a chunk and embedding it without any further information the best way to represent it as a vector? No.
This paper found that incorporating context from neighboring documents significantly improves document embeddings for neural search tasks. It outperforms traditional methods, especially in new data scenarios, without requiring complex optimization techniques.
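The underlying idea is easy to illustrate. The sketch below is not the paper’s method, just a simplified version of the principle: instead of embedding each chunk in isolation, embed it together with its neighboring chunks so the vector carries surrounding context. It assumes the sentence-transformers package and an off-the-shelf model; the texts and model name are illustrative.

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")  # any off-the-shelf embedding model

chunks = [
    "Q3 revenue grew 12% year over year.",
    "Growth was driven by the new enterprise tier.",
    "Churn in the SMB segment remained flat.",
]

# Naive approach: embed each chunk in isolation.
naive_vectors = model.encode(chunks)

# Context-aware approach (simplified): embed each chunk together with its
# neighbors, so "Growth was driven by..." also carries what grew and by how much.
contextual_texts = []
for i, chunk in enumerate(chunks):
    neighbors = chunks[max(0, i - 1): i + 2]  # previous, current, next chunk
    contextual_texts.append(" ".join(neighbors))
contextual_vectors = model.encode(contextual_texts)

# At query time you still return the original chunk to the LLM,
# but you search against the context-aware vectors.
print(naive_vectors.shape, contextual_vectors.shape)
```

The trade-off is that each vector now mixes in neighboring content, so retrieval gets more context-aware at the cost of slightly blurrier chunk boundaries.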
I will use it in an upcoming project and share with you how it went. Stay tuned.
Here, I am reading the chapter on AI-driven calls. 🫠
That’s a wrap! I hope you enjoyed it.
Martin
Want to AI-upgrade your customer service? Contact us.
Spread the word! Referral program.
Would you like to sponsor a post?