AI Video Generation Solved By OpenAI - What are its applications? 1M token with Gemini, Aya, and more
Cuckoo! OpenAI launched an AI rocket with Sora (some even speak of AGI). Google decided to dwarf everyone else's context window by increasing its own to 1 million tokens 🤷. Plus: Aya, the LLM speaking 101 languages, and the reasons behind Sam's $7T bet.
Transform Security from a Cost Center to a Revenue Enhancer
Sora - OpenAI’s Disruption in AI Video Generation
Gemini Pro 1.5 with 1 Million Token Context Window
Reasons for Sam Altman's $7 Trillion Bet (+ Critique from Jensen)
Cohere's Aya Supports 101 Languages
Reading time: take it easy; make it 7 minutes.
✋ Transform Security from a Cost Center to a Revenue Enhancer
Shift Left: How to Turn Security into Revenue
In the competitive landscape of software business, optimizing processes and leveraging efficiencies can make a significant difference in building a strong pipeline and closing revenue faster.
Download the ebook from Vanta to learn how to:
Apply the DevOps principles of “shifting left” to position security as a differentiator — instead of a hurdle
Center security in your sales conversations at every stage to proactively remove roadblocks to revenue
Invest in your security story by making it easy for buyers to access security-related information
Vanta helps SaaS businesses of all sizes manage risk and prove security in real time.
🌱 What’s New?
📼 Sora - OpenAI’s Disruption in AI Video Generation
(Source)
OpenAI has again created a massive shockwave through the AI world by announcing its video generation model, Sora.
What it can do
Sora can generate videos up to 60 seconds long with temporal and 3D consistency in Full HD resolution (1920x1080 pixels, not 4K yet) and supports various aspect ratios; the demo clips include scenes such as drone flights.
It can simulate simple actions as well as entire digital worlds.
This is the "holy shit" moment of AI 🤯
OpenAI just launched Sora, an AI that can create hyperrealistic videos from just text prompts.
It'll be nearly impossible to tell the difference between AI and real in 2024.
(THREAD 🧵) 1/13
— Barsee 🐶 (@heyBarsee), 6:27 AM • Feb 16, 2024
Its future potential
Consider this: if you can generate 1-minute videos, you could stitch many of them together into a 2-hour movie. Here, prompt engineering would be crucial to keep scenes and characters consistent.
Currently, creating movies with special effects or sci-fi themes requires deterministic (explicitly coded) physics models for elements like hair and water, necessitating extensive rendering.
Sora and similar models, however, have learned these behaviors non-deterministically from large amounts of training data.
Getting physics and other simulations exactly right may not even be the only target; optimizing for entertainment value matters just as much.
This suggests a future where a mix of deterministic models, rendering, and AI video generation models could provide maximum adaptability.
Sora has paved the way for applications such as:
Personalized/interactive movies, e.g., experiencing Star Wars from Yoda's perspective and exploring his daily life.
Educational videos, like visualizing life during the California Gold Rush in photorealistic detail.
Transforming books into videos, whether as extended-length films or as a series of TikTok-format videos.
Visualizing company training, documentation, and FAQs in video format, where concepts could be illustrated through motion designs with voiceovers.
Or just a Cyber-Dog.
The possibilities are endless, and I am eager to hear your thoughts on potential deployments. Feel free to email me; I'd happily share your ideas if you like.
Speculation
Many details about Sora remain undisclosed (typical for OpenAI). However, some speculate that the model was developed using synthetic data from Unreal Engine, described as a ‘simulation of many worlds, real or fantastical.’
Lastly, I assume that GPT-5 will include full Sora capability, and it would not surprise me if they released it in March, one year after GPT-4. 🤞
⛓️ Gemini Pro 1.5 with 1 Million Token Context Window
(Source)
Gemini 1.5 Pro is Google's latest model, and its headline advancement is context window size: it can process up to 1 million tokens. By comparison, GPT-4 Turbo offers 128k tokens and Claude 2.1 offers 200k.
The 1.5 Pro version outperforms the 1.0 Pro in 87% of the benchmarks and is comparable in performance to the 1.0 Ultra!
It is multimodal, and its token capacity can comprise an hour of video, 11 hours of audio, or 4-5 full books.
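As a rough sanity check on those numbers, here is a back-of-envelope sketch; the words-per-token and book-length figures are assumptions (common rules of thumb), not official Gemini numbers:

```python
# Back-of-envelope estimate of what fits in a 1M-token context window.
# Assumed rates (rules of thumb, not official Gemini figures):
#   ~0.75 English words per token, ~150,000 words for a full-length book.
context_tokens = 1_000_000
words_per_token = 0.75
words_per_book = 150_000

words = context_tokens * words_per_token   # ~750,000 words of plain text
books = words / words_per_book             # ~5 full-length books
print(f"~{words:,.0f} words, roughly {books:.0f} full books")
```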
Tech background
The model uses a sparse Mixture-of-Experts (MoE) architecture: for any given input, only a subset of the model's parameters (the "experts" selected by a routing network) is activated. This reduces the computational load per token, leading to faster training and inference and lower energy consumption.
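For intuition, here is a toy sketch in plain NumPy (not Gemini's actual implementation) of how a sparse MoE layer routes a single token to only a few experts, leaving the rest inactive:

```python
import numpy as np

# Toy sparse Mixture-of-Experts routing: a gating network scores all experts,
# but only the top-k experts are evaluated for this token.
rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 8, 2
x = rng.normal(size=d_model)                      # one token's hidden state
W_gate = rng.normal(size=(n_experts, d_model))    # gating weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

scores = W_gate @ x                               # one score per expert
chosen = np.argsort(scores)[-top_k:]              # indices of the top-k experts
weights = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()  # softmax over chosen

# Only the selected experts run; the other six are skipped entirely.
y = sum(w * (experts[i] @ x) for w, i in zip(weights, chosen))
print(f"activated experts: {sorted(chosen.tolist())}, output norm: {np.linalg.norm(y):.2f}")
```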
However, comparative analyses suggest that LLMs with shorter context windows but augmented with retrieval mechanisms, like RAG (Retrieval Augmented Generation), can perform comparably to those with longer context windows.
While large context windows offer the potential for more coherent and contextually relevant outputs, they also introduce challenges in maintaining attention throughout all given context information.
Even as context windows continue to grow, emphasizing the relevant pieces of information through a retrieval step will remain key to performant AI solutions.
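To make the retrieval idea concrete, here is a minimal RAG-style sketch. It assumes the open-source sentence-transformers library for embeddings; the model name and documents are placeholders, and the final LLM call is left out:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Minimal RAG-style retrieval: embed documents, embed the question, and pass only
# the most relevant chunks to the LLM instead of stuffing everything into context.
model = SentenceTransformer("all-MiniLM-L6-v2")   # any sentence-embedding model works

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Gemini 1.5 Pro supports a context window of up to 1 million tokens.",
    "The cafeteria on the 3rd floor opens at 8 am.",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)

question = "How long do customers have to return a product?"
q_vec = model.encode([question], normalize_embeddings=True)[0]

scores = doc_vecs @ q_vec                          # cosine similarity (vectors are normalized)
top = np.argsort(scores)[::-1][:2]                 # keep the 2 most relevant chunks

prompt = (
    "Answer using only this context:\n"
    + "\n".join(docs[i] for i in top)
    + f"\n\nQuestion: {question}"
)
print(prompt)  # this prompt would then be sent to your LLM of choice
```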
Relevant for you?
Would you like to know more about that? Write me an email. I would love to discuss with you how you can implement RAG in your business.
🕵️ Reasons for Sam Altman's $7 Trillion Bet (+ Critique from Jensen)
Sam Altman, the CEO of OpenAI, is seeking to raise $7 trillion (roughly 7% of global GDP) to reshape the global chip industry. Why? Here are the key points:
Addressing Chip Shortages: These efforts are aimed at mitigating the global chip shortage.
Enhancing AI Capabilities: The goal is not only to increase GPU supply but also to advance chip capabilities for AI development.
Technological Vision: Altman sees vastly expanded chip and compute capacity as the foundation for his long-term plans for AI.
Strategic Partnerships: Potential investors include sovereign wealth funds, government entities, notably the United Arab Emirates (UAE), SoftBank CEO Masayoshi Son, and representatives from Taiwan Semiconductor Manufacturing Co. (TSMC), the world's largest dedicated independent semiconductor foundry.
Geopolitical Considerations: The initiative is also seen in the context of geopolitical rivalry, particularly between the United States and China.
Jensen Huang, the CEO of Nvidia, has expressed skepticism about Sam Altman's plan. Why Huang believes Altman's chip dreams are off the mark:
Overestimation of Costs: Huang believes AI development and infrastructure costs will be below $7 trillion, estimating global AI data center construction at $2 trillion over five years.
Current and Future AI Infrastructure: Huang highlights the existing trillion-dollar global data center base, which is expected to double in value in the next 4-5 years, a fraction of Altman's target.
Misconception About Buying Computers: Critiques the notion that purchasing more computers is the sole solution to AI development, emphasizing the role of efficiency and speed improvements in computing.
Nvidia's Position in the AI-Chip Market: Points out Nvidia's strong AI-chip market presence and plans to design data-center chips and AI processors, suggesting existing initiatives may reduce the need for Altman's $7 trillion investment.
Sam Altman is the owner of the OpenAI Startup Fund, so it is largely up to him how to proceed; he has shown he can form strong alliances (e.g., with Microsoft), and he knows how to raise money.
Where is it going? To be seen.
😶🌫️ Cohere’s AI Model Aya supports 101 Languages
(Source)
Cohere For AI recently launched Aya, an open-source LLM supporting 101 languages.
With approximately 7,000 languages worldwide, the internet and AI predominantly feature 6-7 major languages, leaving many unserved.
Aya aims to democratize AI by providing high-quality, multilingual models to a broader audience.
Key features of Aya include:
Extensive Language Support: Aya covers over 50 previously underserved languages, like Somali and Uzbek.
Open-Source and Collaborative: Aya, along with its dataset (the largest multilingual instruction fine-tuning dataset, with 513 million prompts and completions), is available under the Apache 2.0 license.
High Performance: Aya outperforms other open-source, multilingual models, achieving 75% in human evaluations and 80-90% across benchmark tests.
Cultural and Linguistic Inclusivity: Aya addresses the gap in linguistic and cultural relevance in current AI models, aiming for global applicability.
The Aya project is a collaborative effort involving over 3,000 researchers from 119 countries.
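If you want to experiment yourself, a minimal sketch using Hugging Face transformers could look like the following; the checkpoint name and prompt are assumptions on my part, so check Cohere For AI's model card for the exact details:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed Hugging Face checkpoint for the 101-language Aya model (verify on the model card).
checkpoint = "CohereForAI/aya-101"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Instruction prompt; Aya is instruction-tuned across its supported languages.
inputs = tokenizer("Write a one-sentence greeting in Uzbek.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```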
Potential Applications and Impact
It preserves and represents languages and cultures at risk of being marginalized by AI technologies.
It reduces language barriers, promoting a more inclusive digital environment.
If you are interested, engage with the project via their website.
The "generative" part of the brain is what turns ideas and plans into actions, including spoken and written words.
But the hard part of intelligence is to come up with those ideas and plans.
In the brain, ideas are formed in the prefrontal cortex.
Turning them into actions is… twitter.com/i/web/status/1…
— Yann LeCun (@ylecun), 11:05 PM • Feb 17, 2024
Not this time, as I am on skiing vacation (next week again, I promise).
🔥 See You Next Week
This is from our hotel room at 6:30 AM. Just beautiful.
Was the show a delight, or just alright? (You can comment after voting)
Thank you so much for reading.
Martin