Generative AI - Short & Sweet
Midjourney V6, Google's VideoPoet, HeyGen, Top 5 Tools of 2023, GenAI Projections 2024

Martin Musiol
January 06, 2024 • Estimated Reading Time: 11 minutes

We wish you a fantastic start in the New Year! 🎆🎇

GenerativeAI.net hopes you could recharge your batteries and spend quality time with your loved ones. We certainly could. ❣️

Before we start, I would like to share two quick updates. First, you may have noticed the emails' slightly different look and the new sender address. As part of our New Year’s resolutions for this newsletter, our top priority is to deliver the highest-quality content (in a fresh new look) more frequently, starting in the next few weeks. We hope you enjoy the change.

Unlike the 99% of newsletters that offer mere summaries, ours provides in-depth research, relevant insights, and evidence-based projections. To keep that promise, we have hired a capable AI Research Assistant, Himani.

Second, I quit my job as the Generative AI Lead for EMEA at Infosys to be able to focus entirely on both AI education and building impactful products - more to come on this.

In this episode, you will find the latest Top GenAI News, the top 5 tools from 2023, and an evidence-based projection of what to expect in 2024.

It is still day 1 for AI and civilization.

Martin

🌱 Top GenAI News

Midjourney V6 is Here with In-image Text and Completely Overhauled Prompting

(Source)

Midjourney V6 is the latest version of the popular AI image generation model, and it comes with significant improvements, including a (still limited) ability to render legible text within images and a completely overhauled prompting system. Prompting differs so much from the previous version that users are encouraged to relearn it. Images are still generated by typing text descriptions and keywords into the Discord server or the alpha version of the website. The update also improves the upscalers (when you like a generated image, you can use Midjourney’s upscaler to increase its resolution). Overall, Midjourney V6 produces far more realistic and detailed images than its predecessors.

Seeing early V6 images not only raised our heart rates but also pushed us to try it out ourselves. The almost perfect result:

The Witcher (Henry Cavill) holding up the sign “Generative AI Short and Sweet”. However, the “AI” is missing.

Midjourney keeps improving, with impressive product velocity. Here is a comparison of versions V1 to V6, using roughly the same prompt.

Midjourney version outputs compared: V1, V2, V3, V4, V5, V5.1, V5.2, V6

The CEO’s perspective on the future

High product velocity, combined with the ambition of CEO David Holz, promises continued leaps. Holz recently stated during office hours, "We can get the holodeck by 2024." He enthusiastically said, "We’re gonna build a lot of stuff this year. I think we’ll build more stuff than we’ve ever built before…By the end of 2024 hopefully, we have real-time open worlds."

Further, Holz elaborated, "Midjourney isn’t a fast artist; it’s more like a slow game engine. The future isn’t one image a minute but 60fps (frames-per-second) in full volumetric 3D."

In the not-so-distant future, entire worlds could be generated from a single prompt. Is this achievable this year? Combining such technology with the new Apple Vision Pro suggests an exciting, adventurous future. What do you think?

Google’s VideoPoet: A large language model for zero-shot video generation

(Source)

VideoPoet is a large language model (LLM) developed by Google that can synthesize high-quality videos from various inputs such as text, images, videos, and audio. It uses a decoder-only transformer architecture to process these multimodal inputs. VideoPoet follows a typical LLM training protocol with pretraining and task-specific adaptation stages.

It excels in diverse video generation tasks like text-to-video, image-to-video, video stylization, inpainting, outpainting, and video-to-audio. As an autoregressive model, it produces each new piece of output conditioned on what it has already generated, enabling it to produce more prominent and more consistent motion in longer videos, simulate different camera movements and visual styles, and generate matching audio for video clips.
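
For readers who like to see the idea in code, here is a minimal sketch of what "autoregressive" means. It is a toy illustration, not VideoPoet's implementation: the `next_token` function is a hypothetical stand-in for a trained model, and the integer "tokens" are an assumption made purely for the example.

```python
import random


def next_token(context, vocab_size, rng):
    """Hypothetical stand-in for a trained model (assumption, not VideoPoet).

    It looks only at the most recent token and moves at most one step away,
    so the generated sequence changes smoothly.
    """
    step = rng.choice([-1, 0, 1])
    return (context[-1] + step) % vocab_size


def generate(prompt_tokens, num_new_tokens, vocab_size=16, seed=0):
    """Autoregressive loop: every new token is conditioned on all tokens so far."""
    rng = random.Random(seed)  # fixed seed keeps the toy example reproducible
    tokens = list(prompt_tokens)
    for _ in range(num_new_tokens):
        # The growing sequence is fed back in as context for the next step.
        tokens.append(next_token(tokens, vocab_size, rng))
    return tokens


if __name__ == "__main__":
    print(generate([3, 4, 5], num_new_tokens=8))
```

The key point is the feedback loop: the output of one step becomes part of the input for the next, which is why earlier frames can shape later ones.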

What distinguishes VideoPoet is its integrated approach. Unlike other models with separate components for different tasks, VideoPoet combines all functionalities into one LLM, enhancing its versatility in video generation. This is the evolution of LLMs.

See Google’s raccoon story that shows the video quality:

Google is coming out swinging in the new year with its latest AI model: VideoPoet.
VideoPoet is capable of a number of different video-related generation tasks, including text-to-video, image-to-video, and even video-to-audio.
It can create videos like this:
— Product Hunt 😸 (@ProductHunt)
11:00 AM • Jan 3, 2024

HeyGen’s Indistinguishable Video Manipulation

(Source)

HeyGen AI is a GenAI video platform that simplifies video creation for business marketing, social media content, or personal projects. It offers customizable AI avatars, a range of voices, and the ability to convert scripts into talking videos - similar to Synthesia.

However, it also possesses the capability to manipulate existing videos after just 30 seconds of training. The tweet below showcases its impressive results in various languages. 😲

The audio in this video is AI generated. I never spoke these words.
#AI @HeyGen_Official
— ALan LepofskY 🇨🇦 🇺🇸 ✡️ (@alanlepo)
1:51 AM • Jan 6, 2024

You can start today: upload a video of yourself and, provided you have a script, scale your presence across all social media platforms on a wide range of topics. This feeds into the rise of AI influencers, a massive trend in 2024 that we will delve into later.

Like the content? You will love the book “Generative AI: Navigating the Course to the Artificial General Intelligence Future”

This book offers a clear, insightful look into the history of generative AI, its current achievements, and its thrilling future potential, including the emergence of AGI. Delve into ethical considerations, the role of AI in daily life, and bold predictions for AI's impact on business and society.

www.amazon.co.uk/Generative-AI-Navigating-Artificial-Intelligence/dp/1394205910

🔝 Our Top 5 AI Tools of 2023 for Productivity

This section highlights the "Top 5 AI Tools" – a curated list that's more than just a ranking. These tools are proven to boost productivity for you and your business significantly. Of course, there are thousands of options, but our selection stands out for its immediate impact and real-world effectiveness, especially if you have not adopted these tools yet.

If you believe we missed a tool that's essential for productivity gains, please let us know by email. We are happy to review it.

Perplexity AI for researching and distilling information

If you're deeply engaged in research or need information that is as close to ground truth as possible, we recommend Perplexity AI. It significantly improved the research quality and depth of Martin’s upcoming book on Generative AI.
Sign up and get a $10 discount.

https://perplexity.ai/pro?referral_code=L0QX8NMB

ChatGPT+, the all-rounder

For enhancing all(!) your writing or having a sparring partner on nearly any topic, ChatGPT is the right tool for you.
Additionally, by mastering prompt engineering, you can use ChatGPT in myriad ways: simulating personas, aiding mental well-being, building training regimes, analyzing data, and more. Its numerous plugins open up a whole world of additional tools.
With DALL·E 3, you can generate high-quality images instantly.
A standout feature is photographing an item and asking questions about it. For example, I took a photo of my medicine, unsure of its details or usage. My voice request was answered in just 8 seconds, revolutionizing how we understand the world!

chat.openai.com

Loom to upgrade communication with peers

Generative AI extends beyond standalone models such as large language models that generate text. As the technology evolves, we're witnessing applications like Loom adopting generative AI to revolutionize teaching and peer communication. Loom has seamlessly integrated AI into its tech stack, not only removing "umms", filler words, and silences from screen recordings but also extracting and generating follow-up tasks from voiceovers. This integration makes Loom a product that significantly enhances communication efficiency within companies.

www.loom.com/looms/videos

Durable AI, from idea to website in seconds

Durable.co is a platform that offers an AI website builder and small business software. It allows users to publish professional websites effortlessly without coding skills. The platform also provides built-in SEO, marketing tools, and review automation to help businesses grow their online presence.

ChatBotKit to build your GPT-chatbot in minutes

ChatBotKit enables the creation of conversational AI chatbots, tailored with custom data and designed to interact naturally with users across various platforms like websites, Slack, Discord, WhatsApp, and more. From my consulting experience, I've observed that this is what 95% of companies aim for and prioritize internally. They should be aware that such a solution is just a few clicks away.

chatbotkit.com

⏩️ What to Expect in 2024

Robotics Will Take Off

Recent developments in robotics suggest that 2024 will bring significant advancements.

Jim Fan, Senior Researcher at NVIDIA, states, "We are approximately three years away from a ChatGPT-like moment for physical AI agents."

Key developments in 2023 have laid the groundwork for future robotic platforms and models:

- Multimodal Large Language Models (LLMs) with robot arms as physical input/output devices, such as VIMA, PerAct, RvT (NVIDIA), RT-1, RT-2, AutoRT, PaLM-E (Google), RoboCat (DeepMind), and Octo (Berkeley, Stanford, CMU).
- Algorithms bridging System 2 high-level reasoning (LLMs) and System 1 low-level control, including Eureka (NVIDIA) and Code as Policies (Google).
- Significant progress in robust hardware development, featuring Tesla Optimus, Figure, 1X, Apptronik, Sanctuary, Agility+Amazon, Unitree, and others.
- The research community is addressing the longstanding challenge of data scarcity in robotics by curating resources like the Open X-Embodiment (RT-X) dataset. Although not yet diverse enough, this represents a significant step forward.
- Simulation and synthetic data are increasingly important for enhancing robot dexterity and computer vision. Key contributions include:
  - NVIDIA Isaac can simulate reality 1000 times faster than real time with scalable data streams.
  - Hardware-accelerated ray tracing enables photorealism, accompanied by free ground truth annotations like segmentation, depth, and 3D pose.
  - Simulators can augment real-world data, creating larger datasets and reducing the need for costly human demonstrations, as exemplified by NVIDIA's MimicGen.

The field has been historically hindered by Moravec's paradox, the counterintuitive phenomenon where tasks simple for humans are challenging for AI, and vice versa. This year, we break free of it.

Ameca by Engineered Arts.

Robotics Companies (and robots) to watch out for

In 2024, several companies are leading the way in robot development, most of them with humanoid robots:

- Boston Dynamics: Atlas is an advanced humanoid robot capable of parkour and dancing.
- Hanson Robotics: Sophia is a humanoid robot with emotional interaction capabilities.
- UBTech: Walker X is a notable humanoid robot.
- Engineered Arts: Ameca is designed for emotive interactions.
- Agility Robotics: Digit is a humanoid robot for dynamic movement and complex navigation.
- Tesla: Optimus is a prototype humanoid robot for repetitive and risky tasks.
- Shadow Robot Company: Builds advanced dexterous robotic hands; its clients include NASA and Qualcomm.
- Macco Robotics: Develops user-friendly robots for the leisure and hospitality sectors.
- Xiaomi: CyberOne is an early-stage humanoid robot.

OpenAI’s GPT Store Launches Next Week

On the GPT Store, users can share and sell AI agents.

Why does this matter?

This move democratizes AI by enabling anyone to create, share, and monetize custom AI agents without coding skills. Creators will earn based on their AI agents' usage, opening new monetization avenues.

The store will offer diverse AI models for specific purposes, potentially impacting various industries by providing customized AI solutions.

However, challenges exist. The monetization strategy is still unclear, and users may be reluctant to pay given the availability of free alternatives and the uncertain value of individual GPTs. The GPT Store also lacks a robust developer ecosystem and broad accessibility, but this might change quickly.

AI Influencers & AI-Generated News and Content

A significant trend emerging in 2024, still in its early stages, is the rise of AI influencers and AI-generated content, as highlighted in this tweet. (Emm guides you to build your own quickly.)

2024 will be the year of AI #influencers... Fully generated characters that can engage, entertain, and earn!
They're not just a concept anymore; some are already seeing significant revenue.
You can create one today. Learn how to build it from A to Z, including all aspects of… twitter.com/i/web/status/1…
— Emm (@emmanuel_2m)
7:50 AM • Jan 5, 2024

However, this trend is not without concerns, particularly regarding the potential for misuse, as it allows for easy and scalable distribution of not only news but also misinformation and propaganda. Studies reveal that people tend to favor AI-produced content when unaware of its source. This preference persists even after they learn that the content was produced by AI.

AI tools such as Jasper.ai, Copy.ai, Canva, InVideo, and Synthesia are increasingly employed for various content creation tasks.

However, the best results are likely achieved when AI is used to augment, rather than replace, the skills and expertise of content creators, as practiced at GenerativeAI.net. As 2023 showed, everyone shares the responsibility to verify the credibility of the content they consume, and that responsibility is even more critical this year!