Midjourney V6, Google's VideoPoet, HeyGen, Top 5 Tools of 2023, GenAI Projections 2024

Generative AI - Short & Sweet

We wish you a fantastic start in the New Year! 🎆🎇 

GenerativeAI.net hopes you were able to recharge your batteries and spend quality time with your loved ones. We certainly did. ❣️

Before we start, I would like to share two quick updates. First, you may have noticed the slightly different look of our emails and the new sender address. As part of our New Year’s resolutions for this newsletter, we have made it our top priority to deliver the highest-quality content (in a fresh new look) with increased frequency starting in the next few weeks, and we hope you enjoy this change.

Rather than offering mere summaries, as most newsletters do, ours provides in-depth research, relevant insights, and evidence-based projections. To keep that promise, we have hired Himani as our AI Research Assistant.

Second, I quit my job as the Generative AI Lead for EMEA at Infosys to be able to focus entirely on both AI education and building impactful products - more to come on this.

In this episode, you will find the latest Top GenAI News, the top 5 tools from 2023, and an evidence-based projection of what to expect in 2024.

It is still day 1 for AI and civilization.

Martin

🌱 Top GenAI News

Midjourney V6 is Here with In-image Text and Completely Overhauled Prompting

(Source)

Midjourney V6 is the latest version of the popular AI image generation model, and it comes with significant improvements, including a (still limited) ability to render legible text within images and a completely overhauled prompting system. Prompting differs significantly from the previous version, so users are encouraged to relearn how to prompt. Images are generated by typing text descriptions and keywords into the Discord server or the alpha version of the website. The update also includes improved upscalers (when you like a generated image, you can use Midjourney’s upscaler to increase its resolution). Overall, Midjourney V6 produces markedly more realistic and more detailed images than its predecessors.

Seeing early V6 images not only elevated our heart rates but also urged us to try it out ourselves. The almost perfect result:

The Witcher (Henry Cavill) holding up the sign “Generative AI Short and Sweet”. However, the “AI” is missing.

Midjourney keeps improving, with an impressive product velocity. Here is a comparison of versions V1 through V6, using roughly the same prompt.

Midjourney version outputs compared: V1, V2, V3, V4, V5, V5.1, V5.2, V6

The CEO’s perspective on the future

High product velocity, combined with the ambition of CEO David Holz, promises continued leaps. Holz recently stated during office hours, "We can get the holodeck by 2024." He enthusiastically said, "We’re gonna build a lot of stuff this year. I think we’ll build more stuff than we’ve ever built before…By the end of 2024 hopefully, we have real-time open worlds."

Further, Holz elaborated, "Midjourney isn’t a fast artist; it’s more like a slow game engine. The future isn’t one image a minute but 60fps (frames-per-second) in full volumetric 3D."

In the not-so-distant future, entire worlds could be generated from a single prompt. Is this achievable this year? Combining such technology with the new Apple Vision Pro suggests an exciting, adventurous future. What do you think?

Google’s VideoPoet: A large language model for zero-shot video generation

(Source)

VideoPoet is a large language model (LLM) developed by Google that can synthesize high-quality videos from various inputs such as text, images, videos, and audio. It uses a decoder-only transformer architecture to process these multimodal inputs. VideoPoet follows a typical LLM training protocol with pretraining and task-specific adaptation stages.

It excels in diverse video generation tasks like text-to-video, image-to-video, video stylization, inpainting, outpainting, and video-to-audio. As an autoregressive model, it generates each new piece of output conditioned on what it has already produced, enabling it to create more pronounced and consistent motion in longer videos, simulate different camera movements and visual styles, and generate matching audio for video clips.
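For readers who want to see what "autoregressive" means in practice, here is a minimal Python sketch. It is not VideoPoet's code: `toy_next_token` is a hypothetical stand-in for a trained model, and the integer token ids are made up for illustration. The only point is the loop, in which every new token is chosen from everything generated so far.

```python
import random


def toy_next_token(context, vocab_size=16, seed=0):
    """Hypothetical stand-in for a trained model.

    Deterministically picks a token id in [0, vocab_size) from the
    full context, so the same context always yields the same token.
    """
    rng = random.Random(seed + sum(context) * 31 + len(context))
    return rng.randrange(vocab_size)


def generate(prompt_tokens, num_new_tokens, next_token=toy_next_token):
    """Autoregressive loop: each new token depends on all earlier tokens."""
    tokens = list(prompt_tokens)
    for _ in range(num_new_tokens):
        # The model sees the prompt plus everything it has produced so far.
        tokens.append(next_token(tokens))
    return tokens


if __name__ == "__main__":
    print(generate([1, 2, 3], num_new_tokens=5))
```

In a real video model the tokens would encode chunks of video and audio rather than arbitrary integers, but the generation loop follows the same pattern.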

What distinguishes VideoPoet is its integrated approach. Unlike other models with separate components for different tasks, VideoPoet combines all functionalities into one LLM, enhancing its versatility in video generation. This is the evolution of LLMs.

See Google’s raccoon story that shows the video quality:

HeyGen’s Indistinguishable Video Manipulation

(Source)

HeyGen AI is a GenAI video platform that simplifies video creation for business marketing, social media content, or personal projects. It offers customizable AI avatars, a range of voices, and the ability to convert scripts into talking videos - similar to Synthesia.

However, it also possesses the capability to manipulate existing videos after just 30 seconds of training. The tweet below showcases its impressive results in various languages. 😲 

You can begin today by uploading a video of yourself and scaling your internet presence across all social media platforms, covering a wide range of topics, provided you have the script. This contributes to the evolving future of AI influencers, a massive trend in 2024 and a topic we will delve into later.

🔝 Our Top 5 AI Tools of 2023 for Productivity

This section highlights the "Top 5 AI Tools" – a curated list that's more than just a ranking. These tools are proven to significantly boost productivity for you and your business. Of course, there are thousands of options, but our selection stands out for its immediate impact and real-world effectiveness, especially if you have not adopted these tools yet.

Please let us know by email if you believe there's a tool that's essential for achieving productivity gains. We are happy to review it.

⏩️ What to Expect in 2024

Robotics Will Take Off

Recent developments in robotics suggest that significant advancements will continue in 2024.

Jim Fan, Senior Researcher at NVIDIA, states, "We are approximately three years away from a ChatGPT-like moment for physical AI agents."

Key developments in 2023 have laid the groundwork for future robotic platforms and models:

  1. Multimodal Large Language Models (LLMs) with robot arms as physical input/output devices, such as VIMA, PerAct, RvT (NVIDIA), RT-1, RT-2, AutoRT, PaLM-E (Google), RoboCat (DeepMind), and Octo (Berkeley, Stanford, CMU).

  2. Algorithms that bridge System 2 high-level reasoning (LLMs) and System 1 low-level control, including Eureka (NVIDIA) and Code as Policies (Google).

  3. Significant progress in robust hardware development, featuring Tesla Optimus, Figure, 1X, Apptronik, Sanctuary, Agility+Amazon, Unitree, and others.

  4. The research community addresses the longstanding challenge of data scarcity in robotics by curating resources like the Open X-Embodiment (RT-X) dataset. Although not yet diverse enough, this represents a significant step forward.

  5. Simulation and synthetic data are increasingly important for enhancing robot dexterity and computer vision. Key contributions include:

    • NVIDIA Isaac can simulate reality 1,000 times faster than real time with scalable data streams.

    • Hardware-accelerated ray tracing enables photorealism, accompanied by free ground truth annotations like segmentation, depth, and 3D pose.

    • Using simulators to augment real-world data, thus creating larger datasets and reducing the need for costly human demonstrations, as exemplified by NVIDIA's MimicGen.

The field has been historically hindered by Moravec's paradox, the counterintuitive phenomenon where tasks simple for humans are challenging for AI, and vice versa. This year, we break free of it.

Ameca by Engineered Arts.

Robotics Companies (and robots) to watch out for

In 2024, several companies are leading the way in developing robots. The following are building humanoid robots:

  • Boston Dynamics: Atlas is an advanced humanoid robot capable of parkour and dancing.

  • Hanson Robotics: Sophia is a humanoid robot with emotional interaction capabilities.

  • UBTech: Walker X is a notable humanoid robot.

  • Engineered Arts: Ameca is designed for emotive interactions.

  • Agility Robotics: Digit is a humanoid robot for dynamic movement and complex navigation.

  • Tesla: Optimus is a prototype humanoid robot for repetitive and risky tasks.

  • Shadow Robot Company: Creates advanced humanoid robots; its clients include NASA and Qualcomm.

  • Macco Robotics: Develops user-friendly robots for the leisure and hospitality sectors.

  • Xiaomi: CyberOne is an early-stage humanoid robot.

OpenAI’s GPT Store Launches Next Week

On the GPT Store, users can sell and share AI agents.

Why does this matter?

This move democratizes AI by enabling anyone to create, share, and monetize custom AI agents without coding skills. Creators will earn based on their AI agents' usage, opening new monetization avenues.

The store will offer diverse AI models for specific purposes, potentially impacting various industries by providing customized AI solutions.

However, challenges exist, including an unclear monetization strategy, which may deter users due to the availability of free alternatives and the uncertainty of value in individual GPTs. The GPT Store also lacks a robust developer ecosystem and accessibility - but this might change quickly.

AI Influencers & AI-Generated News and Content

A significant trend emerging in 2024, still in its early stages, is the rise of AI influencers and AI-generated content, as highlighted in this tweet. (Emm guides you to build your own quickly.)

However, this trend is not without concerns, particularly regarding the potential for misuse, as it allows for easy and scalable distribution of not only news but also misinformation and propaganda. Studies reveal that people tend to favor AI-produced content when unaware of its source. This preference persists even after they learn about the content's AI origin.

AI tools such as Jasper.ai, Copy.ai, Canva, InVideo, and Synthesia are increasingly employed for various content creation tasks.

However, the best results are likely achieved when AI is used to augment, rather than replace, the skills and expertise of content creators, as practiced at GenerativeAI.net. As in 2023, everyone is responsible for verifying the credibility of the content they consume, and this is even more critical this year!

Want to See the Episode as a Video?

We would highly appreciate it if you subscribed to, gave feedback on, and shared the newsletter and our renowned online course. It helps a lot. 🌝

Please reply if you want to sponsor an ad for this 34k+ newsletter.

Thank you so much for reading,


Martin and the GenerativeAI.net Team