- Demand for text-to-video generators on par with today’s text-to-image tools is high, driven by the need for video content across platforms.
Mark Zuckerberg, CEO of Meta, took to his Facebook page to announce Make-A-Video, a new artificial intelligence (AI) system that lets users turn text prompts, such as “a teddy bear painting a self-portrait,” into short, high-quality, unique video clips.
Sound like DALL-E? That is the idea: According to a news release, Make-A-Video builds on AI image generation technology (including Meta’s Make-A-Scene work from earlier this year) by “adding a layer of unsupervised learning that enables the system to grasp motion in the real world and apply it to standard text-to-image generation.”
Zuckerberg stated, “This is pretty amazing progress. It’s much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they’ll change over time.”
A year after DALL-E
It’s hard to believe that the original DALL-E was introduced only in January 2021. Since then, 2022 has seemed to be the year of the text-to-image revolution, thanks to DALL-E 2, Midjourney, Stable Diffusion, and other large generative models that let users create realistic images and art from natural-language text prompts.
Is Meta’s new Make-A-Video a sign that text-to-video, the next stage of generative AI, is set to enter the mainstream? Given the rapid advancement of text-to-image this year (Midjourney even sparked controversy when an image made with it won an art contest at the Colorado State Fair), it seems plausible. A few weeks ago, Runway, a developer of video editing software, released a promotional video previewing a new feature of its AI-powered, web-based video editor that can modify a video from text descriptions.
Demand for text-to-video generators on par with today’s text-to-image tools is high, driven by the need for video content across platforms, including social media advertising, video blogs, and explainer videos.
For its part, Meta appears confident. Its research paper says of Make-A-Video: “We provide state-of-the-art findings in all areas of text-to-video creation, including spatial and temporal resolution, text fidelity, and quality, as judged by both qualitative and quantitative measurements.”