Meta (formerly Facebook) has announced Make-A-Video, a new artificial intelligence (AI) system that lets people turn text prompts into brief, high-quality video clips. The program builds on work begun last year as Make-A-Scene, a Meta AI image generator that offers greater compositional control over images generated from prompts, and on Meta’s ethos that “it’s not enough for an AI system to just generate content”: users should be able to “shape and control the content a system generates.”
The generator is not currently available for public use, but a paper describing the research has been published, and Meta shared sample videos it said were created by the new technology, such as a clip of a “robot dancing in Times Square” and another of “a cat watching TV.”
Using lines of text — or even just a few words — Make-A-Video can create unique clips featuring vivid colors, original characters, and ambient landscapes. The system can also adapt existing images into videos, or create new videos modeled on existing ones.
Of course, technology like this carries certain risks, and not just injuries caused by AI artists falling all over themselves to create ever more visceral, many-eyed-corpse-baby art, or the possibility that once animated, the AI-generated Loab will be able to reach through the screen and strangle us. The real concern is that in a media field already saturated with misinformation, AI-generated videos have great potential to further erode consensus on fact-based truth.
When asked what Make-A-Video could mean for the creation of deepfakes, a Meta representative said, “There are risks that Make-A-Video could potentially be used to create mis/disinformation. Due to this risk, we have added a watermark to all content created from Make-A-Video to ensure viewers know the video was generated with AI.”
The company added that it was taking “a thoughtful, iterative approach before sharing a public demo,” including sharing the work with the research community for feedback.
This seems to be the very least we can hope for from Meta, a company that in its previous iteration as Facebook has been deeply problematic in its propagation of misinformation. The concern may seem ridiculous when looking at a sample video of, say, a flying superhero dog, but it grows more pressing with neutral enough examples like a paintbrush moving on canvas or a horse drinking water. By the time we get to the clip of a spaceship landing, there is a feeling that it might be wise to invest in tinfoil ahead of the coming run on helmet-making supplies.
Meta isn’t the only company making moves in the AI-gen space: Google’s Imagen can achieve startling, photorealistic results from text prompts. The Imagen team seems to be acutely aware of the potential pitfalls of AI imagery, with an explicit statement about “Limitations and Social Impact” on their research page.
“There are several ethical challenges facing text-to-image research broadly,” the statement reads. “First, downstream applications of text-to-image models are varied and may impact society in complex ways … Second, the data requirements of text-to-image models have led researchers to rely heavily on large, mostly uncurated, web-scraped datasets [that] often reflect social stereotypes, oppressive viewpoints, and derogatory, or otherwise harmful, associations to marginalized identity groups.”