The Rapping Mona Lisa? A new AI from Microsoft makes faces move in images
CNN in New York —
Thanks to Microsoft's latest artificial intelligence technology, the Mona Lisa
is now capable of more than just smiling.
Microsoft researchers unveiled a new artificial intelligence (AI) model last
week that can automatically produce a realistic-looking video of a person
talking from just a still photograph of their face and an audio clip of their
speech. The videos feature convincing lip syncing and natural head and facial
movements, and can be generated from photorealistic faces, cartoons, or
artwork.
In one demo video, researchers showed how they animated the Mona Lisa to
recite a comedic rap performed by actress Anne Hathaway.
The results produced by the VASA-1 AI model are both amusing and, in their
realism, rather startling. According to Microsoft, the technology could be
applied to creating virtual human companions, to education, or to "improving
accessibility for individuals with communication challenges." However,
it is also easy to see how the tool could be misused for deception or
identity theft.
Beyond just Microsoft, researchers are concerned that the proliferation of convincing AI-generated images, videos, and audio capabilities will give rise to new kinds of misinformation. Some are also concerned that the technology may further upend the creative sectors, such as advertising and movies.
Microsoft stated that it does not currently intend to make the VASA-1 model available to the general public. The move mirrors how Microsoft partner OpenAI is handling concerns around Sora, its AI-generated video tool: OpenAI first teased Sora in February, but so far only a select group of professionals and cybersecurity experts have had access to it for testing.
Microsoft researchers stated in a blog post that they are "opposed to any
behaviour to create misleading or harmful contents of real persons." They
added that the company has "no plans to release" the product to the public
until they are "certain that the technology will be used responsibly and in
accordance with proper regulations."
Animating faces
Microsoft's new AI model was trained on many videos of people speaking,
teaching it to recognize natural facial and head motions, including
"lip motion, (non-lip) expression, eye gaze, and blinking, among
others," according to researchers. As a result, when VASA-1 animates a still
photo, the output video looks more realistic.
For instance, one sample video shows a person who appears to be playing video
games, with furrowed brows and pursed lips.
The AI tool can also be directed to generate a video in which the person expresses a particular mood or looks in a particular direction.
Even on close inspection, the videos still carry telltale signs of artificial
intelligence, such as irregular blinking and exaggerated eyebrow
movements. Still, Microsoft asserts that its model "significantly
outperforms" similar tools and "paves the way for
real-time engagements with lifelike avatars that emulate human conversational behaviours."
