The Rapping Mona Lisa? A new AI from Microsoft makes faces move in images



CNN in New York —

Thanks to Microsoft's latest artificial intelligence technology, the Mona Lisa is now capable of more than just smiling.

Microsoft researchers unveiled a new artificial intelligence (AI) model last week that can automatically produce a realistic-looking video of a person speaking, using only a still photograph of their face and an audio clip of their voice. The videos feature convincing lip syncing and natural head and face motion, and can be generated from photorealistic faces, cartoons, or artwork.

In one demo video, researchers animated the Mona Lisa to recite a comedic rap performed by actress Anne Hathaway.

The VASA-1 AI model produces results that are both amusing and, in their realism, rather startling. According to Microsoft, the technology could be applied to virtual human companions, education, or "improving accessibility for individuals with communication challenges." However, it is also easy to see how the tool could be misused, for example as an instrument of identity fraud.

Beyond just Microsoft, researchers are concerned that the proliferation of convincing AI-generated images, videos, and audio capabilities will give rise to new kinds of misinformation. Some are also concerned that the technology may further upend the creative sectors, such as advertising and movies.

Microsoft stated that it does not currently intend to make the VASA-1 model available to the general public. The move mirrors how Microsoft partner OpenAI is handling concerns around Sora, its AI-generated video tool: OpenAI first previewed Sora in February, but so far only a select group of professionals and cybersecurity experts have had access to it for testing.

"We are opposed to any behaviour to create misleading or harmful contents of real persons," Microsoft researchers stated in a blog post. They added that the company has "no plans to release" the product to the public until they are "certain that the technology will be used responsibly and in accordance with proper regulations."


Animating faces

Microsoft's new AI model was trained on many videos of people speaking, learning to recognize natural facial and head motions, including "lip motion, (non-lip) expression, eye gaze, and blinking, among others," according to researchers. When VASA-1 animates a still photo, the result is a more lifelike video.

For instance, one sample video features footage of a person who appears to be playing video games, with furrowed brows and pursed lips.

Additionally, the AI tool can be programmed to generate a video in which the person is showing a particular mood or looking in a particular direction.

Even on close inspection, the videos still show telltale signs of artificial intelligence, such as irregular blinking and exaggerated eyebrow movements. However, Microsoft asserts that its approach "significantly outperforms" other tools of a similar nature and "paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviours."
