Microsoft has developed an artificial intelligence model that turns a single image of a person’s face and an audio clip into a video with synchronized lip movements, facial expressions, and head motion. Developed by a team of AI researchers at Microsoft Research Asia, the new model is called VASA-1.
“We introduce VASA, a framework for generating lifelike talking faces with appealing visual affective skills (VAS) given a single static image and a speech audio clip,” said the team in a research paper. “Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness.”