Chinese researchers have developed a way of turning photographs into lifelike videos. The new AI tool, EMO, or Emote Portrait Alive, can animate a single static photo into a video in which the subject speaks or sings.
Full spectrum of human expressions
The breakthrough comes from multinational tech company Alibaba and its Institute for Intelligent Computing, whose researchers trained a diffusion model on more than 250 hours of talking-head footage drawn from films, TV, musical performances and speeches. This approach helped to overcome one of the major challenges in the field, says lead author Linrui Tian: “Capturing the full spectrum of human expressions and the uniqueness of individual facial styles has always been a hurdle.”
Instead of trying to recreate facial expressions and movement using 3D models or shape blending, EMO generates video frames directly from the audio, “eliminating the need for complex 3D models or facial landmarks,” Tian explains.
Singing and rapping
The resulting animations are more realistic and natural-looking, with better fluidity and finer, individual details. Test results and user studies show EMO consistently outstripping the competition on key industry criteria for video quality, identity preservation and emotional plausibility. The AI can even produce singing videos of any length, closely synced to a vocal track, including performances in complex vocal genres such as rapping.
“EMO proves it can create convincing speaking videos and even generate singing videos in various styles. This is a major step forward,” the research paper published on arXiv claims.
The growth of AI and its threat?
The announcement comes as other new AI tools rain down on us. OpenAI recently revealed Sora, its latest text-to-video tool, and text-to-video generation is among the techniques used in “Qianqiu Shisong”, a new 26-episode Chinese animation that explores traditional poetry and borrows an “inkwash” look.
The Chinese government is promoting AI through various strategic initiatives designed to boost economic development with technology. The State-owned Assets Supervision and Administration Commission (SASAC) is urging all State-owned enterprises (SOEs) to explore and integrate AI across sectors.
As with any AI that imitates humans, EMO raises ethical questions about the ownership and use of photos, with concerns ranging from deepfakes and identity theft to misinformation. Researchers are working to catch up with safeguards that detect and label AI-generated content. The new tech also has exciting potential applications, including animating historical figures and creating highly personalised memorials or entertainment clips.