Why people are worried about Microsoft’s AI making photos come to life


A new generative AI system from Microsoft has highlighted just how advanced deepfake technology is becoming: generating compelling video from a single image and audio clip.

The tool turns a single image into a realistic talking video, complete with convincing emotions and movements such as raised eyebrows.

One demo shows the Mona Lisa coming to life and singing Lady Gaga’s Paparazzi. Microsoft says the system is not specifically trained to handle singing audio, but it manages it regardless. The ability to generate video from a single image and an audio file has alarmed some experts.

Microsoft has not yet revealed when the AI system will be released to the general public. Yahoo spoke to two AI and privacy experts about the risks of this type of technology.

The VASA system (which stands for ‘visual affective skill’) allows users to indicate where the fake person is looking and what emotions they are displaying on the screen. Microsoft says the technology paves the way for ‘real-time’ engagement with realistic talking avatars.

The demo can create realistic video from one image and an audio file (Microsoft)


Microsoft says: ‘Our premiere model, VASA-1, is capable not only of producing lip movements that are exquisitely synchronized with the audio, but also of capturing a wide spectrum of facial nuances and natural head movements that contribute to the perception of authenticity and liveliness.’

Not everyone is charmed by the new system; one blog describes it as a “deepfake nightmare machine.” Microsoft has emphasized that the system is a demonstration and says there are currently no plans to release it as a product.

But while VASA-1 represents a step forward in animating people, the technology is not unique: audio start-up Eleven Labs allows users to create incredibly realistic audio doppelgängers of people, based on just 10 minutes of audio.

Eleven Labs’ technology was used to create a “deepfake” audio clip of Joe Biden: a voice clone was “trained” on publicly available audio clips of the president, then used to send out a fake message of Biden urging people not to vote. The incident, which saw a user banned from Eleven Labs, highlighted how easily such technology can be used to manipulate real events.


A statement from Meta’s head of security policy shown alongside a deepfaked video of Ukrainian President Volodymyr Zelensky calling on his soldiers to lay down their weapons (Photo by OLIVIER DOULIERY / AFP)

In another incident, an employee of a multinational company paid out $25 million to fraudsters after a video call in which every other participant was a deepfake. Deepfakes are becoming increasingly common online: in a Prolific survey, 51% of adults said they had come across deepfake videos on social media.

Simon Bain, CEO of OmniIndex, said: “Deepfake technology is on a mission to produce content that contains no clues or ‘identifiable artefacts’ that show it is fake. The recent VASA-1 demo is the latest development to take a significant step towards this, and Microsoft’s accompanying ‘Risk and Responsible AI Considerations’ statement acknowledges this drive for perfection, saying:

“Currently, the videos generated by this method still contain identifiable artifacts, and the numerical analysis shows that there is still a gap to achieve the authenticity of real videos.”

“Personally, I find this very alarming, because we need these identifiable artifacts to prevent deepfakes from causing irreparable damage.”

Small signs like inconsistencies in skin texture and flickers in facial movements can give away that you’re looking at a deepfake, Bain says. But soon even those could disappear, he explains.

Bain says: “Only these possible inconsistencies in skin texture and small flickers in facial movements can visually tell us anything about the authenticity of a video. That way, when we watch politicians destroy their upcoming election chances, we know that it is actually them and not an AI deepfake.

“This begs the question: why is deepfake technology seemingly determined to eliminate these and other visual cues rather than ensuring they persist? After all, what benefit can a truly lifelike and ‘real’ fake video have other than misleading people? In my opinion, a deepfake that is almost lifelike but still clearly recognizable as fake can have just as much social benefit as a deepfake that is impossible to identify as fake.”

Twenty of the world’s largest tech companies, including Meta, Google, Amazon, Microsoft and TikTok, signed a voluntary agreement earlier this year to work together to stop the spread of deepfakes around elections.

Nick Clegg, President of Global Affairs at Meta, said: “With so many major elections taking place this year, it is vital that we do what we can to stop people being misled by AI-generated content.

“This work is bigger than any one company and will require a tremendous effort from industry, government and civil society.”

But the broader impact of deepfakes is that soon no one will be able to trust anything online and companies will have to use other methods to “validate” videos, says Jamie Boote, associate principal consultant at Synopsys Software Integrity Group:

Boote said: “The threat posed by deepfakes is that they are a way to fool people into believing what they see and hear through digital channels. Previously, it was difficult for attackers to spoof someone’s voice or likeness, and even more difficult to do so with live video and audio. Now AI makes that possible in real time, and we can no longer believe what’s on the screen.

“Deepfakes open a new avenue of attack against human users of IT systems, or of non-digital systems such as the stock market. This means that video calls from the CEO or announcements from PR people can be spoofed to manipulate stock prices in external attacks, or used by spearphishers to manipulate employees into divulging information, changing network settings or permissions, or downloading and opening files.

“To protect against this threat, we must learn to validate that the face on the screen is actually the face in front of the sender’s camera. This can be done through additional channels, such as a call to the sender’s cell phone, a message from a trusted account or, for public announcements, a press release on a public site operated by the company.”
