The Age of Digital Deception: The Dangers of Deep Fakes

As technology advances, so too do the capabilities of those seeking to exploit it. Recent years have given rise to a dangerous use of the technology offered by neural networks: deep fakes.

Deep fakes have emerged as a controversial and concerning phenomenon. They are produced by models that use artificial intelligence algorithms to generate realistic-looking content that can be incredibly difficult to distinguish from reality.

Deep fakes are sophisticated manipulations of audio, images, and videos that have the potential to deceive, manipulate, and spread misinformation at an unprecedented scale. From altering political speeches to fabricating celebrity scandals, the implications of deep fakes are vast and varied.

Given the threat they pose to our society, it is imperative to understand the impact that deep fakes can have on our daily lives, as they blur the lines between truth and fiction, erode trust in media, and potentially disrupt democratic processes. By shedding light on this phenomenon, we can better equip ourselves to navigate the complex landscape of digital deception and safeguard against its harmful effects.

In this article, we delve into the intricate world of deep fakes, exploring their origins, implications, and the challenges they pose to us as members of society.

What are deep fakes?

The fake Midjourney-created image of Pope Francis wearing a puffer jacket

Deep fakes are highly realistic digital manipulations that use advanced artificial intelligence (AI) techniques to alter or create audio, images, or videos. The term "deep fake" is a combination of "deep learning" and "fake", highlighting its reliance on a particular type of deep neural network for its creation.

These neural networks are trained on large datasets to recognize patterns and then use those patterns to generate new content similar to the data they were trained on. They are known as generative models [1].

The network learns to mimic the style, sound, mannerisms, or appearance of real individuals in order to generate realistic-looking content about them. This learning process allows for the creation of media that is incredibly difficult to distinguish from authentic media. The content often shows the individuals in question doing something controversial, immoral, or plainly illegal [2].

Deep fakes can be applied in different forms of media: audio, images, and video.

In audio deep fakes, the generative model synthesizes speech that sounds convincingly human, mimicking the tone, cadence, and inflections of a particular individual. This technique can be used to manipulate existing audio tracks or create new ones [3,4].

Deep fakes in images involve altering photographs, swapping faces between individuals, or changing facial expressions and features, among other things. This technique has had a significant resurgence over the last year with the recent developments in diffusion models [5].

Finally, video deep fakes are among the most pernicious uses of this technology. They take image manipulation to the next level by animating the altered images to create lifelike videos. There are two main ways to go about it. The first is to map facial movements and expressions from one video onto another; the result makes it look like individuals are saying or doing things they never did [6]. The second relies on newer generative AI technology, which opens the possibility of generating videos without the need for a drop-in replacement [7].

A Brief History of Deep Fakes

The term "deep fake" entered the public lexicon near the end of 2017 after a Reddit user named "deepfakes" posted a deep fake that face-swapped Gal Gadot into a pornographic video [8] in a newly created subreddit at the time, aptly named "r/deepfakes".

However, this was not the first time that the concept of the deep fake was applied. The first iterations of the technology were in development back in 1997 with the project "Video Rewrite", which modified existing video footage of a person speaking to depict that person mouthing the words contained in a different audio track [9]. It was the first system to fully automate this kind of facial reanimation, and it did so using machine learning techniques to make connections between the sounds produced by a video's subject and the shape of the subject's face.

In 2016, Thies et al. [10] released Face2Face, which modifies video footage of a person's face to depict them mimicking the facial expressions of another person in real time. An important contribution of this method was the possibility of reenacting facial expressions in real time using a camera that does not capture depth, which makes it possible to apply the technique with common consumer cameras.

Another landmark project on deep fake technology was "Synthesizing Obama" in 2017 [11], which modified video footage of former US president Barack Obama to depict him mouthing the words contained in a separate audio track.

The technology gained traction after it became popular on Reddit, where it was used with dubious motives in the r/deepfakes subreddit. That subreddit was eventually banned from the site [12], but the practice kept going in less harmful communities like r/SFWdeepfakes, in which community members share deepfakes depicting celebrities, politicians, and others in non-pornographic scenarios.

Deep fake technology has also become more widely used in commercial settings, often with the objective of reviving or de-aging past movie stars. For example, Lucasfilm notably hired a YouTuber who specializes in deep fake technology to help them in their quest to live in the past [13].

Technologies Behind Deep Fakes

The technology behind deep fakes is based on different architectures for neural networks. Depending on the target media that is being faked, different methods will work better or worse. We'll be doing a quick recap of some of the most common technologies and methods that are currently known.

Autoencoder-based methods are among the earliest techniques used for generating deep fakes. Autoencoders are neural networks trained to reconstruct input data, such as images or videos, with the goal of minimizing reconstruction error, which we talked about in an article a couple of weeks ago [14]. In deep fake generation, an autoencoder is trained on a dataset of real images and then used to encode and decode facial features onto target faces, creating a synthetic video or image [15]. Variational Autoencoders [14], a generative model based on autoencoders, are also commonly used architectures for developing deep fake technology.
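
To make the idea of minimizing reconstruction error concrete, here is a minimal sketch of a toy linear autoencoder in plain Python. It is not the architecture of any actual deep fake tool: the data (2-D points lying on a line), the single-number latent code, the learning rate, and every function name are assumptions chosen for illustration. It only shows the encode, decode, and measure-the-error loop that real systems apply to face images at a vastly larger scale.

```python
# Toy linear autoencoder: 2-D input -> 1-D latent code -> 2-D reconstruction.
# All sizes, data, and hyperparameters are illustrative assumptions.

def encode(w, x):
    # Compress a 2-D point into a single latent number.
    return w[0] * x[0] + w[1] * x[1]

def decode(v, z):
    # Expand the latent number back into a 2-D point.
    return [v[0] * z, v[1] * z]

def reconstruction_error(w, v, data):
    # Mean squared distance between each input and its reconstruction.
    total = 0.0
    for x in data:
        x_hat = decode(v, encode(w, x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)

def train(data, steps=2000, lr=0.05):
    # Plain gradient descent on the reconstruction error.
    w = [0.1, 0.1]  # encoder weights
    v = [0.1, 0.1]  # decoder weights
    n = len(data)
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            z = encode(w, x)
            x_hat = decode(v, z)
            err = [x_hat[0] - x[0], x_hat[1] - x[1]]
            # Error signal flowing back from the decoder to the latent code.
            back = err[0] * v[0] + err[1] * v[1]
            for i in range(2):
                grad_v[i] += 2 * err[i] * z / n
                grad_w[i] += 2 * back * x[i] / n
        for i in range(2):
            w[i] -= lr * grad_w[i]
            v[i] -= lr * grad_v[i]
    return w, v

# Points on the line y = 2x: one latent number is enough to describe them.
data = [[t, 2 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
initial_error = reconstruction_error([0.1, 0.1], [0.1, 0.1], data)
w, v = train(data)
final_error = reconstruction_error(w, v, data)
print("before training:", initial_error)
print("after training: ", final_error)
```

The reconstruction error shrinks as training proceeds because the network learns a compact code that captures the regularity in the data. With faces, the same principle lets the latent code capture expression and pose.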

A popular upgrade to this architecture attaches a generative adversarial network (GAN) to the decoder [16]. A GAN trains a generator, in this case, the decoder, and a discriminator in an adversarial relationship. The generator creates new images from the latent representation of the source material while the discriminator attempts to determine whether or not the image is generated, which causes the generator to create images that mimic reality extremely well, as the discriminator would catch any defects [17].
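
The adversarial relationship can be sketched through the two loss functions involved. The snippet below is a minimal illustration, not the training code of any particular deep fake system: the probabilities are made-up numbers standing in for a discriminator's outputs, the function names are hypothetical, and the generator loss shown is the commonly used "non-saturating" variant, which is an assumption on my part rather than something stated above.

```python
import math

def discriminator_loss(d_real, d_fake):
    # d_real: probabilities the discriminator assigns to "real" for real images.
    # d_fake: probabilities it assigns to "real" for generated images.
    # The discriminator wants d_real near 1 and d_fake near 0.
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    # The generator wants the discriminator to call its images real,
    # i.e. it wants d_fake near 1 (non-saturating formulation).
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs at two points in training.
real_scores = [0.9, 0.8, 0.9]
early_fake_scores = [0.1, 0.2, 0.1]   # fakes are easy to spot
late_fake_scores = [0.5, 0.6, 0.4]    # fakes are much harder to spot

print("generator loss, early:", generator_loss(early_fake_scores))
print("generator loss, late: ", generator_loss(late_fake_scores))
print("discriminator loss, early:", discriminator_loss(real_scores, early_fake_scores))
print("discriminator loss, late: ", discriminator_loss(real_scores, late_fake_scores))
```

As the generated images become more convincing, the generator's loss falls while the discriminator's loss rises. That tension is what pushes both networks to improve.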

Autoencoders, generative adversarial networks, and their combination are all commonly used for face swapping, which involves replacing the face of one individual in a video or image with the face of another person.

GANs have also been used to build lip-syncing models, which are designed to synchronize the movements of a synthetic face with audio input [18].

More recently, especially in image generation, diffusion models [19] have been applied to a task closely related to deep fakes: style transfer. The core idea of this technique involves transferring the style of one image or video onto another while preserving the content. These methods use deep neural networks to extract style and content features from input images and apply them to target images. Style transfer can be used to generate artistic deep fakes or mimic the visual characteristics of specific artists or styles.
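
One classic way to describe "style" separately from "content", used in earlier neural style transfer work that predates diffusion models, is the Gram matrix of a layer's feature maps: it records which features tend to fire together while discarding where in the image they fire. The sketch below assumes that approach purely for illustration. The feature values are hand-written stand-ins for real network activations, and the function names are hypothetical.

```python
def gram_matrix(features):
    # features: list of channels, each a flat list of activations,
    # one value per spatial position in the image.
    n_positions = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n_positions for fj in features]
        for fi in features
    ]

def style_distance(feat_a, feat_b):
    # Mean squared difference between the two Gram matrices.
    ga, gb = gram_matrix(feat_a), gram_matrix(feat_b)
    c = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)

# Two channels, four spatial positions (made-up activations).
features = [[1.0, 0.0, 2.0, 1.0],
            [0.5, 1.5, 0.0, 1.0]]
# Same activations with the spatial positions reordered: the layout
# (content) changes, but the feature co-occurrence (style) does not.
shuffled = [[2.0, 1.0, 1.0, 0.0],
            [0.0, 1.0, 0.5, 1.5]]
# Activations with a different co-occurrence pattern.
different = [[1.0, 1.0, 1.0, 1.0],
             [0.0, 0.0, 0.0, 3.0]]

print("same style, moved content:", style_distance(features, shuffled))
print("different style:          ", style_distance(features, different))
```

Because the Gram matrix ignores position, rearranging the image leaves the style distance at zero. This is why such a representation can carry a visual style over to entirely different content.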

Audio deep fakes have their own set of techniques based on replay, synthesis, or imitation. Replay-based deepfakes are malicious works that aim to reproduce a recording of the interlocutor's voice [20].

In speech synthesis, the objective is the artificial production of human speech using software or hardware systems. Speech synthesis includes Text-To-Speech, which aims to transform text into acceptable and natural speech in real time [21].

Imitation-based audio deepfakes transform speech from one speaker (the original) so that it sounds as if it were spoken by another speaker (the target). An imitation-based algorithm takes a spoken signal as input and alters it by changing its style, intonation, or prosody, trying to mimic the target voice without changing the linguistic information [22].

Challenges and Dangers Posed by Deep Fakes

Deep fake technology has raised significant concerns due to its potential to deceive, manipulate, and spread misinformation. Deep fakes are also controversial from an ethical point of view because of their potential use for "replacing" real people.

One of the most significant dangers of deep fakes is their potential to spread misinformation and fake news. Deep fake videos can be used to fabricate speeches, interviews, or events, leading to false perceptions and beliefs among the public. This misuse of the technology poses a serious threat to democratic processes, public trust, and societal stability.

A primary pitfall is that humanity could fall into an age in which it can no longer be determined whether a medium's content corresponds to the truth [23]. Even though deep fake technology isn't inherently harmful, ready access to it means it can be weaponized, allowing cybercriminals, political activists, and nation-states to create cheap, realistic forgeries quickly [24].

Face-swapping deep fakes have been used on multiple occasions with the aid of professional impersonators, who provide the mannerisms, similar voices, and expressions that the technology then enhances to create lifelike replacements of real people.

In 2018, Jordan Peele raised awareness of the dangers of deep fakes when, in collaboration with BuzzFeed, he imitated the likeness of former US president Barack Obama [25]. Even though this was a PSA that tried to warn people of the dangers of the technology, Ron DeSantis's campaign used deep fakes to misrepresent Donald Trump during the 2023 run-up to the US presidential election [26]. The campaign featured an ad that used AI fakes of Trump hugging Fauci, a target of the far-right since the 2020 pandemic, in a deliberate attempt to blur reality. This was not the first time Trump's likeness was the subject of deep fake generation: in March of 2023, Midjourney was used to generate images of Trump being arrested [27].

A fake Midjourney-created image of Donald Trump being arrested

Other influential figures have had their fair share of attacks via deep fakes. One case was that of Mark Zuckerberg, CEO of Meta, back in 2019, when a video created by artists Bill Posters and Daniel Howe, in partnership with advertising company Canny, showed Zuckerberg sitting at a desk, seemingly giving a sinister speech about Facebook's power. The video was framed with broadcast chyrons that said, "We're increasing transparency on ads", to make it look like part of a news segment [28].

The use of deep fakes for generating misleading images or videos of influential people isn't the only problem. Celebrities have multiple battle fronts to deal with when it comes to deep fake generation.

Female celebrities, especially, have to deal with the use of their likeness in the generation of pornographic content. As we know, the term deep fake appeared in a subreddit dedicated to creating videos that swapped celebrities' faces onto those of porn performers. However, the problem has now evolved, with some celebrities, like Taylor Swift, facing a flood of pornographic images of them generated from scratch by AI [29]. Other celebrities have to deal with companies using AI-generated ads of them endorsing products [30], which, more often than not, are plain scams [31].

Audio deepfakes have been used as part of social engineering scams, fooling people into thinking they are receiving instructions from a trusted individual [32].

Finally, even the "professional" use of this technology poses a threat to many jobs, especially among acting professionals. Digital clones aided by deep fakes are being used more and more in the industry, raising concern among entertainment professionals about the possibility of being replaced. This was a major issue during the 2023 SAG-AFTRA strike, when actors and actresses were adamant in rejecting the studios' wish to use the technology to reduce working hours, especially among extras, by paying only a minimum for digital scans and reusing their likeness in perpetuity [33]. A couple of years ago, the likeness of Elvis Presley was brought to the small screen during season 17 of America's Got Talent, where the late singer was put in a duet with Simon Cowell [34].

There's also the ethical issue of using this technology even with "good intentions", such as the imitation of a real human so it can "live for eternity", as some AI startups are currently doing [35]. The emergence of deep fake technology raises complex legal and ethical questions regarding consent, defamation, intellectual property rights, and freedom of expression. Existing laws may be inadequate to address the challenges posed by deep fakes, requiring policymakers to develop new regulations and frameworks.

Thoughts on Deep Fake Technology and Its Issues

As deep fake technology continues to evolve, it is crucial to implement effective solutions to mitigate the risks and challenges it poses. As with everything when it comes to technology, once it's out there, it's really difficult to limit it or "get it back in". That is no reason to dismiss a possible solution as invalid just because those intent on misusing the technology won't abide by it.

I favor openness in the development of new technologies: I prefer something out in the open, accessible for everyone to play with and use, over making it private and charging a fee or limiting the technology to a couple of powerful players. I strongly believe that educating people on how these technologies work, and on the pros and cons of their use, plays a significant role in reducing their misuse, or at least in raising awareness and making it harder for scammers to use the technology for ill purposes.

There is significant ongoing work on the automatic detection of deep fakes, both in video and audio, using the same technology, i.e., neural networks, but for the purpose of detecting this content instead of generating it.

There are also huge advancements in techniques to watermark and authenticate what is generated by these tools, such as embedding invisible or cryptographic watermarks into images and videos to track their origins and detect alterations.
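
The authentication half of this idea can be sketched with a keyed hash from Python's standard library. This is only a minimal illustration under strong assumptions: the publisher and the verifier share a secret key, the tag travels alongside the file rather than being embedded invisibly in the pixels, and the names `sign_media` and `verify_media`, the key, and the sample bytes are all hypothetical. Production provenance schemes are considerably more involved.

```python
import hashlib
import hmac

def sign_media(media_bytes, secret_key):
    # Compute an authentication tag over the exact bytes of the media file.
    return hmac.new(secret_key, media_bytes, hashlib.sha256).hexdigest()

def verify_media(media_bytes, secret_key, tag):
    # Recompute the tag and compare in constant time.
    expected = sign_media(media_bytes, secret_key)
    return hmac.compare_digest(expected, tag)

# Hypothetical key and stand-in "image" bytes.
key = b"publisher-secret-key"
original = b"pretend these are the bytes of an image"
tag = sign_media(original, key)

# Any alteration of the content, however small, invalidates the tag.
tampered = original.replace(b"image", b"imagE")

print("original verifies:", verify_media(original, key, tag))
print("tampered verifies:", verify_media(tampered, key, tag))
```

A tag like this proves that a file has not been altered since it was signed, but it says nothing about whether the original content was truthful. That is one reason technical measures alone cannot settle the problem.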

However, all the technology that we might be able to apply to the detection and marking of AI-generated content will be rendered useless in the long run, especially as generation technology keeps progressing. The end solution is the correct development of legal and regulatory frameworks for this technology. AI generation technology as a whole, and not only deep fake technology, needs to be regulated by governments, since we definitely cannot leave it to multinational corporations, which only seek to improve their profits, to be their own regulators.

Even if corporations are not actively using the technology in bad faith, the truth of the matter is that generative AI technology, especially large and complex models such as Large Language Models or Diffusion Models, currently costs hundreds of millions of dollars to train, which is something only large corporations can afford. Social media platforms also have their fair share of responsibility, as the dissemination of viral deep fakes is amplified by their algorithms, which seek to keep the population engaged. The harsh truth is that the misuse of this technology is engaging for many internet users, either because of confirmation bias (they are looking at things that confirm their skewed views, as is the case with political manipulation) or simply because they find it entertaining, for example, looking at pornographic content of their favorite celebrities. Regulations and punishments for spreading misinformation or deep fakes should also be properly addressed by legislation.

Also, many of these models are trained on large amounts of data that are never disclosed because those who train the models need to avoid problems regarding copyright. This scenario brings the opportunity to review copyright laws as well, not only by enforcing them but also by considering the fair usage of content that is massively hoarded by a limited number of media corporations.

Anyhow, deep fakes are just the tip of the iceberg when it comes to the dangers and misuses of AI. We need to learn from them, adapt to the challenges they pose, and build better societies on the basis of properly regulating their use.

References

[1] Cardellino, C. 2024. Generative vs. Discriminative Models in Machine Learning. Transcendent AI. https://www.transcendent-ai.com/post/generative-vs-discriminative-models-in-machine-learning

[2] Cole, S. 2018. We Are Truly Fucked: Everyone Is Making AI-Generated Fake Porn Now. Vice. https://www.vice.com/en/article/bjye8a/reddit-fake-porn-app-daisy-ridley

[3] Lyons, K. 2020. FTC says the tech behind audio deepfakes is getting better. The Verge. https://www.theverge.com/2020/1/29/21080553/ftc-deepfakes-audio-cloning-joe-rogan-phone-scams

[4] Jia, Y., Zhang, Y., Weiss, R., Wang, Q., Shen, J., Ren, F., ... & Wu, Y. 2018. Transfer learning from speaker verification to multispeaker text-to-speech synthesis. Advances in neural information processing systems, 31.

[5] Cardellino, C. 2024. A Brief Overview of Diffusion Models and Their Applications. Transcendent AI. https://www.transcendent-ai.com/post/a-brief-overview-of-diffusion-models-and-their-applications

[6] Haysom, S. 2018. People are using face-swapping tech to add Nicolas Cage to random movies and what is 2018. Mashable. https://web.archive.org/web/20190724221500/https://mashable.com/2018/01/31/nicolas-cage-face-swapping-deepfakes/

[7] Jackson, B. 2024. Deepfakes Are About to Become a Lot Worse, OpenAI's Sora Demonstrates. Spiceworks. https://www.spiceworks.com/tech/artificial-intelligence/guest-article/deepfakes-are-about-to-become-a-lot-worse-openais-sora-demonstrates/

[8] Cole, S. 2017. AI-Assisted Fake Porn Is Here and We're All Fucked. Vice. https://www.vice.com/en/article/gydydm/gal-gadot-fake-ai-porn

[9] Bregler, C., Covell, M. and Slaney, M., 2023. Video rewrite: Driving visual speech with audio. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2 (pp. 715-722).

[10] Thies, J., Zollhofer, M., Stamminger, M., Theobalt, C. and Nießner, M., 2016. Face2face: Real-time face capture and reenactment of rgb videos. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2387-2395).

[11] Suwajanakorn, S., Seitz, S.M. and Kemelmacher-Shlizerman, I., 2017. Synthesizing Obama: learning lip sync from audio. ACM Transactions on Graphics (ToG), 36(4), pp.1-13.

[12] Hathaway, J. 2018. Reddit bans 'deepfakes' and other fake celebrity porn. Daily Dot. https://www.dailydot.com/debug/reddit-bans-deepfakes/

[13] Hollister, S. 2021. Lucasfilm hires the YouTube deepfaker who put its Luke, Leia and Tarkin cameos to shame. The Verge. https://www.theverge.com/2021/7/26/22595227/star-wars-lucasfilm-mandalorian-rogue-one-hire-deepfake-shamook

[14] Cardellino, C. 2021. Variational Autoencoders: Intuition and Math. Transcendent AI. https://www.transcendent-ai.com/post/variational-autoencoders-intuitions-and-math

[15] Zucconi, A. 2018. Understanding the Technology Behind DeepFakes. https://www.alanzucconi.com/2018/03/14/understanding-the-technology-behind-deepfakes/

[16] Cardellino, C. 2024. Generative Adversarial Networks (GANs): A Comprehensive Exploration. Transcendent AI. https://www.transcendent-ai.com/post/generative-adversarial-networks-gans-a-comprehensive-exploration

[17] Kan, C. E. 2018. What The Heck Are VAE-GANs? Medium. https://web.archive.org/web/20210714032915/https://towardsdatascience.com/what-the-heck-are-vae-gans-17b86023588a?gi=aa6ed506d7be

[18] Vincent, J. 2020. I learned to make a lip-syncing deepfake in just a few hours (and you can, too). The Verge. https://www.theverge.com/21428653/lip-sync-ai-deepfake-wav2lip-code-how-to

[19] Cardellino, C. 2024. A Brief Overview of Diffusion Models and Their Applications. Transcendent AI. https://www.transcendent-ai.com/post/a-brief-overview-of-diffusion-models-and-their-applications

[20] Khanjani, Z., Watson, G. and Janeja, V.P., 2021. How deep are the fakes? focusing on audio deepfake: A survey. arXiv preprint arXiv:2111.14203.

[21] Tan, X., Qin, T., Soong, F. and Liu, T.Y., 2021. A survey on neural speech synthesis. arXiv preprint arXiv:2106.15561.

[22] Zhang, M., Wang, X., Fang, F., Li, H. and Yamagishi, J., 2019. Joint training framework for text-to-speech and voice conversion using multi-source tacotron and wavenet. arXiv preprint arXiv:1903.12389.

[23] Vaccari, C. and Chadwick, A., 2020. Deepfakes and disinformation: Exploring the impact of synthetic political video on deception, uncertainty, and trust in news. Social media+ society, 6(1), p.2056305120903408.

[24] Smith, H. and Mansted, K., 2020. Weaponised deep fakes: national security and democracy.

[25] Romano, A. 2018. Jordan Peele's simulated Obama PSA is a double-edged warning against fake news. Vox. https://www.vox.com/2018/4/18/17252410/jordan-peele-obama-deepfake-buzzfeed

[26] Shuham, M. 2023. DeSantis Campaign Ad Features AI Fakes Of Trump Hugging Fauci. HuffPost. https://www.huffpost.com/entry/desantis-trump-fauci-fake-ai-ad_n_64822436e4b025003edc3c8b

[27] AP News. 2023. AI-generated images of Trump being arrested circulate on social media. Associated Press. https://apnews.com/article/fact-check-trump-nypd-stormy-daniels-539393517762

[28] Cole, S. 2019. This Deepfake of Mark Zuckerberg Tests Facebook's Fake Video Policies. Vice. https://www.vice.com/en/article/ywyxex/deepfake-of-mark-zuckerberg-facebook-fake-video-policy

[29] Weatherbed, J. 2024. Trolls have flooded X with graphic Taylor Swift AI fakes. The Verge. https://www.theverge.com/2024/1/25/24050334/x-twitter-taylor-swift-ai-fake-images-trending

[30] Taylor, D. 2023. Tom Hanks Warns of Dental Ad Using AI Version of Him. The New York Times. https://www.nytimes.com/2023/10/02/technology/tom-hanks-ai-dental-video.html

[31] Johnson, K. 2023. Arizona woman falls victim to deepfake scam using celebrities on social media. ABC15 Arizona. https://www.abc15.com/news/let-joe-know/arizona-woman-falls-victim-to-deep-fake-scam-using-celebrities-on-social-media

[32] Statt, N. 2019. Thieves are now using AI deepfakes to trick companies into sending them money. The Verge. https://www.theverge.com/2019/9/5/20851248/deepfakes-ai-fake-audio-phone-calls-thieves-trick-companies-stealing-money

[33] Verma, P. 2023. Digital clones made by AI tech could make Hollywood extras obsolete. Washington Post. https://www.washingtonpost.com/technology/2023/07/19/ai-actors-fear-sag-strike-hollywood/

[34] Bowenbank, S. 2022. Simon Cowell Duets With Elvis in Metaphysic's Latest Deepfake 'AGT' Performance: Watch. Billboard. https://www.billboard.com/culture/tv-film/simon-cowell-duet-elvis-deepfake-agt-performance-1235138799/

[35] Heikkilä, M. 2024. An AI startup made a hyperrealistic deepfake of me that's so good it's scary. Technology Review. https://www.technologyreview.com/2024/04/25/1091772/new-generative-ai-avatar-deepfake-synthesia/