As technology advances, so does the quality of content generated by Artificial Intelligence (AI). From text to visual art and audio, AI has become highly proficient at creating content. Today's Generative AI (GenAI) already makes it difficult to distinguish between content created by humans and content generated by computers. Do you, for example, know whether the title of this blog was written by us or generated by ChatGPT? Although using AI to generate content is often harmless, there is already evidence of criminals exploiting AI-generated content. Recently, a finance worker in Hong Kong was tricked into transferring $25 million to criminals during a video call featuring deepfake representations of his colleagues, including the company's CFO (CNN, 4 Feb 2024).
Before delving further into GenAI, it is important to understand how generative models differ from another type of model: discriminative models. A discriminative model predicts a label for unseen data by estimating the probability of that label given the input, for example the probability that a review is positive given its words. A classic example is predicting whether a movie review is positive or negative based on the words and phrases it contains. Such a model assigns labels to existing data without generating anything new. Generative models such as ChatGPT and DALL·E, by contrast, learn the patterns and structures in their training data well enough to produce new content, or data, of their own.
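To make the distinction concrete, here is a toy sketch in plain Python. It is not how ChatGPT or DALL·E work internally; it assumes a hypothetical four-review dataset, and all function names (`train_logistic_regression`, `predict_positive`, `train_bigram_model`, `generate`) are illustrative. A logistic regression classifier stands in for the discriminative model: it only scores how likely a review is to be positive. A bigram (Markov chain) model stands in for the generative model: it learns which word tends to follow which and samples new word sequences from those patterns.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy data: (review text, label) with 1 = positive, 0 = negative.
REVIEWS = [
    ("a great and moving film", 1),
    ("great acting and a great story", 1),
    ("a boring and predictable film", 0),
    ("terrible acting and a boring story", 0),
]


def featurize(text):
    """Bag-of-words: count how often each word occurs in the text."""
    counts = defaultdict(int)
    for word in text.split():
        counts[word] += 1
    return counts


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# --- Discriminative model: learns P(positive | words) directly ---------------
def train_logistic_regression(data, epochs=200, lr=0.5):
    """Fit per-word weights with stochastic gradient ascent on the log-likelihood."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for text, label in data:
            feats = featurize(text)
            p = sigmoid(bias + sum(weights[w] * c for w, c in feats.items()))
            error = label - p  # derivative of the log-likelihood w.r.t. the logit
            bias += lr * error
            for w, c in feats.items():
                weights[w] += lr * error * c
    return weights, bias


def predict_positive(model, text):
    """Return the estimated probability that a review is positive."""
    weights, bias = model
    # Words never seen during training contribute a weight of 0.
    z = bias + sum(weights.get(w, 0.0) * c for w, c in featurize(text).items())
    return sigmoid(z)


# --- Generative model: learns which word tends to follow which ---------------
def train_bigram_model(texts):
    """Record every observed (previous word -> next word) transition."""
    transitions = defaultdict(list)
    for text in texts:
        words = ["<s>"] + text.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            transitions[prev].append(nxt)
    return transitions


def generate(transitions, rng, max_words=12):
    """Sample a word sequence; it may differ from every training review."""
    word, output = "<s>", []
    while len(output) < max_words:
        word = rng.choice(transitions[word])
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    classifier = train_logistic_regression(REVIEWS)
    print(predict_positive(classifier, "a great story"))   # a score between 0 and 1
    print(predict_positive(classifier, "a boring story"))  # lower than the line above
    bigrams = train_bigram_model(text for text, _ in REVIEWS)
    print(generate(bigrams, random.Random(0)))             # a sampled word sequence
```

The classifier can only attach a probability to a review it is given, while the bigram model can only emit new word sequences, which mirrors the discriminative versus generative split. Real generative models such as ChatGPT use large neural networks trained on vastly more data, but the basic idea of learning patterns and then sampling from them is the same.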
AI-generated content offers several promising opportunities, such as more efficient content production, content tailored to individual preferences, and personal (digital) assistants that are indistinguishable from humans or even exceed human capabilities. However, every rose has its thorn. Human creators often have a deeper understanding of the context in which content is created, thanks to their personal experiences and perspectives, whereas AI models (currently) cannot fully comprehend the context in which their output will be used. As a result, AI-generated content may not suit its intended context, which can lead to inaccuracies and reduced reliability.
Additionally, (Gen)AI models are trained on large datasets that may inadvertently contain biases. These societal biases can then be reflected in AI-generated content, raising issues of fairness and impartiality. Moreover, a worrying development is the exploitation of GenAI by malicious actors for purposes such as fraud, phishing attacks, and the spread of misinformation. One example is deepfake technology, which enables the creation of realistic but fabricated audio and video content and raises concerns about identity theft. In another example, an investigation by 404 Media uncovered an underground website called OnlyFake that used GenAI to produce convincing images of counterfeit IDs for a mere $15 (404 Media Investigations, 5 Feb 2024).
The remainder of this blog introduces various techniques to help you identify AI-generated content. The first section focuses on how you, as a human, can detect AI-generated content. Although observant individuals can generally still recognize AI-generated content, these methods are not foolproof. The final section focuses on identifying AI-generated content using AI technology.