AI Model Achieves "Human-Like" Accuracy in Detecting AI-Generated Images, Explains Its Reasoning
Researchers at Shanghai Jiao Tong University and Ant Group have developed a new AI model that not only detects AI-generated images with unprecedented accuracy but also provides human-understandable explanations for its decisions. This marks a significant step towards transparent and trustworthy AI in an era where distinguishing between real and synthetic content is becoming increasingly challenging.
The team’s approach, outlined in a recently released paper, leverages Multi-modal Large Language Models (MLLMs) and introduces a novel “FakeXplained” dataset of AI-generated images paired with detailed annotations. Each annotation pinpoints visual anomalies and inconsistencies, such as a flamingo with three legs or repetitive patterns on a blanket, and these annotated examples are then used to train the MLLM.
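The paper’s annotation schema is not reproduced here, but a minimal sketch of what one FakeXplained-style training record might look like follows; the field names, box format, and example values are purely illustrative assumptions, not the dataset’s actual format.

```python
from dataclasses import dataclass, field

@dataclass
class RegionAnnotation:
    """One visual anomaly: a bounding box plus a natural-language explanation."""
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixel coordinates (assumed format)
    explanation: str                        # e.g. "the flamingo appears to have a third leg"

@dataclass
class FakeXplainedExample:
    """A single training example pairing an image with its anomaly annotations."""
    image_path: str
    label: str                              # "real" or "ai-generated"
    regions: list[RegionAnnotation] = field(default_factory=list)

# Illustrative record only; not drawn from the actual dataset
example = FakeXplainedExample(
    image_path="images/flamingo_0042.png",
    label="ai-generated",
    regions=[
        RegionAnnotation(box=(120.0, 310.0, 180.0, 470.0),
                         explanation="the flamingo appears to have a third leg"),
    ],
)
```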
“Existing methods often act as black boxes,” explains Dr. Jianfu Zhang, a lead author on the paper. “They might achieve high accuracy, but they don’t tell you why an image is classified as fake. Our model can identify specific visual cues and logical inconsistencies that betray an image’s synthetic origin.”
What sets the model apart is its ability not only to detect fakes but also to explain its reasoning. For instance, when presented with an AI-generated image of a golf course, it might highlight a rearview mirror detached from its vehicle or tires that appear unnaturally thin, describing these anomalies in natural language; it might also flag distorted text on signs in the scene.
The researchers fine-tuned the multi-modal model using a two-stage optimization strategy. First, Supervised Fine-Tuning (SFT) establishes a foundation for reasoning and consistent output formatting. Second, reinforcement learning refines the model further through a technique called progressive Group Relative Policy Optimization (GRPO), which balances the objectives of accurate detection, visual localization, and clear explanation. This approach helped the model reach an overall detection accuracy of 98.1%, indicating robust performance across a variety of AI image generators.
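The article does not spell out the reward design, but a minimal sketch of how a GRPO-style update might combine the three objectives is shown below; the weights, scoring terms, and group-normalization step are illustrative assumptions rather than the paper’s actual formulation.

```python
import statistics

def composite_reward(label_correct: bool, loc_iou: float, expl_score: float,
                     w_det: float = 0.5, w_loc: float = 0.3, w_expl: float = 0.2) -> float:
    """Weighted mix of detection, localization, and explanation rewards.

    The weights and term definitions are illustrative assumptions; the paper's
    progressive GRPO schedule and exact reward terms may differ.
    """
    r_det = 1.0 if label_correct else 0.0  # correct real/fake verdict
    r_loc = loc_iou                        # overlap between predicted and annotated regions (0-1)
    r_expl = expl_score                    # e.g. a 0-1 rating of explanation clarity
    return w_det * r_det + w_loc * r_loc + w_expl * r_expl

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO scores a group of sampled responses for the same input and uses each
    response's reward relative to the group mean (normalized by the group std)
    as its advantage, avoiding a separate value network."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1e-6  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Example: four sampled responses for one image, scored with the composite reward
rewards = [composite_reward(True, 0.42, 0.8),
           composite_reward(True, 0.10, 0.6),
           composite_reward(False, 0.00, 0.3),
           composite_reward(True, 0.55, 0.9)]
print(group_relative_advantages(rewards))
```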
To rigorously test their model, the researchers compared its performance against existing image detection methods across a diverse range of AI-generated images, including those produced by models such as DALL-E 3 and Midjourney v5. The results demonstrated that their model significantly outperformed baseline methods in both detection accuracy and visual grounding. Notably, the model achieved an Intersection over Union (IoU) score of 37.8% in localization, outperforming segmentation-based baselines. IoU measures how well the model’s identified fake regions overlap with human-annotated ones; a higher score indicates better alignment.
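Intersection over Union is a standard overlap ratio; a minimal sketch of how it is computed for axis-aligned bounding boxes is given below (the (x1, y1, x2, y2) box format is an assumption here, since the paper may ground regions differently, e.g. with segmentation masks).

```python
def iou(box_a: tuple[float, float, float, float],
        box_b: tuple[float, float, float, float]) -> float:
    """Intersection over Union for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union = sum of the two areas minus the intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: a predicted region overlapping a third of the annotated one
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # ~0.333
```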
The team also conducted a “human preference evaluation.” They showed evaluators pairs of outputs, one from the model and one from a human annotator, and asked them to choose the annotation whose highlighted region better matched its caption and which was of higher overall quality. The model was preferred in nearly half of the cases, approaching human-level performance in generating region-grounded explanations.
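The headline number in such a pairwise study is simply the share of comparisons won by the model; a minimal sketch with illustrative votes (not the paper’s data) follows.

```python
def preference_rate(choices: list[str]) -> float:
    """Fraction of pairwise comparisons in which evaluators preferred the model's output."""
    return sum(c == "model" for c in choices) / len(choices)

# Illustrative outcomes only; the paper reports the model winning nearly half the time.
votes = ["model", "human", "model", "human", "model", "human", "human", "model"]
print(f"model preferred in {preference_rate(votes):.0%} of comparisons")  # 50%
```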
This research has potentially far-reaching implications. Explainable AI-generated image detection can bolster trust in digital media, support verification workflows for journalists, and facilitate more informed decision-making in legal and ethical contexts.