Deepfake Detection that Generalizes Across Benchmarks

Abstract

The generalization of deepfake detectors to unseen manipulation techniques remains a challenge for practical deployment. Although many approaches adapt foundation models by introducing significant architectural complexity, this work demonstrates that robust generalization is achievable through a parameter-efficient adaptation of one of the foundational pre-trained vision encoders. The proposed method, GenD, fine-tunes only the Layer Normalization parameters (0.03% of the total) and enhances generalization by enforcing a hyperspherical feature manifold using L2 normalization and metric learning on that manifold.

We conducted an extensive evaluation on 14 benchmark datasets spanning 2019 to 2025. The proposed method achieves state-of-the-art performance, outperforming more complex, recent approaches in average cross-dataset AUROC. Our analysis yields two primary findings for the field: (1) training on paired real-fake data from the same source video is essential for mitigating shortcut learning and improving generalization, and (2) detection difficulty on academic datasets has not strictly increased over time, with models trained on older, diverse datasets showing strong generalization capabilities.

This work delivers a computationally efficient and reproducible method, proving that state-of-the-art generalization is attainable by making targeted, minimal changes to a pre-trained foundational image encoder model. The code is at:

1. Introduction

The proliferation of realistic facial deepfakes raises significant concerns regarding misinformation and malicious use. AI-manipulated videos, those altered by techniques such as face swapping or face reenactment, are particularly challenging to detect. Unlike fully synthetic content, such forgeries preserve the original context and leave only subtle artifacts that are difficult for both humans and machines to detect.

A primary issue affecting current detection methods is their limited ability to generalize. A model that has been trained to identify images altered by a particular deepfake generation algorithm often struggles when faced with examples produced by a new generation algorithm.

This generalization gap is the primary issue that we address in this work. Working from the hypothesis that an adapted large-scale, pre-trained foundational vision encoder can serve as a general foundation for deepfake detection, we build the proposed method in three variants, using Contrastive Language-Image Pre-training (CLIP), Perception Encoder, and DINO models as feature extractors, all of which are known for their generalizable visual representations.

The proposed method consists of a vision encoder whose outputs are L2-normalized. We then fine-tune only the parameters of the Layer Normalization blocks while keeping the rest frozen. Additionally, we propose using metric learning in this L2-normalized space to enhance generalization.
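The two ingredients described above can be illustrated without any deep learning framework. The following is a minimal sketch, not the authors' implementation: the helper names, the parameter names in `toy_param_counts`, and the name-matching rule in `is_layer_norm_param` are all assumptions made for illustration, and the parameter counts are made-up toy numbers rather than figures from the paper.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a feature vector to unit Euclidean length (a point on the unit hypersphere)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def dot(a, b):
    """Dot product; for unit vectors this equals the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def is_layer_norm_param(name):
    # Assumption: LayerNorm parameters can be recognized by name fragments.
    # Real encoders use different naming schemes, so adjust this rule as needed.
    lowered = name.lower()
    return any(tag in lowered for tag in ("ln_", "layernorm", "layer_norm"))


def trainable_fraction(param_counts):
    """Fraction of parameters left trainable when only LayerNorm is fine-tuned."""
    total = sum(param_counts.values())
    trainable = sum(n for name, n in param_counts.items() if is_layer_norm_param(name))
    return trainable / total


# Toy features: after normalization both lie on the unit circle.
u = l2_normalize([3.0, 4.0])
v = l2_normalize([4.0, 3.0])
cos_uv = dot(u, v)
dist_uv = squared_distance(u, v)

# Hypothetical parameter names and sizes for a tiny encoder (illustrative only).
toy_param_counts = {
    "blocks.0.attn.weight": 9000,
    "blocks.0.ln_1.weight": 50,
    "blocks.0.ln_1.bias": 50,
    "blocks.0.mlp.weight": 900,
}
fraction = trainable_fraction(toy_param_counts)
```

On the unit hypersphere, squared Euclidean distance and cosine similarity carry the same information, since for unit vectors the squared distance equals 2 minus twice the cosine similarity. This is one reason metric learning objectives are commonly applied to L2-normalized features.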

We benchmarked the generalization capabilities of the proposed model on 14 deepfake video datasets released between 2019 and 2025, listed in Table 1. To our knowledge, this represents the broadest evaluation in the deepfake literature. We show that the proposed model outperforms the most recent state-of-the-art methods on the majority of all available benchmarks.
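For readers unfamiliar with the headline metric, the sketch below shows one way to compute AUROC per dataset and then average it across datasets. It is an illustrative assumption about the evaluation protocol, not the paper's evaluation code: the dataset names, labels, and scores are toy values, and the pairwise formulation is chosen for clarity rather than speed.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random fake outscores a random real.

    labels: 1 for fake, 0 for real. Ties between a fake and a real count as half.
    This pairwise form is quadratic in the number of samples; it is meant for
    illustration, not for large benchmarks.
    """
    fakes = [s for y, s in zip(labels, scores) if y == 1]
    reals = [s for y, s in zip(labels, scores) if y == 0]
    if not fakes or not reals:
        raise ValueError("AUROC needs at least one real and one fake sample")
    wins = 0.0
    for f in fakes:
        for r in reals:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fakes) * len(reals))


def mean_cross_dataset_auroc(results):
    """Compute per-dataset AUROC and the unweighted mean over datasets."""
    per_dataset = {name: auroc(y, s) for name, (y, s) in results.items()}
    return per_dataset, sum(per_dataset.values()) / len(per_dataset)


# Toy predictions on two hypothetical test sets (names and values are made up).
toy_results = {
    "dataset_a": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    "dataset_b": ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7]),
}
per_dataset, mean_auroc = mean_cross_dataset_auroc(toy_results)
```

Averaging per-dataset scores with equal weight keeps a single large benchmark from dominating the summary number, which matters when the test sets differ widely in size.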

In summary, our key contributions are as follows:

· A novel deepfake detection method called GenD. The method achieves the best average cross-dataset AUROC compared to recently released models.

· The most comprehensive evaluation in the deepfake literature covering datasets released throughout six years of research.

· A demonstration that, to achieve the best generalization and prevent shortcut learning, it is essential to construct the training set from real-fake pairs, where the fake video is generated from the real counterpart of the pair.
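The pairing requirement in the last contribution can be sketched as a simple grouping step. This is a hypothetical illustration only: the tuple layout `(clip_id, source_video_id, label)` and the function name are assumptions, and it presumes the dataset metadata records which real video each fake was derived from.

```python
from collections import defaultdict


def build_paired_training_set(samples):
    """Keep only real-fake pairs that share the same source video.

    samples: iterable of (clip_id, source_video_id, label) tuples, where label
    is "real" or "fake" and source_video_id names the real video a clip comes
    from (for a real clip, its own video).
    """
    by_source = defaultdict(lambda: {"real": [], "fake": []})
    for clip_id, source_id, label in samples:
        if label not in ("real", "fake"):
            raise ValueError(f"unexpected label: {label!r}")
        by_source[source_id][label].append(clip_id)

    pairs = []
    for source_id in sorted(by_source):
        group = by_source[source_id]
        # Sources lacking either a real or a fake clip contribute no pairs.
        for real_id in group["real"]:
            for fake_id in group["fake"]:
                pairs.append((real_id, fake_id))
    return pairs


# Toy metadata: only source "v1" has both a real clip and fakes derived from it.
toy_samples = [
    ("r1", "v1", "real"),
    ("f1", "v1", "fake"),
    ("f2", "v1", "fake"),
    ("r2", "v2", "real"),
    ("f3", "v3", "fake"),
]
pairs = build_paired_training_set(toy_samples)
```

Dropping unpaired clips means every fake in training has a real counterpart with the same identity, background, and compression history, so those factors cannot serve as a shortcut for separating the classes.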

2. Related Work

3. Method

3.1. Model

3.2. Data

4. Experiments

4.1. Test benchmarks

4.2. Cross-dataset evaluation

4.3. Ablation studies

4.4. Importance of training on paired dataset

4.5. Evolution of detection difficulty over the years

4.6. Robustness to image degradations

5. Limitations and Future Work

6. Conclusions

Overview

The paper introduces GenD, an efficient deepfake detection technique that outperforms complex models by fine-tuning a small percentage of parameters in foundational vision encoders. It highlights the importance of training with paired real-fake data for improved generalization and presents comprehensive evaluations across 14 datasets.

Key Points

1. GenD fine-tunes only Layer Normalization parameters, making it computationally efficient
2. The method achieves state-of-the-art performance across various deepfake benchmarks
3. Training on paired real-fake data is crucial for effective generalization
4. The study provides insights into the historical progression of detection difficulty in academic datasets
5. Comprehensive evaluation covering six years of deepfake research demonstrates the model's capabilities

Details

Authors: Andrii Yermakov, Jan Cech, Jiri Matas, Mario Fritz
Category: Technology and Engineering
