Learning Pairwise Interaction for Generalizable DeepFake Detection

Ying Xu ¹,

Kiran Raja ¹,

Luisa Verdoliva ²,

Marius Pedersen ¹

Norwegian University of Science and Technology¹

University Federico II of Naples²

WACVW 2023 [paper] [BibTeX] [code] [video]

Abstract

We propose a new framework, Multi-Channel Xception Attentive Pairwise Interaction (MCX-API), for Deepfake detection that exploits color space and pairwise interaction simultaneously, introducing a novel fine-grained approach to the field. We report all results as balanced-open-set-classification (BOSC) accuracy to demonstrate the generalizability of the proposed approach. We also conduct cross-dataset validation on three SOTA Deepfake datasets, Celeb-DF, KoDF, and FakeAVCelebDF, and compare the results with SOTA Deepfake detection methods. MCX-API obtains 98.48% BOSC accuracy on the FF++ dataset and 90.87% BOSC accuracy on the Celeb-DF dataset, indicating a promising direction for the generalization of Deepfake detection.

Intra-dataset Evaluation

Our proposed method, MCX-API with the RGB color space, obtains the best performance among the compared SOTA methods: the best BOSC accuracy is 98.48%, and the highest AUC score is 0.9968. This result shows that pairwise learning in a fine-grained manner works well in the intra-dataset (closed-set) setting of the Deepfake detection problem.

Comparison of test results on the FF++ dataset with the c23 (high-quality compression) setting. All networks are trained on FF++ c23. Accuracy and AUC scores are reported at the frame level.
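To make the frame-level metrics concrete, here is a minimal sketch of how a balanced accuracy and an AUC score could be computed from per-frame fake scores. This page does not define BOSC accuracy, so the sketch assumes a balanced accuracy (the mean of the true-positive and true-negative rates at a fixed threshold). The function names, the 0.5 threshold, and the toy labels and scores are hypothetical and are not taken from the paper or its code.

```python
def balanced_accuracy(labels, scores, threshold=0.5):
    """Mean of true-positive rate and true-negative rate at a fixed threshold.

    Assumption: label 1 = fake frame, label 0 = real frame, and a frame is
    predicted fake when its score is >= threshold.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    return 0.5 * (tp / pos + tn / neg)


def auc(labels, scores):
    """AUC as the probability that a random fake frame outscores a random real frame.

    Ties count as half a win. This pairwise form is quadratic in the number
    of frames, so it is only meant for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy per-frame data (hypothetical): three fake frames, three real frames.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

print(balanced_accuracy(labels, scores))
print(auc(labels, scores))
```

Unlike plain accuracy, a balanced accuracy is not inflated when one class dominates the test set, which matters when real and fake frame counts differ across datasets.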

Cross-dataset Evaluation

We use FakeAV, KoDF, and Celeb-DF to test the generalizability of the MCX-API network. In general, the proposed network performs better than the SOTA methods on these datasets, which indicates that it generalizes better.

All networks are trained on the FF++ c23 dataset and tested on FakeAV, KoDF, and Celeb-DF.

Blow-up of Activation Maps

We show enlarged activation maps from LayerCAM for DF and F2F images in Fig. 4. Visual analysis shows that MCX-API focuses more on facial regions such as the eyes and the mouth. For instance, double eyebrows appear in the DF image (blue circle), and MCX-API pays more attention to this region than API-Net does.

Blow-up of activation maps from the LayerCAM analysis of MCX-API (RGB) and the base architecture API-Net on DF and F2F faces.
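For readers unfamiliar with LayerCAM, the following is a minimal NumPy sketch of how such an activation map is formed from one layer's feature maps and the gradients of the class score with respect to them. It illustrates the general LayerCAM formulation only; the function name `layercam`, the (C, H, W) array layout, and the toy inputs are assumptions, and this is not the code used to produce the figure.

```python
import numpy as np


def layercam(activations, gradients):
    """Compute a normalized LayerCAM-style map for a single layer.

    activations, gradients: arrays of shape (C, H, W), where gradients holds
    the derivative of the target class score w.r.t. each activation.
    """
    # Element-wise weights: keep only positive gradients.
    weights = np.maximum(gradients, 0.0)
    # Weight each location, sum over channels, and clip negatives.
    cam = np.maximum((weights * activations).sum(axis=0), 0.0)
    # Scale to [0, 1] for display; leave an all-zero map unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy input (hypothetical): 2 channels on a 2x2 grid.
activations = np.ones((2, 2, 2))
gradients = np.array([[[1.0, -1.0], [0.0, 2.0]],
                      [[1.0, 1.0], [-3.0, 0.0]]])
cam = layercam(activations, gradients)
print(cam)
```

In practice the resulting map is upsampled to the input resolution and overlaid on the face image, so that brighter regions mark the locations contributing most to the prediction.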