Multi Model Attention Network for Video Source Camera Identification
10 Pages Posted: 5 Nov 2024
Abstract
With the development of smartphones and short-video platforms, digital video has become an important medium for information dissemination. However, the widespread distribution of videos has also raised many social issues. Video Source Camera Identification (VSCI) has emerged as a crucial component of video forensics, playing an important role in combating false information and improving media credibility. Common existing approaches are based on Photo Response Non-Uniformity (PRNU) or on machine learning. Yet most existing research has ignored an important source of information present in videos: acoustic features. The contributions of audio and visual signals to scene understanding evolve over time, so an efficient solution should adapt to them. To address this challenge, we propose the Multi Modal Attention Network (MMAnet), which dynamically fuses visual and audio features for VSCI. In addition, we use Gated Recurrent Units (GRU) to fully exploit temporal information. In experiments on public benchmark datasets (including VISION, Daxing, and QUFVD), our model achieved satisfactory performance.
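The abstract describes MMAnet only at a high level. The following is a minimal, hypothetical sketch of the two ideas it names: a per-time-step attention weighting that decides how much each modality (visual vs. audio) contributes, and a GRU that aggregates the fused sequence before a softmax over camera classes. It is not the authors' architecture. The names `ModalityAttentionFusion`, `GRU`, and `identify_camera`, the layer sizes, the assumption that both modalities are already embedded to the same dimension, and the random untrained weights are all illustrative assumptions; random arrays stand in for real frame and audio feature extractors.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ModalityAttentionFusion:
    """Hypothetical attention over two modalities, recomputed at every time step."""
    def __init__(self, dim, hidden, rng):
        self.W = rng.normal(0, 0.1, size=(hidden, dim))  # shared projection (assumed)
        self.w = rng.normal(0, 0.1, size=hidden)         # scoring vector (assumed)

    def __call__(self, visual, audio):
        # visual, audio: (T, dim) per-step embeddings of each modality
        stacked = np.stack([visual, audio], axis=1)          # (T, 2, dim)
        scores = np.tanh(stacked @ self.W.T) @ self.w        # (T, 2)
        weights = softmax(scores, axis=1)                    # each row sums to 1
        fused = (weights[..., None] * stacked).sum(axis=1)   # (T, dim)
        return fused, weights

class GRU:
    """Plain single-layer GRU (one common gate formulation), forward pass only."""
    def __init__(self, in_dim, hid_dim, rng):
        self.Wz, self.Wr, self.Wh = (rng.normal(0, 0.1, (hid_dim, in_dim)) for _ in range(3))
        self.Uz, self.Ur, self.Uh = (rng.normal(0, 0.1, (hid_dim, hid_dim)) for _ in range(3))
        self.bz, self.br, self.bh = (np.zeros(hid_dim) for _ in range(3))
        self.hid_dim = hid_dim

    def __call__(self, xs):
        h = np.zeros(self.hid_dim)
        for x in xs:
            z = sigmoid(self.Wz @ x + self.Uz @ h + self.bz)        # update gate
            r = sigmoid(self.Wr @ x + self.Ur @ h + self.br)        # reset gate
            h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h) + self.bh)
            h = (1 - z) * h + z * h_cand                            # blend old and new state
        return h  # final hidden state summarizes the whole clip

def identify_camera(visual, audio, n_cameras, seed=0):
    # Untrained, randomly initialized model: output is illustrative only.
    rng = np.random.default_rng(seed)
    dim = visual.shape[1]
    fusion = ModalityAttentionFusion(dim, hidden=16, rng=rng)
    gru = GRU(dim, 32, rng)
    W_out = rng.normal(0, 0.1, (n_cameras, 32))
    fused, weights = fusion(visual, audio)
    h = gru(fused)
    probs = softmax(W_out @ h)  # probability per candidate camera
    return probs, weights

# Usage with random stand-in features: 8 time steps, 64-dim embeddings, 5 cameras.
rng = np.random.default_rng(42)
visual = rng.normal(size=(8, 64))
audio = rng.normal(size=(8, 64))
probs, weights = identify_camera(visual, audio, n_cameras=5)
print(probs.shape, weights.shape)  # (5,) (8, 2)
```

Because the softmax is taken over the two modalities separately at each time step, the weighting can shift between visual and audio features from frame to frame, which is one way to realize the adaptive fusion the abstract motivates; in a real system these weights would be learned end to end rather than drawn at random.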
Keywords: Video Source Camera Identification, Visual and Audio Fusion, Multi Modal Attention Network, Gated Recurrent Units