Exp-VQA: Fine-Grained Facial Expression Analysis via Visual Question Answering
33 Pages · Posted: 18 Jan 2025
Abstract
This paper presents a novel task, facial expression visual question answering (FEVQA), for fine-grained facial expression analysis across multiple scales. FEVQA interprets facial expressions in a more detailed and comprehensive way than traditional emotion categories or facial action units (AUs). To address the FEVQA task, we fine-tune InstructBLIP on synthesized VQA pairs, forming the Exp-VQA model. These VQA pairs are synthesized from existing annotations in both emotion classification and AU detection datasets, leveraging the descriptive ability of GPT-3.5 together with a rule-based generator. Exp-VQA can describe the status of the whole face and infer the indicated emotion, detail facial actions in specific regions, and detect the occurrence of individual AUs. Experiments demonstrate the effectiveness of Exp-VQA in describing multi-scale facial expressions, as well as its state-of-the-art zero-shot ability in detecting unseen AUs. Furthermore, training Exp-VQA improves the performance of its intermediate visual features on both AU detection and emotion classification tasks. The code and trained models are available at: https://github.com/Yujianyuan/Exp-VQA.
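Since the abstract describes Exp-VQA as a fine-tuned InstructBLIP that answers free-form questions about facial expressions, the sketch below illustrates how such a FEVQA-style query could be issued to an InstructBLIP-family checkpoint via Hugging Face Transformers. This is a minimal, assumption-laden illustration, not the authors' released pipeline: the checkpoint name (Salesforce/instructblip-vicuna-7b, a public base model rather than the Exp-VQA weights), the prompt wording, and the image path face.jpg are all placeholders.

```python
# Minimal sketch of a FEVQA-style query using a base InstructBLIP checkpoint.
# Assumptions: the public Salesforce/instructblip-vicuna-7b weights stand in for
# the fine-tuned Exp-VQA model; the prompt and image path are illustrative only.
import torch
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")
model = InstructBlipForConditionalGeneration.from_pretrained(
    "Salesforce/instructblip-vicuna-7b"
).to(device)

# A whole-face question in the spirit of FEVQA: describe the expression,
# then infer the indicated emotion.
image = Image.open("face.jpg").convert("RGB")  # placeholder image path
prompt = "Describe the facial expression in this image and infer the emotion it indicates."

inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=80)
answer = processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
print(answer)
```

Region-level questions (e.g., about the eyebrows or mouth) and individual AU-occurrence questions would follow the same pattern with different prompts; the actual Exp-VQA weights and prompt formats are available at the GitHub link above.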
Keywords: Fine-grained facial expression analysis, emotion classification, facial action unit detection, visual question answering