Exp-VQA: Fine-Grained Facial Expression Analysis via Visual Question Answering
33 Pages · Posted: 18 Jan 2025
Abstract
This paper presents a novel task, facial expression visual question answering (FEVQA), for fine-grained facial expression analysis across multiple scales. FEVQA interprets facial expressions in a more detailed and comprehensive way than traditional emotion categories or facial action units (AUs). To address the FEVQA task, we fine-tune InstructBLIP on synthesized VQA pairs, forming the Exp-VQA model. These VQA pairs are synthesized from existing annotations in both emotion classification and AU detection datasets, leveraging the descriptive ability of GPT-3.5 together with a rule-based generator. Exp-VQA can describe the status of the whole face and infer the indicated emotion, detail facial actions in specific regions, and detect the occurrence of individual AUs. Experiments demonstrate the effectiveness of Exp-VQA in describing multi-scale facial expressions, as well as its state-of-the-art zero-shot ability in detecting unseen AUs. Furthermore, training Exp-VQA improves the performance of its intermediate visual features on both AU detection and emotion classification tasks. The code and trained models are available at: https://github.com/Yujianyuan/Exp-VQA.
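Since the abstract describes Exp-VQA as a fine-tuned InstructBLIP that answers free-form questions about facial expressions, the sketch below illustrates how such a FEVQA-style query could be issued to an InstructBLIP-family checkpoint via Hugging Face Transformers. This is a minimal, assumption-laden illustration, not the authors' released pipeline: the checkpoint name (Salesforce/instructblip-vicuna-7b, a public base model rather than the Exp-VQA weights), the prompt wording, and the image path face.jpg are all placeholders.

```python
# Minimal sketch of a FEVQA-style query using a base InstructBLIP checkpoint.
# Assumptions: the public Salesforce/instructblip-vicuna-7b weights stand in for
# the fine-tuned Exp-VQA model; the prompt and image path are illustrative only.
import torch
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")
model = InstructBlipForConditionalGeneration.from_pretrained(
    "Salesforce/instructblip-vicuna-7b"
).to(device)

# A whole-face question in the spirit of FEVQA: describe the expression,
# then infer the indicated emotion.
image = Image.open("face.jpg").convert("RGB")  # placeholder image path
prompt = "Describe the facial expression in this image and infer the emotion it indicates."

inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=80)
answer = processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
print(answer)
```

Region-level questions (e.g., about the eyebrows or mouth) and individual AU-occurrence questions would follow the same pattern with different prompts; the actual Exp-VQA weights and prompt formats are available at the GitHub link above.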
Keywords: Fine-grained facial expression analysis, emotion classification, facial action unit detection, visual question answering