FaceExpr: Personalized Facial Expression Generation via Attention-Focused U-Net Feature Fusion in Diffusion Models

63 Pages Posted: 21 Feb 2025

Muhammad Sher Afgan

University of Science and Technology of China (USTC)

Bin Liu

affiliation not provided to SSRN

Mamoona Naveed Asghar

University of Galway

Wajahat Khalid

affiliation not provided to SSRN

Kai Zou

affiliation not provided to SSRN

Dianmo Sheng

affiliation not provided to SSRN

Abstract

Text-to-image diffusion models have revolutionized image generation by creating high-quality visuals from text descriptions. Despite their potential for personalized text-to-image applications, existing standalone methods have struggled to provide effective semantic modifications, while hybrid approaches relying on external embeddings are computationally complex and often compromise identity and fidelity. To overcome these challenges, we propose FaceExpr, a framework built on standalone text-to-image models that provides accurate facial semantic modifications and synthesizes facial images with diverse expressions, all while preserving the subject's identity. Specifically, we introduce a person-specific fine-tuning approach with two key components: (1) Attention-Focused Fusion, which uses an attention mechanism to align identity and expression features by focusing on critical facial landmarks, preserving the subject's identity, and (2) Expression Text Embeddings, integrated into the U-Net denoising module to resolve language ambiguities and enhance expression accuracy. Additionally, an expression crafting loss is employed to strengthen the alignment between identity and expression. Furthermore, by leveraging the prior preservation loss, we enable the synthesis of expressive faces in diverse scenes, views, and conditions. Extensive experiments demonstrate that FaceExpr outperforms both standalone and hybrid methods, highlighting its potential for personalized content generation in digital storytelling, immersive virtual environments, and advanced research applications. Code is available at https://github.com/MSAfganUSTC/FaceExpr.git.
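To make the Attention-Focused Fusion idea concrete, the following is a minimal sketch of one plausible reading: identity features act as queries that attend over expression features via scaled dot-product attention, so that each identity-feature vector is refined by the most relevant expression cues. This is an illustrative numpy sketch under our own assumptions; the function names, shapes, and the choice of plain single-head attention are not taken from the paper's released code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(id_feats, expr_feats):
    """Fuse identity and expression features with scaled dot-product attention.

    id_feats:   (N_id, d)   identity feature vectors (queries)
    expr_feats: (N_expr, d) expression feature vectors (keys and values)

    Returns the fused features (N_id, d) and the attention weights
    (N_id, N_expr), where each row of the weights sums to 1.
    """
    d = id_feats.shape[-1]
    scores = id_feats @ expr_feats.T / np.sqrt(d)  # similarity, scaled by sqrt(d)
    weights = softmax(scores, axis=-1)             # attention over expression features
    fused = weights @ expr_feats                   # expression-aware identity features
    return fused, weights
```

In the described framework the fused features would be injected into the U-Net denoising path alongside the expression text embeddings; a learned projection per attention head, rather than the raw dot product above, would be the usual design in practice.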

Keywords: Attention, Diffusion, Expressions Synthesis, Fusion, Text-to-Image

Suggested Citation

Afgan, Muhammad Sher and Liu, Bin and Asghar, Mamoona Naveed and Khalid, Wajahat and Zou, Kai and Sheng, Dianmo, FaceExpr: Personalized Facial Expression Generation via Attention-Focused U-Net Feature Fusion in Diffusion Models. Available at SSRN: https://ssrn.com/abstract=5148247 or http://dx.doi.org/10.2139/ssrn.5148247

Muhammad Sher Afgan

University of Science and Technology of China (USTC)
Hefei, Anhui 230026, China

Bin Liu (Contact Author)

affiliation not provided to SSRN

Mamoona Naveed Asghar

University of Galway

Wajahat Khalid

affiliation not provided to SSRN

Kai Zou

affiliation not provided to SSRN

Dianmo Sheng

affiliation not provided to SSRN
