FaceExpr: Personalized Facial Expression Generation via Attention-Focused U-Net Feature Fusion in Diffusion Models
63 Pages · Posted: 21 Feb 2025
Abstract
Text-to-image diffusion models have revolutionized image generation by creating high-quality visuals from text descriptions. Despite their potential for personalized text-to-image applications, existing standalone methods have struggled to provide effective semantic modifications, while hybrid approaches that rely on external embeddings are computationally complex and often compromise identity and fidelity. To overcome these challenges, we propose FaceExpr, a framework built on standalone text-to-image models that provides accurate facial semantic modification and synthesizes facial images with diverse expressions, all while preserving the subject's identity. Specifically, we introduce a person-specific fine-tuning approach with two key components: (1) Attention-Focused Fusion, which uses an attention mechanism to align identity and expression features by focusing on critical facial landmarks, thereby preserving the subject's identity, and (2) Expression Text Embeddings, integrated into the U-Net denoising module to resolve language ambiguities and improve expression accuracy. Additionally, an expression crafting loss strengthens the alignment between identity and expression. Furthermore, by leveraging a prior preservation loss, we enable the synthesis of expressive faces in diverse scenes, views, and conditions. Extensive experiments demonstrate that FaceExpr outperforms both standalone and hybrid methods, highlighting its potential for personalized content generation in digital storytelling, immersive virtual environments, and advanced research applications. Code is available at: https://github.com/MSAfganUSTC/FaceExpr.git.
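The abstract does not spell out implementation details, but its two components lend themselves to a compact illustration. The PyTorch sketch below shows, under stated assumptions, how an Attention-Focused Fusion block might inject expression text embeddings into identity features via cross-attention, and how the denoising, expression crafting, and prior preservation terms might combine into one training objective. Every name here (AttentionFocusedFusion, training_loss, the lambda weights) is hypothetical, and the expression crafting term is approximated as a classification loss from an off-the-shelf expression recognizer, which may differ from the paper's formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFocusedFusion(nn.Module):
    # Hypothetical cross-attention block: identity features supply the
    # queries; expression text embeddings supply the keys and values.
    def __init__(self, dim, text_dim, heads=8):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)       # queries from U-Net identity features
        self.to_k = nn.Linear(text_dim, dim, bias=False)  # keys from expression embeddings
        self.to_v = nn.Linear(text_dim, dim, bias=False)  # values from expression embeddings
        self.proj = nn.Linear(dim, dim)

    def forward(self, id_feats, expr_embed):
        # id_feats:    (B, N, dim) flattened spatial features from a U-Net block
        # expr_embed:  (B, T, text_dim) expression text-encoder tokens
        B, N, _ = id_feats.shape
        h = self.heads
        q = self.to_q(id_feats).view(B, N, h, -1).transpose(1, 2)
        k = self.to_k(expr_embed).view(B, -1, h, q.shape[-1]).transpose(1, 2)
        v = self.to_v(expr_embed).view(B, -1, h, q.shape[-1]).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, -1)
        # Residual connection keeps identity content; the attention output
        # injects expression semantics at the attended facial locations.
        return id_feats + self.proj(out)

def training_loss(noise_pred, noise, prior_pred, prior_noise,
                  expr_logits, expr_label,
                  lambda_prior=1.0, lambda_expr=0.1):
    # Standard denoising objective on the subject's own images.
    l_denoise = F.mse_loss(noise_pred, noise)
    # Prior preservation term on generic class images (DreamBooth-style),
    # supporting synthesis in diverse scenes, views, and conditions.
    l_prior = F.mse_loss(prior_pred, prior_noise)
    # Expression crafting term, sketched here as classification against the
    # target expression label (an assumption, not the paper's definition).
    l_expr = F.cross_entropy(expr_logits, expr_label)
    return l_denoise + lambda_prior * l_prior + lambda_expr * l_expr

In practice, a fusion block like this would replace or augment the cross-attention layers of the pretrained U-Net during person-specific fine-tuning, so the frozen text pathway keeps its general vocabulary while the expression tokens steer only the facial regions the queries attend to.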
Keywords: Attention, Diffusion, Expression Synthesis, Fusion, Text-to-Image