FaceExpr: Personalized Facial Expression Generation via Attention-Focused U-Net Feature Fusion in Diffusion Models

63 Pages Posted: 21 Feb 2025

Muhammad Sher Afgan

University of Science and Technology of China (USTC)

Bin Liu

affiliation not provided to SSRN

Mamoona Naveed Asghar

University of Galway

Wajahat Khalid

affiliation not provided to SSRN

Kai Zou

affiliation not provided to SSRN

Dianmo Sheng

affiliation not provided to SSRN

Abstract

Text-to-image diffusion models have revolutionized image generation by creating high-quality visuals from text descriptions. Despite their potential for personalized text-to-image applications, existing standalone methods have struggled to provide effective semantic modifications, while hybrid approaches relying on external embeddings are computationally complex and often compromise identity and fidelity. To overcome these challenges, we propose FaceExpr, a framework built on standalone text-to-image models that provides accurate facial semantic modifications and synthesizes facial images with diverse expressions, all while preserving the subject's identity. Specifically, we introduce a person-specific fine-tuning approach with two key components: (1) Attention-Focused Fusion, which uses an attention mechanism to align identity and expression features by focusing on critical facial landmarks, preserving the subject's identity, and (2) Expression Text Embeddings, integrated into the U-Net denoising module to resolve language ambiguities and enhance expression accuracy. Additionally, an expression crafting loss is employed to strengthen the alignment between identity and expression. Furthermore, by leveraging the prior preservation loss, we enable the synthesis of expressive faces in diverse scenes, views, and conditions. Extensive experiments demonstrate that FaceExpr outperforms both standalone and hybrid methods, highlighting its potential for personalized content generation in digital storytelling, immersive virtual environments, and advanced research applications. Code is available at https://github.com/MSAfganUSTC/FaceExpr.git.
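To make the Attention-Focused Fusion idea concrete, the following is a minimal sketch of one plausible reading: identity features act as queries that attend over expression features via scaled dot-product attention, so that each identity-feature vector is refined by the most relevant expression cues. This is an illustrative numpy sketch under our own assumptions; the function names, shapes, and the choice of plain single-head attention are not taken from the paper's released code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(id_feats, expr_feats):
    """Fuse identity and expression features with scaled dot-product attention.

    id_feats:   (N_id, d)   identity feature vectors (queries)
    expr_feats: (N_expr, d) expression feature vectors (keys and values)

    Returns the fused features (N_id, d) and the attention weights
    (N_id, N_expr), where each row of the weights sums to 1.
    """
    d = id_feats.shape[-1]
    scores = id_feats @ expr_feats.T / np.sqrt(d)  # similarity, scaled by sqrt(d)
    weights = softmax(scores, axis=-1)             # attention over expression features
    fused = weights @ expr_feats                   # expression-aware identity features
    return fused, weights
```

In the described framework the fused features would be injected into the U-Net denoising path alongside the expression text embeddings; a learned projection per attention head, rather than the raw dot product above, would be the usual design in practice.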

Keywords: Attention, Diffusion, Expressions Synthesis, Fusion, Text-to-Image

Suggested Citation

Afgan, Muhammad Sher and Liu, Bin and Asghar, Mamoona Naveed and Khalid, Wajahat and Zou, Kai and Sheng, Dianmo, FaceExpr: Personalized Facial Expression Generation via Attention-Focused U-Net Feature Fusion in Diffusion Models. Available at SSRN: https://ssrn.com/abstract=5148247 or http://dx.doi.org/10.2139/ssrn.5148247

Muhammad Sher Afgan

University of Science and Technology of China (USTC)
Hefei, Anhui 230026, China

Bin Liu (Contact Author)

affiliation not provided to SSRN

Mamoona Naveed Asghar

University of Galway

Wajahat Khalid

affiliation not provided to SSRN

Kai Zou

affiliation not provided to SSRN

Dianmo Sheng

affiliation not provided to SSRN
