Fidelity-Preserving Concept Stylization with Layer-Wise Lora and Multimodal Conditions

42 Pages Posted: 28 Apr 2025

Suoyu Zhang

Macau University of Science and Technology

Eric C.C. Tsang

Macau University of Science and Technology

Yong Wang

Chongqing University of Technology (CQUT)

Jiaming Wu

Macau University of Science and Technology

Abstract

Concept personalization and image stylization have advanced markedly thanks to powerful text-to-image models. Nevertheless, several issues remain in concept stylization. First, overfitting to a concept during fine-tuning degrades its stylization effect. Second, extracting style features with training-based methods may induce content leakage. Third, merging the learned weights of a concept-style pair makes it difficult to fuse the two elements. To overcome these challenges, we introduce Fidelity-Preserving Concept Stylization, a novel task that stylizes a concept while preserving its fidelity, and propose a method to achieve this effectively. To mitigate overfitting, we propose a two-stage optimization strategy with customized timestep distributions, in which a layer-wise LoRA adapter and a special token are jointly optimized to progressively capture coarse-grained and semantic concept features. For effective feature fusion, we propose a training-free architecture to stylize the learned concept. The style and concept features are first extracted from reference images by a pretrained image adapter. These image features, together with the text embedding, are then processed and projected into a shared space as multimodal conditions and dynamically fused in the cross-attention layers. Our method further enhances the stylization effect through element-wise subtraction, thresholding, and iterative stylization. Comprehensive experiments demonstrate the effectiveness of our method, which stylizes the concept while balancing stylization effect and concept fidelity.
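
As a rough illustration of the layer-wise LoRA idea mentioned in the abstract, the sketch below applies the standard low-rank update W + (alpha / rank) * B A only to layers that have an adapter and leaves the others untouched. It is not the authors' implementation: the abstract gives no layer selection, ranks, or training details, so the layer names ("attn", "mlp"), the `LoRAAdapter` class, and the `apply_layerwise_lora` helper are all hypothetical, and plain Python lists stand in for real model tensors.

```python
from dataclasses import dataclass
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


@dataclass
class LoRAAdapter:
    """Low-rank update for one layer: delta_W = (alpha / rank) * B @ A."""
    A: Matrix  # shape (rank, in_features)
    B: Matrix  # shape (out_features, rank)
    alpha: float

    def delta(self) -> Matrix:
        rank = len(self.A)
        scale = self.alpha / rank
        return [[scale * v for v in row] for row in matmul(self.B, self.A)]


def apply_layerwise_lora(
    weights: Dict[str, Matrix], adapters: Dict[str, LoRAAdapter]
) -> Dict[str, Matrix]:
    """Return new weights; only layers that have an adapter are modified."""
    merged: Dict[str, Matrix] = {}
    for name, w in weights.items():
        if name in adapters:
            d = adapters[name].delta()
            merged[name] = [[wv + dv for wv, dv in zip(wr, dr)] for wr, dr in zip(w, d)]
        else:
            merged[name] = [row[:] for row in w]  # copy unchanged layer
    return merged


# Hypothetical two-layer model (assumption): only "attn" receives an adapter.
weights = {
    "attn": [[1.0, 0.0], [0.0, 1.0]],
    "mlp": [[2.0, 0.0], [0.0, 2.0]],
}
adapters = {"attn": LoRAAdapter(A=[[1.0, 1.0]], B=[[0.5], [0.0]], alpha=2.0)}
merged = apply_layerwise_lora(weights, adapters)
```

Keeping the adapters in a per-layer dictionary makes the choice of which layers are adapted explicit. In practice the selection and rank per layer would follow the paper's two-stage optimization strategy, which the abstract does not specify.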

Keywords: Concept stylization, Two-stage optimization, Layer-wise LoRA, Feature fusion, Multimodal conditions

Suggested Citation

Zhang, Suoyu and Tsang, Eric C.C. and Wang, Yong and Wu, Jiaming, Fidelity-Preserving Concept Stylization with Layer-Wise Lora and Multimodal Conditions. Available at SSRN: https://ssrn.com/abstract=5233395 or http://dx.doi.org/10.2139/ssrn.5233395

Suoyu Zhang

Macau University of Science and Technology

China

Eric C.C. Tsang (Contact Author)

Macau University of Science and Technology

China

Yong Wang

Chongqing University of Technology (CQUT)

Jiaming Wu

Macau University of Science and Technology

China
