Fidelity-Preserving Concept Stylization with Layer-Wise LoRA and Multimodal Conditions
42 Pages · Posted: 28 Apr 2025
Abstract
Concept personalization and image stylization have achieved prominent advancements owing to powerful text-to-image models. Nevertheless, several issues remain in concept stylization. First, overfitting to a concept during fine-tuning degrades its stylization effect. Second, extracting style features with training-based methods may induce content leakage. Third, merging the learned weights of a concept-style pair makes it difficult to fuse the two elements. To overcome these challenges, we introduce Fidelity-Preserving Concept Stylization, a novel task that stylizes a concept while preserving its fidelity, and propose a method to achieve it effectively. To mitigate overfitting, we propose a two-stage optimization strategy with customized timestep distributions, in which a layer-wise LoRA adapter and a special token are jointly optimized to progressively capture coarse-grained and semantic concept features. For effective feature fusion, we propose a training-free architecture to stylize the learned concept. Style and concept features are first extracted from reference images by a pretrained image adapter. These image features, together with the text embedding, are then processed and projected into a shared space as multimodal conditions and dynamically fused in the cross-attention layers. Moreover, our method further enhances the stylization effect through element-wise subtraction, thresholding, and iterative stylization. Comprehensive experiments demonstrate the effectiveness of our method, which stylizes the concept while balancing stylization strength against concept fidelity.
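As a rough illustration of the two-stage optimization, the sketch below samples timesteps from stage-dependent distributions while jointly updating LoRA weights and a special token embedding. The abstract does not specify the distributions or any interfaces, so the power-law skews, `lora_params`, `token_emb`, and `diffusion_loss` here are illustrative assumptions, not the paper's formulation.

```python
import torch

T = 1000  # total diffusion timesteps (standard for latent diffusion models)

def sample_timesteps(batch_size: int, stage: int) -> torch.Tensor:
    """Stage-dependent timestep sampling (assumed form).

    Stage 1 skews toward high-noise timesteps to capture coarse-grained
    concept features; stage 2 skews toward low-noise timesteps to refine
    semantic detail.
    """
    u = torch.rand(batch_size)
    if stage == 1:
        t = (u ** (1.0 / 3.0)) * (T - 1)  # biased toward t close to T
    else:
        t = (u ** 3.0) * (T - 1)          # biased toward t close to 0
    return t.long()

def train_stage(lora_params, token_emb, dataloader, stage, steps, diffusion_loss):
    """Jointly optimize layer-wise LoRA weights and the special token.

    `lora_params` and `token_emb` are hypothetical parameter handles;
    `diffusion_loss` stands in for the usual noise-prediction objective.
    """
    opt = torch.optim.AdamW(list(lora_params) + [token_emb], lr=1e-4)
    for _, batch in zip(range(steps), dataloader):
        t = sample_timesteps(batch["latents"].shape[0], stage)
        loss = diffusion_loss(batch["latents"], t, token_emb)
        opt.zero_grad()
        loss.backward()
        opt.step()
```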
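The training-free fusion stage can be sketched similarly. Below, style and concept features (as would be extracted by a pretrained image adapter) are combined with text embeddings into one condition sequence for cross-attention, with element-wise subtraction and thresholding used to curb content leakage. The tensor layout, the threshold `tau`, the subtraction direction, and the `pipe` call are all assumptions for illustration only.

```python
import torch

def fuse_multimodal_conditions(text_emb, concept_feat, style_feat, tau=0.1):
    """Training-free fusion sketch (assumed formulation).

    All inputs are token sequences already projected into a shared
    embedding space, shape (batch, tokens, dim).
    """
    # Element-wise subtraction: remove the component of the style
    # features that overlaps with the concept, suppressing content leakage.
    residual = style_feat - concept_feat
    # Thresholding: keep only salient style responses.
    residual = residual * (residual.abs() > tau)
    # Concatenate along the token axis; cross-attention layers then
    # attend over text, concept, and style conditions jointly.
    return torch.cat([text_emb, concept_feat, residual], dim=1)

def iterative_stylize(pipe, text_emb, concept_feat, style_feat, rounds=2):
    """Iterative stylization (assumed loop): repeat generation with the
    fused conditions to progressively strengthen the style."""
    image = None
    for _ in range(rounds):
        cond = fuse_multimodal_conditions(text_emb, concept_feat, style_feat)
        image = pipe(cond)  # hypothetical denoising pipeline call
    return image
```

Concatenating the conditions along the token axis mirrors how adapter-based methods typically expose image features to cross-attention; the dynamic fusion the abstract describes would additionally weight these conditions per layer.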
Keywords: Concept stylization, two-stage optimization, layer-wise LoRA, feature fusion, multimodal conditions