SAST: Semantic-Aware Stylized Text-to-Image Generation
15 Pages · Posted: 12 Feb 2025
Abstract
Pre-trained text-to-image diffusion models achieve excellent visual quality and have attracted many users, who write creative text prompts to control the generated results. Because detailed generation requirements cannot be fully expressed in limited language, users commonly supply a reference image to "stylize" text-to-image generation. However, the images generated by existing methods deviate in style from the reference image, contrary to the human perception that regions with similar semantics in two images of the same style should share that style. To address this problem, this paper proposes a semantic-aware style transfer method (SAST) that strengthens semantic-level style alignment between the generated image and the style reference image. First, we introduce language-driven semantic segmentation, trained on the COCO dataset, into a general style transfer model to capture the mask of the regions in the style reference image that the text attends to.
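The abstract is cut off before the method details, but its core idea, sharing style between semantically matched regions, can be illustrated with a minimal region-wise AdaIN-style sketch. This is an illustrative stand-in, not the paper's actual model: it assumes semantic masks are already available (in SAST they would come from the language-driven segmenter mentioned above) and aligns channel-wise feature statistics only within regions that carry the same semantic label.

```python
import numpy as np

def regionwise_adain(content_feat, style_feat, content_mask, style_mask, eps=1e-5):
    """Illustrative semantic-level style alignment (not the paper's method).

    For each semantic label present in both masks, shift the content
    features of that region so their channel-wise mean/std match the
    style region's statistics -- AdaIN restricted to matching masks.

    content_feat, style_feat: float arrays of shape (C, H, W)
    content_mask, style_mask: integer label maps of shape (H, W)
    """
    out = content_feat.copy()
    shared_labels = np.intersect1d(np.unique(content_mask), np.unique(style_mask))
    for lab in shared_labels:
        c = content_feat[:, content_mask == lab]   # (C, Nc) content region
        s = style_feat[:, style_mask == lab]       # (C, Ns) style region
        c_mu, c_sd = c.mean(1, keepdims=True), c.std(1, keepdims=True)
        s_mu, s_sd = s.mean(1, keepdims=True), s.std(1, keepdims=True)
        # Normalize the content region, then re-scale with style statistics.
        out[:, content_mask == lab] = (c - c_mu) / (c_sd + eps) * s_sd + s_mu
    return out
```

After this transfer, each content region's channel-wise mean equals that of the same-label style region, which is precisely the "similar semantic regions share style" intuition the abstract appeals to.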
Keywords: Computing methodologies, Artificial Intelligence, Computer Vision, Computer vision representations, Image representations