SAST: Semantic-Aware Stylized Text-to-Image Generation
15 Pages · Posted: 12 Feb 2025
Abstract
Pre-trained text-to-image diffusion models achieve excellent visual quality and have attracted many users, who write creative text prompts to control the generated results. Because detailed generation requirements cannot be fully expressed in limited language, users commonly supply a reference image to "stylize" text-to-image generation. However, the images generated by existing methods deviate in style from the reference image, contrary to the human perception that regions with similar semantics in two images of the same style should share that style. To address this problem, this paper proposes a semantic-aware style transfer method (SAST) that strengthens semantic-level style alignment between the generated image and the style reference image. First, we introduce language-driven semantic segmentation, trained on the COCO dataset, into a general style transfer model to capture the mask of the regions in the style reference image that the text attends to.
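The abstract is cut off before the method details, but its core idea, sharing style between semantically matched regions, can be illustrated with a minimal region-wise AdaIN-style sketch. This is an illustrative stand-in, not the paper's actual model: it assumes semantic masks are already available (in SAST they would come from the language-driven segmenter mentioned above) and aligns channel-wise feature statistics only within regions that carry the same semantic label.

```python
import numpy as np

def regionwise_adain(content_feat, style_feat, content_mask, style_mask, eps=1e-5):
    """Illustrative semantic-level style alignment (not the paper's method).

    For each semantic label present in both masks, shift the content
    features of that region so their channel-wise mean/std match the
    style region's statistics -- AdaIN restricted to matching masks.

    content_feat, style_feat: float arrays of shape (C, H, W)
    content_mask, style_mask: integer label maps of shape (H, W)
    """
    out = content_feat.copy()
    shared_labels = np.intersect1d(np.unique(content_mask), np.unique(style_mask))
    for lab in shared_labels:
        c = content_feat[:, content_mask == lab]   # (C, Nc) content region
        s = style_feat[:, style_mask == lab]       # (C, Ns) style region
        c_mu, c_sd = c.mean(1, keepdims=True), c.std(1, keepdims=True)
        s_mu, s_sd = s.mean(1, keepdims=True), s.std(1, keepdims=True)
        # Normalize the content region, then re-scale with style statistics.
        out[:, content_mask == lab] = (c - c_mu) / (c_sd + eps) * s_sd + s_mu
    return out
```

After this transfer, each content region's channel-wise mean equals that of the same-label style region, which is precisely the "similar semantic regions share style" intuition the abstract appeals to.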
Keywords: Computing methodologies, Artificial Intelligence, Computer Vision, Computer vision representations, Image representations