TVPNet: Text-Vision Prompt Guided Segmentation for Small 3D Medical Object

Xie, Yuhua; Wang, Rui; Zhuang, Yan; Chen, Ke; Han, Lin; Liao, Guoliang; Hou, Yao; Hou, Lingxuan; Lin, Jiangli

doi:10.2139/ssrn.5113668

15 Pages. Posted: 31 Jan 2025

Accurate segmentation of small organs in 3D CT images remains a significant challenge because of their small volume, indistinct boundaries, susceptibility to artifacts, and semantic similarity to surrounding tissues. This study proposes TVPNet, a text-vision prompt-guided model for small-target segmentation in 3D medical images. TVPNet is built on the SAM framework and integrates a CNN-Transformer image encoder for efficient feature extraction. A pseudo-label decoder structure is introduced to make segmentation more robust under varying prompt input conditions. For small organ segmentation, visual prompts provide explicit region information about the targets, which reduces background interference, while text prompts decouple the semantics of different organs, which improves the model's attention to small targets. In addition, a multi-combination optimization strategy is applied during training to improve the model's understanding of prompt information. Evaluated on the FLARE2023 and MSD datasets, TVPNet achieved an average Dice score of 87.60 on FLARE2023 and a score of 83.37 for the segmentation of 8 small organs, surpassing state-of-the-art baselines for small object segmentation.
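
The results above are reported as Dice scores. As a reading aid, the following is a minimal sketch of how a per-organ Dice coefficient and its average over several organs can be computed on labeled 3D volumes. It is not the authors' evaluation code: the function names (`flatten`, `dice_score`, `mean_dice`), the nested-list volume layout, the toy data, and the convention for organs absent from both volumes are all assumptions made for illustration. It also assumes that the reported values are Dice coefficients scaled to a 0-100 range.

```python
from itertools import chain


def flatten(volume):
    """Flatten a 3D nested list (depth x height x width) into a flat list of labels."""
    return list(chain.from_iterable(chain.from_iterable(volume)))


def dice_score(pred, target, label):
    """Dice coefficient for one organ label: 2 * |P & T| / (|P| + |T|)."""
    p = [v == label for v in flatten(pred)]
    t = [v == label for v in flatten(target)]
    intersection = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    # Assumed convention: an organ absent from both volumes counts as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(pred, target, labels):
    """Unweighted average of the per-label Dice coefficients."""
    return sum(dice_score(pred, target, label) for label in labels) / len(labels)


# Hypothetical 2x2x2 volumes: 0 = background, 1 and 2 = two organ labels.
target = [[[0, 1], [1, 1]], [[0, 2], [2, 0]]]
pred = [[[0, 1], [1, 0]], [[0, 2], [2, 2]]]

per_organ = {label: dice_score(pred, target, label) for label in (1, 2)}
average = mean_dice(pred, target, [1, 2])
print(per_organ, f"mean Dice (0-100 scale): {100 * average:.2f}")
```

Because the Dice coefficient normalizes by the combined size of the predicted and reference regions, a single mislabeled voxel costs a small organ far more than a large one, which is one reason small-organ scores are often reported separately from the overall average.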

Note:
Funding declaration: This work was supported in part by the National Natural Science Foundation of China under Grant No. 62406211 and by the Natural Science Foundation of Sichuan Province under Grant Nos. 2024NSFSC0654 and 23NSFSC1129.

Conflict of Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Keywords: 3D medical image segmentation, small object segmentation, prompt engineering

Suggested Citation:

Xie, Yuhua and Wang, Rui and Zhuang, Yan and Chen, Ke and Han, Lin and Liao, Guoliang and Hou, Yao and Hou, Lingxuan and Lin, Jiangli, TVPNet: Text-Vision Prompt Guided Segmentation for Small 3D Medical Object. Available at SSRN: https://ssrn.com/abstract=5113668 or http://dx.doi.org/10.2139/ssrn.5113668