TVPNet: Text-Vision Prompt Guided Segmentation for Small 3D Medical Object

15 Pages Posted: 31 Jan 2025

Yuhua Xie

Sichuan University

Rui Wang

affiliation not provided to SSRN

Yan Zhuang

Sichuan University

Ke Chen

Sichuan University

Lin Han

Sichuan University

Guoliang Liao

Sichuan University

Yao Hou

Sichuan University

Lingxuan Hou

Sichuan University

Jiangli Lin

Sichuan University

Abstract

Accurate segmentation of small organs in 3D CT images remains a significant challenge because of their small volume, indistinct boundaries, susceptibility to artifacts, and semantic similarity to surrounding tissues. This study proposes TVPNet, a text-vision prompt-guided model for small-target segmentation in 3D medical images. TVPNet builds on the SAM framework and integrates a CNN-Transformer image encoder for efficient feature extraction. A pseudo-label decoder structure is also introduced to make segmentation more robust under various prompt input conditions. For small organ segmentation, visual prompts provide explicit region information about the targets, minimizing background interference, while text prompts decouple the semantics of different organs, improving the model's attention to small targets. Additionally, a multi-combination optimization strategy is applied during training to enhance the model's understanding of prompt information. Evaluated on the FLARE2023 and MSD datasets, TVPNet achieved an average Dice score of 87.60 on FLARE2023 and a score of 83.37 for the segmentation of 8 small organs, surpassing the state-of-the-art baselines for small object segmentation.
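For readers unfamiliar with the metric, the following is a minimal sketch of how a per-organ Dice score and its average over several organs could be computed. It is not the authors' evaluation code: the nested-list volume representation, the function names (`flatten`, `dice_score`, `mean_dice`), the percentage scale, and the treatment of two empty masks as a perfect match are all assumptions made for illustration.

```python
def flatten(volume):
    """Yield every voxel label from an arbitrarily nested list or tuple."""
    for item in volume:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def dice_score(pred, target, organ_label=1):
    """Dice = 2|P & T| / (|P| + |T|) for one organ label, in percent."""
    p = [v == organ_label for v in flatten(pred)]
    t = [v == organ_label for v in flatten(target)]
    if len(p) != len(t):
        raise ValueError("prediction and ground truth must have the same size")
    intersection = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    # Assumed convention: if the organ is absent from both masks,
    # count the case as a perfect match.
    if denom == 0:
        return 100.0
    return 100.0 * 2.0 * intersection / denom


def mean_dice(pred, target, organ_labels):
    """Unweighted average of per-organ Dice scores (hypothetical helper)."""
    scores = [dice_score(pred, target, label) for label in organ_labels]
    return sum(scores) / len(scores)


# Toy 2x2x2 label volumes: 0 = background, 1 and 2 = two organs.
pred = [[[0, 1], [1, 0]], [[2, 0], [1, 0]]]
target = [[[0, 1], [0, 0]], [[2, 0], [1, 1]]]

print(round(dice_score(pred, target, organ_label=1), 2))
print(round(mean_dice(pred, target, organ_labels=[1, 2]), 2))
```

Because Dice normalizes the overlap by the combined size of the two masks, a few misclassified voxels lower the score of a small organ much more than that of a large one, which is one reason small-organ results are often reported separately.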

Note:
Funding declaration: This work was supported in part by the National Natural Science Foundation of China under Grant No. 62406211 and by the Natural Science Foundation of Sichuan Province under Grant Nos. 2024NSFSC0654 and 23NSFSC1129.

Conflict of Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Keywords: 3D medical image segmentation, small object segmentation, prompt engineering

Suggested Citation

Xie, Yuhua and Wang, Rui and Zhuang, Yan and Chen, Ke and Han, Lin and Liao, Guoliang and Hou, Yao and Hou, Lingxuan and Lin, Jiangli, TVPNet: Text-Vision Prompt Guided Segmentation for Small 3D Medical Object. Available at SSRN: https://ssrn.com/abstract=5113668 or http://dx.doi.org/10.2139/ssrn.5113668

Yuhua Xie

Sichuan University

No. 24 South Section 1, Yihuan Road,
Chengdu, 610064
China

Rui Wang

affiliation not provided to SSRN

No Address Available

Yan Zhuang

Sichuan University

No. 24 South Section 1, Yihuan Road,
Chengdu, 610064
China

Ke Chen

Sichuan University

No. 24 South Section 1, Yihuan Road,
Chengdu, 610064
China

Lin Han

Sichuan University

No. 24 South Section 1, Yihuan Road,
Chengdu, 610064
China

Guoliang Liao

Sichuan University

No. 24 South Section 1, Yihuan Road,
Chengdu, 610064
China

Yao Hou

Sichuan University

No. 24 South Section 1, Yihuan Road,
Chengdu, 610064
China

Lingxuan Hou

Sichuan University

No. 24 South Section 1, Yihuan Road,
Chengdu, 610064
China

Jiangli Lin (Contact Author)

Sichuan University

No. 24 South Section 1, Yihuan Road,
Chengdu, 610064
China
