TVPNet: Text-Vision Prompt Guided Segmentation for Small 3D Medical Object
15 Pages Posted: 31 Jan 2025
Abstract
Accurate segmentation of small organs in 3D CT images remains a significant challenge due to their small volume, indistinct boundaries, susceptibility to artifacts, and semantic similarity to surrounding tissues. In this study, a text-vision prompt-guided model (TVPNet) is proposed for small target segmentation in 3D medical images. The model is built on the SAM framework and integrates a CNN-Transformer image encoder for efficient feature extraction. A pseudo-label decoder structure is also introduced to make segmentation more robust under various prompt input conditions. For small organ segmentation, visual prompts provide explicit region information about the targets, which minimizes background interference, while text prompts decouple the semantics of different organs, which improves the model's attention to small targets. Additionally, a multi-combination optimization strategy is proposed during training to enhance the model's understanding of prompt information. Evaluated on the FLARE2023 and MSD datasets, TVPNet achieved an average Dice score of 87.60 on FLARE2023, with a score of 83.37 for the segmentation of 8 small organs, surpassing the state-of-the-art baselines for small object segmentation.
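For readers unfamiliar with the reported metric, the following is a minimal sketch of how a per-organ Dice score and its average over several organs are commonly computed. It is not the authors' evaluation code. The function names `dice_score` and `mean_dice`, the flattened label-list input, and the convention for organs absent from both masks are all assumptions made for illustration. Scores such as 87.60 are presumably this quantity expressed as a percentage.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], label: int) -> float:
    """Dice coefficient for one organ label over flattened voxel labels.

    Dice = 2 * |P intersect T| / (|P| + |T|), where P and T are the sets of
    voxels assigned to `label` in the prediction and the ground truth.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of voxels")
    pred_count = sum(1 for p in pred if p == label)
    target_count = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    denom = pred_count + target_count
    # Assumed convention: an organ absent from both masks counts as a perfect match.
    return 1.0 if denom == 0 else 2.0 * overlap / denom


def mean_dice(pred: Sequence[int], target: Sequence[int], labels: Sequence[int]) -> float:
    """Unweighted average of per-label Dice scores (hypothetical helper)."""
    return sum(dice_score(pred, target, label) for label in labels) / len(labels)


if __name__ == "__main__":
    # Toy 8-voxel volume, flattened; 0 is background, 1 and 2 are organ labels.
    pred = [0, 1, 1, 2, 2, 0, 0, 0]
    target = [0, 1, 1, 1, 2, 0, 0, 0]
    print(f"organ 1 Dice: {dice_score(pred, target, 1):.4f}")
    print(f"mean Dice (%): {100 * mean_dice(pred, target, [1, 2]):.2f}")
```

Because the denominator counts organ voxels only, a small organ that is mislabeled by a handful of voxels loses far more Dice than a large organ would, which is one reason small-organ scores are typically reported separately from the overall average.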
Note:
Funding declaration: This work was supported in part by the National Natural Science Foundation of China under Grant No. 62406211 and by the Natural Science Foundation of Sichuan Province under Grant Nos. 2024NSFSC0654 and 23NSFSC1129.
Conflict of Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Keywords: 3D Medical image segmentation, small object segmentation, prompt engineering