MFUnetr: A Transformer-Based Multi-Task Learning Network for Multi-Organ Segmentation from Partially Labeled Datasets
22 Pages Posted: 3 Nov 2022
Abstract
Multi-organ segmentation of CT images is crucial for clinical applications, and most state-of-the-art models rely on a fully annotated dataset with strong supervision to pursue higher accuracy. However, these models generalize poorly to varied CT images because their training data are small in scale and come from a single source. To use existing partially labeled datasets to obtain full organ segmentation and improve accuracy and robustness, we propose a transformer-based multi-task learning network called MFUnetr. Fed directly with a union of datasets, MFUnetr trains an encoder-decoder network on two tasks in parallel. The main task is to produce the full organ segmentation using a particular training strategy. The auxiliary task is to segment the organs of each dataset using label priors. Additionally, we propose a new weighted combined loss function to optimize the model. Compared to the base model trained on the fully annotated dataset BTCV, our network, trained on a combination of three datasets, improved mean Dice on the overlapping organs: spleen +2.5%, esophagus +8.9%, and aorta +0.5%. Generalization ability was also enhanced, with spleen +4.1%, esophagus +37.4%, and aorta +21.4%. Importantly, without fine-tuning, the mean Dice calculated on the 13 organs of BTCV remained 0.6% higher when all 15 organs were segmented. Experimental results show that the proposed method can effectively use large existing partially annotated datasets to alleviate the problem of data hunger in multi-organ segmentation.
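For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a per-organ and mean Dice score can be computed when only some organs are annotated in a given dataset. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `dice_per_label` and `mean_dice`, the flat list representation of label maps, and the choice to skip organs absent from both prediction and reference are all hypothetical.

```python
import math


def dice_per_label(pred, target, labels):
    """Dice coefficient per label for two equal-length flat label sequences.

    `labels` lists only the organs annotated in this dataset (an assumption
    made here to mimic evaluation on partially labeled data).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    scores = {}
    for label in labels:
        pred_count = sum(1 for p in pred if p == label)
        target_count = sum(1 for t in target if t == label)
        overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
        denom = pred_count + target_count
        if denom == 0:
            # Label absent from both maps: Dice is undefined, so skip it.
            continue
        scores[label] = 2.0 * overlap / denom
    return scores


def mean_dice(pred, target, labels):
    """Average of the defined per-label Dice scores (NaN if none are defined)."""
    scores = dice_per_label(pred, target, labels)
    return sum(scores.values()) / len(scores) if scores else math.nan


# Toy example: 0 = background, 1 and 2 = two annotated organs, 3 = absent organ.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 0, 2, 2, 2]
scores = dice_per_label(pred, target, labels=[1, 2, 3])
average = mean_dice(pred, target, labels=[1, 2, 3])
```

Restricting `labels` to the organs a dataset actually annotates is one simple way to compare models on overlapping organs only; real 3D volumes would normally be handled with array libraries rather than Python lists.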
Note:
Funding Information: This work was supported by the National Natural Science Foundation of China under Grant No. 62162058.
Conflict of Interests: There is no conflict of interest in this work. We declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Keywords: Partially labeled dataset, 3D CT image segmentation, Multi-task learning, Multi-organ segmentation, Vision Transformer