Optimized Cross Alignment Based Multimodal Radiology Report Summarization
80 Pages · Posted: 8 Apr 2025
Abstract
Multimodal Radiology Report Summarization (MRRS) aims to summarize the text of a radiology report with the assistance of its paired images. Incorporating the visual modality has already been shown to improve over text-only summarization approaches. We observe that the local and global features of each modality, and the alignment between them, play a pivotal role in forming a better image-text coupling and thus in improving the summary. Given a pair of images (PA + LAT views) and the report text, we use radiology-specific knowledge, sourced and verified by radiologists, as the marginal representative text, and the concatenated image as the marginal representative image. We then optimize the image-text local-global joint representation and inject it into a transformer encoder-decoder model to improve summary generation. On the Open-I dataset, augmented to 6,569 samples, our intermediate fusion approach consistently improves over the SOTA text-only and multimodal approaches, not only in generation metrics (METEOR, SPICE, and sacreBLEU, by 3%, 5.44%, and 4.5% respectively) but also in summarization metrics, with gains of 8%, 2%, 1.15%, and 0.04% in ROUGE-1 (F1), ROUGE-2 (F1), ROUGE-L (F1), and BERTScore. Our model also shows strong qualitative results compared to the baselines.
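The abstract describes cross-aligning local and global features of the two modalities and concatenating them into a joint representation before injection into the encoder-decoder. As a rough, illustrative sketch only (the feature dimensions, pooling choices, and function names below are assumptions, not the paper's actual architecture), bidirectional cross-attention between image-patch and report-token features followed by concatenation with pooled global features could look like:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys):
    """Align one modality (queries) to another (keys/values)
    via scaled dot-product cross-attention."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys

def joint_representation(img_local, txt_local):
    """Hypothetical local-global fusion: local features are
    cross-aligned in both directions, global features are
    mean-pooled, and everything is concatenated."""
    img2txt = cross_attend(img_local, txt_local)  # image attends to text
    txt2img = cross_attend(txt_local, img_local)  # text attends to image
    g_img = img_local.mean(axis=0)                # global image feature
    g_txt = txt_local.mean(axis=0)                # global text feature
    local = np.concatenate([img2txt.mean(axis=0), txt2img.mean(axis=0)])
    return np.concatenate([local, g_img, g_txt])

rng = np.random.default_rng(0)
img = rng.standard_normal((49, 64))  # e.g. patch features from the concatenated PA+LAT image
txt = rng.standard_normal((32, 64))  # token features from the report text
fused = joint_representation(img, txt)
print(fused.shape)  # (256,)
```

In the paper's pipeline this fused vector would then be injected into the transformer encoder-decoder; here it is simply returned for inspection.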
Keywords: Radiology Report Summarization, Biomedical Text Summarization, Abstractive Text Summarization, Medical Information Fusion, Cross-modal Alignment, Fusion Optimization