Multimodal Embedding for Scientific Image Caption Generation

11 Pages Posted: 22 Apr 2025

Jose Luis Huillca

Universidade Federal Fluminense

Leandro Augusto Frata Fernandes

Universidade Federal Fluminense

Abstract

Image caption generation emerged from the combination of Computer Vision and Natural Language Processing techniques. It has typically been applied to natural images, where short descriptions improve comprehension of the visual content. Scientific images, however, differ from natural images, and they are often accompanied by detailed descriptions in the manuscripts where they appear. This paper presents Multimodal Image Captioning (MMICap), a deep-learning-based solution that takes multimodal inputs, consisting of scientific images paired with their accompanying detailed texts, and automatically generates captions that summarize the content depicted in these images. This paper also presents ElsCap, a new dataset for image captioning constructed from open-access articles retrieved from ScienceDirect. ElsCap contains 1,088,728 scientific images with their respective captions and descriptive paragraphs. Experiments with the ElsCap dataset demonstrate that MMICap leverages the integration of image and text inputs to enhance the quality of the generated captions. In our experiments, we used the BLEU, METEOR, ROUGE, and CIDEr metrics to compare the captions produced by standalone BLIP and LSTM networks with those produced by the same networks when integrated as the backbone of the MMICap paradigm.
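
As an illustration of how overlap-based metrics such as BLEU score a generated caption against a reference caption, the following is a minimal sketch of clipped n-gram precision, the core quantity that BLEU aggregates. It is not the authors' evaluation code: the function names and the example captions are hypothetical, tokenization is plain whitespace splitting, and the brevity penalty and multi-order averaging of full BLEU are omitted.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate caption against one reference.

    Each candidate n-gram is credited at most as many times as it occurs
    in the reference, so repeating a correct word does not inflate the score.
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        # Candidate is too short to contain any n-gram of this order.
        return 0.0
    overlap = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return overlap / total


if __name__ == "__main__":
    # Hypothetical captions, for illustration only.
    reference = "scatter plot of accuracy versus training epochs"
    candidate = "plot of accuracy versus epochs"
    print(clipped_precision(candidate, reference, n=1))
    print(clipped_precision(candidate, reference, n=2))
```

A higher value means more of the generated caption's n-grams also appear in the reference. Full BLEU combines these precisions over several n-gram orders and penalizes captions that are shorter than the reference.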

Keywords: Image captioning, deep learning dataset

Suggested Citation

Huillca, Jose Luis and Fernandes, Leandro Augusto Frata, Multimodal Embedding for Scientific Image Caption Generation. Available at SSRN: https://ssrn.com/abstract=5226206 or http://dx.doi.org/10.2139/ssrn.5226206

Jose Luis Huillca (Contact Author)

Universidade Federal Fluminense

Niterói
Brazil

Leandro Augusto Frata Fernandes

Universidade Federal Fluminense
