Structured Literature Image Finder: Parsing Text and Figures in Biomedical Literature

8 Pages
Posted: 13 Dec 2019
Publication Status: Accepted

Amr Ahmed

Carnegie Mellon University - Machine Learning Department; Carnegie Mellon University - Language Technologies Institute

Andrew Arnold

Carnegie Mellon University - Machine Learning Department

Luis Pedro Coelho

Carnegie Mellon University - Joint CMU-Pitt Ph.D Program in Computational Biology (CPCB); Carnegie Mellon University - Center for Bioimage Informatics (CBI); Carnegie Mellon University - Computational Biology Department

Joshua Kangas

Carnegie Mellon University - Joint CMU-Pitt Ph.D Program in Computational Biology (CPCB); Carnegie Mellon University - Center for Bioimage Informatics (CBI); Carnegie Mellon University - Computational Biology Department

Abdul-Saboor Sheikh

Carnegie Mellon University - Center for Bioimage Informatics (CBI)

Eric Xing

Carnegie Mellon University

William W. Cohen

Carnegie Mellon University

Robert F. Murphy

Carnegie Mellon University - Machine Learning Department; Carnegie Mellon University - Joint CMU-Pitt Ph.D Program in Computational Biology (CPCB); Carnegie Mellon University - Center for Bioimage Informatics (CBI); Carnegie Mellon University - Computational Biology Department; Carnegie Mellon University - Department of Biological Sciences; Carnegie Mellon University - Department of Biomedical Engineering

Abstract

The Structured Literature Image Finder (SLIF) project combines text mining and image processing to extract structured information from biomedical literature.

SLIF extracts images and their captions from published papers. The captions are automatically parsed for relevant biological entities (protein and cell type names), while the images are classified according to their type (e.g., micrograph or gel). Fluorescence microscopy images are further processed and classified according to the depicted subcellular localization.

The results of this process can be queried online through either a user-friendly web interface or an XML-based web service. As an alternative to targeted queries, SLIF also supports browsing the collection using latent topic models derived from both the annotated text and the image data.

Keywords: Image search, Image indexing, Topic modelling

Suggested Citation

Ahmed, Amr and Arnold, Andrew and Coelho, Luis Pedro and Kangas, Joshua and Sheikh, Abdul-Saboor and Xing, Eric and Cohen, William W. and Murphy, Robert F., Structured Literature Image Finder: Parsing Text and Figures in Biomedical Literature (March 30, 2010). Available at SSRN: https://ssrn.com/abstract=3199484 or http://dx.doi.org/10.2139/ssrn.3199484

Amr Ahmed

Carnegie Mellon University - Machine Learning Department

Gates Hillman Center
5000 Forbes Ave 8th
Pittsburgh, PA 15213-3891
United States

Carnegie Mellon University - Language Technologies Institute

5000 Forbes Avenue
Pittsburgh, PA 15213-3891
United States

Andrew Arnold

Carnegie Mellon University - Machine Learning Department

Gates Hillman Center
5000 Forbes Ave 8th
Pittsburgh, PA 15213-3891
United States

Luis Pedro Coelho (Contact Author)

Carnegie Mellon University - Joint CMU-Pitt Ph.D Program in Computational Biology (CPCB)

Pittsburgh, PA 15213
United States

Carnegie Mellon University - Center for Bioimage Informatics (CBI)

5000 Forbes Avenue
C119-122 Hamerschlag Hall
Pittsburgh, PA 15213-3891
United States

Carnegie Mellon University - Computational Biology Department

Pittsburgh, PA 15213
United States

Joshua Kangas

Carnegie Mellon University - Joint CMU-Pitt Ph.D Program in Computational Biology (CPCB)

Pittsburgh, PA 15213
United States

Carnegie Mellon University - Center for Bioimage Informatics (CBI)

5000 Forbes Avenue
C119-122 Hamerschlag Hall
Pittsburgh, PA 15213-3891
United States

Carnegie Mellon University - Computational Biology Department

Pittsburgh, PA 15213
United States

Abdul-Saboor Sheikh

Carnegie Mellon University - Center for Bioimage Informatics (CBI)

5000 Forbes Avenue
C119-122 Hamerschlag Hall
Pittsburgh, PA 15213-3891
United States

Eric Xing

Carnegie Mellon University

Pittsburgh, PA 15213-3890
United States

William W. Cohen

Carnegie Mellon University

Robert F. Murphy

Carnegie Mellon University - Machine Learning Department

Gates Hillman Center
5000 Forbes Ave 8th
Pittsburgh, PA 15213-3891
United States

Carnegie Mellon University - Joint CMU-Pitt Ph.D Program in Computational Biology (CPCB)

Pittsburgh, PA 15213
United States

Carnegie Mellon University - Center for Bioimage Informatics (CBI)

5000 Forbes Avenue
C119-122 Hamerschlag Hall
Pittsburgh, PA 15213-3891
United States

Carnegie Mellon University - Computational Biology Department

Pittsburgh, PA 15213
United States

Carnegie Mellon University - Department of Biological Sciences

4400 Fifth Avenue
Pittsburgh, PA 15213
United States

Carnegie Mellon University - Department of Biomedical Engineering

5000 Forbes Avenue
Scott Hall 4N201
Pittsburgh, PA 15213-3891
United States
