Information Retrieval with Root- and Rule-Based Terms

9 Pages
Posted: 6 Apr 2020
Publication Status: Review Complete

Jacob Collard

Cornell University

Talapady N. Bhat

National Institute of Standards and Technology (NIST)

John Elliott

National Institute of Standards and Technology (NIST)

Ram Sriram

National Institute of Standards and Technology (NIST)

Ira Monarch

Independent

Eswaran Subrahmanian

National Institute of Standards and Technology (NIST)

Abstract

Root- and rule-based terms are structured representations of natural language phrases that can be generated automatically using a combination of statistical and symbolic methods. These terms represent and normalize syntactic information about natural language phrases, which makes them richer than basic n-grams while greatly reducing vocabulary size. In this paper, we discuss the use of root- and rule-based terms for information retrieval. We represent documents and queries as collections of root- and rule-based terms and show that this representation improves conventional information retrieval methods such as Latent Semantic Indexing and Latent Dirichlet Allocation. Root- and rule-based terms improve on state-of-the-art evaluation scores for the TREC 2016 Clinical Decision Support track.
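The abstract describes representing each document and query as a collection of terms. The Python sketch below illustrates only that representation step, using a plain TF-IDF cosine ranker. This ranker is a swapped-in vector-space baseline: it is not Latent Semantic Indexing, not Latent Dirichlet Allocation, and not the authors' pipeline. It assumes some upstream step has already mapped phrases to normalized term keys; the keys shown (for example `cancer|lung`) and the helper names `tfidf_vectors`, `cosine` and `rank` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch: documents and queries are lists of already-normalized
# term keys (stand-ins for root- and rule-based terms; the keys are invented).

def tfidf_vectors(docs):
    """Build a TF-IDF weight dict for each document's term collection."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    # Add 1 so a term that appears in every document still gets nonzero weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query_terms, docs):
    """Return document indices sorted by descending score, plus the scores."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms never seen in the collection get zero weight.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query_terms).items()}
    scores = [cosine(q, v) for v in vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores

docs = [
    ["cancer|lung", "treatment|chemotherapy"],  # e.g. "lung cancer ... chemotherapy treatment"
    ["cancer|lung", "smoking"],                 # e.g. "cancer of the lung" mapped to the same key
    ["fracture|hip", "surgery"],
]
query = ["cancer|lung", "treatment|chemotherapy"]
order, scores = rank(query, docs)
print(order)  # → [0, 1, 2]
```

The point illustrated is vocabulary normalization: if surface forms such as "lung cancer" and "cancer of the lung" map to one key, documents 0 and 1 both match the query on that term, whereas as raw bigrams the two phrases share none. How root- and rule-based terms are actually generated is described in the paper, not in this sketch.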

Keywords: artificial intelligence, data science, taxonomy, unsupervised learning, information retrieval, automatic terminology generation

Suggested Citation

Collard, Jacob and Bhat, Talapady N. and Elliott, John and Sriram, Ram and Monarch, Ira and Subrahmanian, Eswaran, Information Retrieval with Root- and Rule-Based Terms. Available at SSRN: https://ssrn.com/abstract=3565983 or http://dx.doi.org/10.2139/ssrn.3565983
This version of the paper has not been formally peer reviewed.

Jacob Collard (Contact Author)

Cornell University

616 Thurston Ave
Ithaca, NY 14853
United States

Talapady N. Bhat

National Institute of Standards and Technology (NIST)

Gaithersburg, MD 20899-8910
United States

John Elliott

National Institute of Standards and Technology (NIST)

Gaithersburg, MD 20899-8910
United States

Ram Sriram

National Institute of Standards and Technology (NIST)

Gaithersburg, MD 20899-8910
United States

Ira Monarch

Independent

Eswaran Subrahmanian

National Institute of Standards and Technology (NIST)

Gaithersburg, MD 20899-8910
United States