Enhancing Information Retrieval Through Statistical Natural Language Processing: A Study of Collocation Indexing

Management Information Systems Quarterly (MISQ), Vol. 31, No. 3, 2007

University of Alberta School of Business Research Paper No. 2013-108

Posted: 26 May 2013 Last revised: 27 Jun 2013

Ofer Arazy

Independent

Carson Woo

University of British Columbia (UBC) - Sauder School of Business

Date Written: September 1, 2006

Abstract

Although the management of information assets (specifically, of the text documents that make up 80 percent of these assets) can provide organizations with a competitive advantage, the ability of information retrieval (IR) systems to deliver relevant information to users is severely hampered by the difficulty of disambiguating natural language. The word ambiguity problem is addressed with moderate success in restricted settings, but continues to be the main challenge for general settings, characterized by large, heterogeneous document collections.

In this paper, we provide preliminary evidence for the usefulness of statistical natural language processing (NLP) techniques, and specifically of collocation indexing, for IR in general settings. We investigate the effect of three key parameters on collocation indexing performance: directionality, distance, and weighting. We build on previous work in IR to (1) advance our knowledge of key design elements for collocation indexing, (2) demonstrate gains in retrieval precision from the use of statistical NLP for general-settings IR, and, finally, (3) provide practitioners with a useful cost-benefit analysis of the methods under investigation.
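To make the three parameters concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a collocation is a pair of words co-occurring within a fixed token window, that directionality means word order is preserved in the pair, and that weighting is a tf-idf style score. The function names `extract_collocations` and `collocation_weight` and their parameters are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def extract_collocations(tokens, max_distance=2, directional=True):
    """Count word pairs that co-occur within max_distance tokens.

    Assumption: a collocation is any pair of tokens whose positions
    differ by at most max_distance. If directional is True, the pair
    (a, b) is distinct from (b, a); otherwise the pair is stored in
    sorted order so both orderings share one count.
    """
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look ahead at most max_distance positions from token i.
        for j in range(i + 1, min(i + 1 + max_distance, len(tokens))):
            right = tokens[j]
            pair = (left, right) if directional else tuple(sorted((left, right)))
            counts[pair] += 1
    return counts


def collocation_weight(count, doc_freq, n_docs):
    """tf-idf style weight (an assumed scheme, for illustration only).

    count: occurrences of the collocation in one document.
    doc_freq: number of documents containing the collocation.
    n_docs: total number of documents in the collection.
    """
    return count * math.log(n_docs / doc_freq)


tokens = ["information", "retrieval", "systems", "retrieval", "information"]
directed = extract_collocations(tokens, max_distance=1, directional=True)
undirected = extract_collocations(tokens, max_distance=1, directional=False)
```

In this sketch, widening `max_distance` admits more candidate pairs at the cost of a larger index, and switching off `directional` merges the two orderings of a pair into one entry. These are the kinds of trade-offs the cost-benefit analysis in the abstract refers to, although the paper's actual parameter definitions may differ.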

Suggested Citation

Arazy, Ofer and Woo, Carson, Enhancing Information Retrieval Through Statistical Natural Language Processing: A Study of Collocation Indexing (September 1, 2006). Management Information Systems Quarterly (MISQ), Vol. 31, No. 3, 2007, University of Alberta School of Business Research Paper No. 2013-108, Available at SSRN: https://ssrn.com/abstract=2268715

Carson Woo

University of British Columbia (UBC) - Sauder School of Business

2053 Main Mall
Vancouver, BC V6T 1Z2
Canada
604-822-8390 (Phone)
