
Conceptual Provenance in Indexing Languages

7 Pages. Posted: 17 Aug 2018. Publication Status: Under Review

Joseph Tennis

University of Washington - Information School

Abstract

Indexing languages are tools that aid information retrieval and sense-making. They comprise classification schemes, thesauri, ontologies, and taxonomies. Notable contemporary examples include the category system of Wikipedia, the Library of Congress Subject Headings used by libraries around the world, and the Gene Ontology used by scientists to understand the genomes of the world. These tools are constructed at one point in time, and good tools are informed by the literature and users of that time. As more literature is added to the collection represented, and as users’ needs change, so too do indexing languages. This causes a shift in the structure and semantics of the indexing language. For example, in the 1913 Dewey Decimal Classification (DDC), number 397 was the single address for GYPSIES, NOMADS, AND OUTCAST RACES, defined as: “[p]eople without nationalities who do not coalesce with the ruling people among whom they live. This includes Gypsy language, which has no place in the linguistic groups of 400, as the Gypsy people have no place in the geographic divisions of history” (Dewey, 1913). Both the language and the people (their culture, customs, contemporary socio-political situations) are no longer at that address, and have not been since 1958. They are handled by different numbers from a different part of the classification scheme. Dewey is rich with examples of this phenomenon because of its age, but it is not the only indexing language that changes. The Wikipedia category system is another example: it is only eight years old, but it has changed dramatically. DBpedia has captured snapshots of the provenance of categories in the Wikipedia category system. From our preliminary analysis of these data, we can see that from 2008 to 2012 there was nearly a 170% increase in the number of categories and a 200% increase in the density of interconnections between those categories.
The next step is to investigate the semantics of these changes. Because change in indexing languages is a persistent phenomenon, and there is no commonly accepted design amelioration, it constitutes an important research area in my field of knowledge organization. We need to design for change, and we need to understand the phenomenon to have an informed design pattern.
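
The growth figures reported in the abstract compare two snapshots of a category system. As a rough illustration only, the sketch below shows how such percentages could be computed from two lists of parent-child category links. The abstract does not define "density of interconnections", so the measure used here (links per category) is an assumption, and the function names and toy snapshots are hypothetical; they are not DBpedia data or the author's method.

```python
def percent_increase(old, new):
    """Percentage growth from an old value to a new value."""
    if old <= 0:
        raise ValueError("old value must be positive")
    return (new - old) / old * 100.0


def categories_in(links):
    """Distinct categories that appear in a list of (parent, child) links."""
    return {category for link in links for category in link}


def links_per_category(links):
    """Assumed density measure: number of links divided by number of categories."""
    return len(links) / len(categories_in(links))


def compare_snapshots(old_links, new_links):
    """Growth in category count and in assumed density between two snapshots."""
    return {
        "category_growth_pct": percent_increase(
            len(categories_in(old_links)), len(categories_in(new_links))
        ),
        "density_growth_pct": percent_increase(
            links_per_category(old_links), links_per_category(new_links)
        ),
    }


# Toy snapshots (hypothetical, for illustration only).
snapshot_early = [("Science", "Biology")]
snapshot_late = [
    ("Science", "Biology"),
    ("Science", "Physics"),
    ("Biology", "Genetics"),
    ("Science", "Genetics"),
]

growth = compare_snapshots(snapshot_early, snapshot_late)
print(growth)
```

A different definition of density, such as links divided by the number of possible category pairs, would yield different percentages, so any comparison should state which measure it uses.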

Keywords: indexing languages

Suggested Citation

Tennis, Joseph T. (2016). "Conceptual Provenance in Indexing Languages." In Vicki Lemieux (ed.), Building Trust in Information: Perspectives on the Frontiers of Provenance. Springer Proceedings in Business and Economics (IEEE), 93-99. Available at SSRN: https://ssrn.com/abstract=3225428

Joseph Tennis (Contact Author)

University of Washington - Information School

Seattle, WA 98195
United States
