Comparison of Generality Based Algorithm Variants for Automatic Taxonomy Generation

8 Pages. Posted: 25 Sep 2009

Andreas Henschel

Masdar Institute of Science and Technology (MIST)

Wei Lee Woon

Masdar Institute of Science and Technology (MIST)

Thomas Wachter

Dresden University of Technology

Stuart Madnick

Massachusetts Institute of Technology (MIT) - Sloan School of Management

Date Written: September 24, 2009

Abstract

We compare a family of algorithms for the automatic generation of taxonomies, obtained by adapting the Heymann algorithm in various ways. The core algorithm determines the generality of terms and iteratively inserts them into a growing taxonomy. Variants are created by altering how, and how often, the generality of terms is calculated. We analyse the performance and complexity of the variants, combined with a systematic threshold evaluation, on a set of seven manually created benchmark sets. We find that betweenness centrality calculated on unweighted similarity graphs often performs best, but it requires threshold fine-tuning and is computationally more expensive than closeness centrality. Finally, we show how an entropy-based filter can lead to more precise taxonomies.
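The core loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly described form of the Heymann algorithm (rank terms by centrality in a thresholded similarity graph, then attach each term to its most similar already-placed term, or to the root if none is similar enough), and it uses closeness centrality on an unweighted graph as one of the variants the abstract names. The function names, the two thresholds, and the similarity values are all hypothetical and chosen only for illustration.

```python
from collections import deque

def closeness(adj):
    """Closeness centrality on an unweighted graph, via BFS from every node."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        # (reachable nodes) / (sum of shortest-path lengths); 0 for isolated nodes
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores

def build_taxonomy(sim, graph_threshold, attach_threshold, root="ROOT"):
    """Return a child -> parent map (assumed Heymann-style insertion)."""
    terms = sorted(sim)
    # Unweighted similarity graph: keep an edge when similarity passes the threshold.
    adj = {t: [u for u in terms if u != t and sim[t][u] >= graph_threshold]
           for t in terms}
    generality = closeness(adj)
    # Most general terms first; ties broken alphabetically for determinism.
    order = sorted(terms, key=lambda t: (-generality[t], t))
    parent = {}
    for term in order:
        # Most similar term among those already in the taxonomy, if any.
        best = max(parent, key=lambda p: sim[term][p], default=None)
        if best is not None and sim[term][best] >= attach_threshold:
            parent[term] = best
        else:
            parent[term] = root
    return parent

# Hypothetical pairwise similarities (illustrative values, not from the paper).
pairs = {
    ("energy", "solar"): 0.8, ("energy", "wind"): 0.7,
    ("energy", "hydro"): 0.6, ("energy", "photovoltaic"): 0.4,
    ("solar", "photovoltaic"): 0.9, ("solar", "wind"): 0.3,
    ("solar", "hydro"): 0.2, ("wind", "hydro"): 0.3,
    ("wind", "photovoltaic"): 0.1, ("hydro", "photovoltaic"): 0.1,
}
sim = {}
for (a, b), value in pairs.items():
    sim.setdefault(a, {})[b] = value
    sim.setdefault(b, {})[a] = value

taxonomy = build_taxonomy(sim, graph_threshold=0.5, attach_threshold=0.5)
for child, par in taxonomy.items():
    print(f"{child} -> {par}")
```

In this toy example "energy" is ranked most general and placed under the root, "solar", "hydro" and "wind" attach to "energy", and "photovoltaic" attaches to "solar". Swapping `closeness` for a betweenness computation, or recomputing generality after each insertion, would correspond to the kinds of variants the abstract compares.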

Suggested Citation

Henschel, Andreas and Woon, Wei Lee and Wachter, Thomas and Madnick, Stuart E., Comparison of Generality Based Algorithm Variants for Automatic Taxonomy Generation (September 24, 2009). MIT Sloan Research Paper No. 4758-09, Available at SSRN: https://ssrn.com/abstract=1478201 or http://dx.doi.org/10.2139/ssrn.1478201

Andreas Henschel (Contact Author)

Masdar Institute of Science and Technology (MIST)

MASDAR
PO Box 54115
Abu Dhabi
United Arab Emirates

Wei Lee Woon

Masdar Institute of Science and Technology (MIST)

MASDAR
PO Box 54115
Abu Dhabi
United Arab Emirates

Thomas Wachter

Dresden University of Technology

Helmholtzstr. 10
Dresden, 01069
Germany

Stuart E. Madnick

Massachusetts Institute of Technology (MIT) - Sloan School of Management

E53-321
Cambridge, MA 02142
United States
617-253-6671 (Phone)
617-253-3321 (Fax)
