The Relational Vector-Space Model
11 Pages Posted: 13 Oct 2008
Date Written: 2003
This paper addresses the classification of linked entities. We introduce a relational vector-space (VS) model (in analogy to the VS model used in information retrieval) that abstracts the linked structure, representing entities by vectors of weights. Given labeled data as background knowledge/training data, classification procedures can be defined for this model, including a straightforward, "direct" model using weighted adjacency vectors. Using a large set of tasks from the domain of company affiliation identification, we demonstrate that such classification procedures can be effective. We then examine the method in more detail, showing that, as expected, the classification performance correlates with the relational autocorrelation of the data set. We then turn the tables and use the relational VS scores as a way to analyze/visualize the relational autocorrelation present in a complex linked structure. The main contribution of the paper is to introduce the relational VS model as a potentially useful addition to the toolkit for relational data mining. It could provide useful constructed features for domains with low to moderate relational autocorrelation; it may be effective by itself for domains with high levels of relational autocorrelation; and it provides a useful abstraction for analyzing the properties of linked data.
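As a rough illustration of the idea of classifying entities from weighted adjacency vectors, the following minimal Python sketch may help. It is not the paper's procedure: the link format `(a, b, weight)`, the helper names (`adjacency_vectors`, `cosine`, `classify`), the use of cosine similarity as the score, and the toy entities and labels are all assumptions made for this example.

```python
import math
from collections import defaultdict


def adjacency_vectors(links):
    """Build a weighted adjacency vector per entity from (a, b, weight) links.

    Links are treated as undirected here; that is an assumption of this sketch.
    """
    vectors = defaultdict(lambda: defaultdict(float))
    for a, b, weight in links:
        vectors[a][b] += weight
        vectors[b][a] += weight
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(entity, vectors, labels):
    """Score each class by the similarity between the entity's adjacency
    vector and an indicator vector over the labeled members of that class."""
    class_vectors = defaultdict(dict)
    for labeled_entity, cls in labels.items():
        if labeled_entity != entity:  # never use the entity's own label
            class_vectors[cls][labeled_entity] = 1.0
    entity_vector = vectors.get(entity, {})
    scores = {cls: cosine(entity_vector, vec) for cls, vec in class_vectors.items()}
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): X is linked mostly to entities labeled "tech".
links = [
    ("A", "B", 3.0), ("A", "C", 1.0), ("D", "C", 2.0),
    ("X", "A", 2.0), ("X", "B", 1.0), ("X", "D", 0.5),
]
labels = {"A": "tech", "B": "tech", "C": "energy", "D": "energy"}

vectors = adjacency_vectors(links)
predicted, scores = classify("X", vectors, labels)
print(predicted, scores)
```

In this sketch an entity scores highly for a class when most of its link weight goes to entities already known to belong to that class, which is why such a score would be expected to work best when relational autocorrelation is high.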
Keywords: Relational Data Mining, Vector-space models, Industry Classification