Acora: Distribution-Based Aggregation for Relational Learning from Identifier Attributes

46 Pages Posted: 9 Oct 2008

Claudia Perlich

IBM Corporation - Thomas J. Watson Research Center

Foster Provost

New York University

Date Written: February 2005

Abstract

Feature construction through aggregation plays an essential role in modeling relational domains with one-to-many relationships between tables. One-to-many relationships lead to bags (multisets) of related entities, from which predictive information must be captured. This paper focuses on aggregation from categorical attributes that can take many values (e.g., object identifiers). We present a novel aggregation method as part of a relational learning system, ACORA, that combines the use of vector distance and meta-data about the class-conditional distributions of attribute values. We provide a theoretical foundation for this approach, deriving a "relational fixed-effect" model within a Bayesian framework, and discuss the implications of identifier aggregation on the expressive power of the induced model. One advantage of using identifier attributes is the circumvention of limitations caused either by missing/unobserved object properties or by independence assumptions. Finally, we show empirically that the novel aggregators can generalize in the presence of identifier (and other high-dimensional) attributes, and also explore the limitations of the applicability of the methods.
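
The idea of combining vector distance with class-conditional distributions can be illustrated with a minimal sketch. It is not the ACORA implementation: the function names (`class_reference_vectors`, `cosine_similarity`, `aggregate`), the choice of cosine similarity, and the toy data are all assumptions made for illustration. Each target entity has a bag of identifiers; the identifier counts of all training bags in a class form that class's reference vector, and a new bag is summarized by its similarity to each reference vector.

```python
from collections import Counter
import math


def class_reference_vectors(bags, labels):
    """Sum identifier counts over all training bags of each class.

    Hypothetical helper: one sparse count vector per class label.
    """
    refs = {}
    for bag, label in zip(bags, labels):
        refs.setdefault(label, Counter()).update(bag)
    return refs


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def aggregate(bag, refs):
    """Turn a bag of identifiers into one numeric feature per class."""
    counts = Counter(bag)
    return {label: cosine_similarity(counts, ref) for label, ref in refs.items()}


# Toy data (assumed): bags of object identifiers with binary class labels.
train_bags = [["a", "b"], ["a", "a"], ["c"], ["c", "d"]]
train_labels = [1, 1, 0, 0]

refs = class_reference_vectors(train_bags, train_labels)
features = aggregate(["a"], refs)
```

The resulting per-class similarities can be passed to any propositional learner as ordinary numeric features. In this sketch the reference vectors are built from training bags only, so that labels of the entities being scored do not leak into their own features.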

Suggested Citation

Perlich, Claudia and Provost, Foster, Acora: Distribution-Based Aggregation for Relational Learning from Identifier Attributes (February 2005). Information Systems Working Papers Series, 2005. Available at SSRN: https://ssrn.com/abstract=1281315

Claudia Perlich (Contact Author)

IBM Corporation - Thomas J. Watson Research Center

Route 134
Kitchawan Road
Yorktown Heights, NY 10598
United States

Foster Provost

New York University

44 West Fourth Street
New York, NY 10012
United States
