A Learning Theory Framework for Association Rules and Sequential Events

47 Pages Posted: 21 Jun 2011

Cynthia Rudin

Duke University - Department of Computer Science

Benjamin Letham

Massachusetts Institute of Technology (MIT)

Eugene Kogan

Sourcetone

David Madigan

Columbia University - Department of Statistics

Date Written: June 20, 2011

Abstract

We present a framework and generalization analysis for the use of association rules in the setting of supervised learning. We are specifically interested in a sequential event prediction problem where data are revealed one by one, and the goal is to determine what will next be revealed. In the context of this problem, algorithms based on association rules have a distinct advantage over classical statistical and machine learning methods; however, to our knowledge there has not previously been a theoretical foundation established for using association rules in supervised learning. We present two simple algorithms that incorporate association rules. These algorithms can be used both for sequential event prediction and for supervised classification. We provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an "adjusted confidence" measure that provides a weaker minimum support condition that has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis.
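The abstract names the "adjusted confidence" measure without stating its formula. As a minimal sketch, assume it takes the form #(a ∪ b) / (#a + K), where #a counts the transactions containing the left-hand side a, #(a ∪ b) counts those containing both a and the predicted item b, and K ≥ 0 is a user-chosen constant (K = 0 gives ordinary confidence). This form, the function names, and the toy baskets below are illustrative assumptions, not taken from this page.

```python
from typing import Hashable, Iterable, Sequence, Set


def support_count(transactions: Sequence[Set[Hashable]],
                  itemset: Iterable[Hashable]) -> int:
    """Count the transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def adjusted_confidence(transactions: Sequence[Set[Hashable]],
                        lhs: Iterable[Hashable],
                        rhs: Hashable,
                        k: float = 0.0) -> float:
    """Assumed form: #(lhs U {rhs}) / (#lhs + k); k = 0 is plain confidence."""
    lhs_items = frozenset(lhs)
    n_lhs = support_count(transactions, lhs_items)
    n_both = support_count(transactions, lhs_items | {rhs})
    denom = n_lhs + k
    # A rule whose left-hand side never occurs gets score 0 when k = 0.
    return n_both / denom if denom > 0 else 0.0


# Hypothetical toy data: "a" appears four times, "c" only once.
baskets = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"a"}, {"c", "d"}]

for k in (0.0, 1.0):
    score_ab = adjusted_confidence(baskets, ["a"], "b", k)
    score_cd = adjusted_confidence(baskets, ["c"], "d", k)
    print(f"K={k}: a->b {score_ab:.2f}, c->d {score_cd:.2f}")
```

Under this assumed form, K acts as a soft penalty on rare rules: with K = 0 the rule seen once (c → d) outranks the rule seen four times, and with K = 1 the order reverses, without discarding the rare rule outright as a strict minimum support threshold would.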

Suggested Citation

Rudin, Cynthia and Letham, Benjamin and Kogan, Eugene and Madigan, David, A Learning Theory Framework for Association Rules and Sequential Events (June 20, 2011). Available at SSRN: https://ssrn.com/abstract=1868446 or http://dx.doi.org/10.2139/ssrn.1868446

Cynthia Rudin (Contact Author)

Duke University - Department of Computer Science

LSRC Building
Durham, NC 27708-0204
United States

Benjamin Letham

Massachusetts Institute of Technology (MIT)

77 Massachusetts Avenue
50 Memorial Drive
Cambridge, MA 02139-4307
United States

Eugene Kogan

Sourcetone

New York, NY
United States

David Madigan

Columbia University - Department of Statistics

Mail Code 4403
New York, NY 10027
United States
