Association Rules and Compositional Data Analysis: Implications to Big Data

10 Pages Posted: 12 Sep 2017

Ron S. Kenett

KPA Ltd.; Technion-Israel Institute of Technology; Hebrew University of Jerusalem - Faculty of Medicine

Josep Martín-Fernández

University of Girona - Department of Computer Science, Applied Mathematics and Statistics

Santiago Thió-Henestrosa

University of Girona - Department of Computer Science, Applied Mathematics and Statistics

Marina Vives-Mestres

University of Girona - Department of Computer Science, Applied Mathematics and Statistics

Date Written: September 7, 2017

Abstract

Many modern organizations generate large amounts of transaction data on a daily basis. Transactions typically include semantic descriptors that require specialised methods for analysis. Association rule (AR) mining is a powerful semantic data analytic technique used to extract information from transaction databases and to indicate which items go together in a set of transactions. AR was originally developed for basket analysis, where the combination of items in a shopping basket is evaluated to determine its prevalence, with implications for shelf layout. To generate an AR, the collection of frequent itemsets (sets of two or more items) must first be detected. Then, as a second step, all possible ARs are generated from each itemset. The ARs are then ranked using measures of association labelled, in this context, “measures of interestingness”. The R package “arules” provides more than a dozen such measures, including the relative linkage disequilibrium (RLD), which normalises classical Euclidean distances of the itemset from a surface of independence. In this work, we study AR and RLD from a compositional data (CoDa) perspective. It is well known that CoDa methodology provides nice properties such as subcompositional coherence and scalability. We explore here the implications of CoDa for AR mining in big data analysis. The aim is to analyse whether CoDa properties ensure that the AR characteristic is not scale dependent and that, if we consider a subset of the original items, we still observe similar behaviour. The work focuses on such aspects, including the dynamic visualization of CoDa-AR measures on a simplex representation of the itemsets and its multidimensional extension.
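
As an illustration of the two-step procedure described in the abstract, the following is a minimal Python sketch, not the authors' method and not the "arules" implementation. It enumerates frequent itemsets by brute force (a stand-in for a proper algorithm such as Apriori), then computes support, confidence, lift and the odds ratio for one rule from its 2x2 contingency table. The toy transactions and the function names `support`, `frequent_itemsets` and `rule_measures` are assumptions introduced here for illustration, and the sketch assumes no cell of the 2x2 table makes a denominator zero. RLD is not computed.

```python
from itertools import combinations

# Hypothetical toy transaction database (not from the paper).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
    {"butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Step 1: brute-force enumeration of itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(combo, transactions)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found

def rule_measures(lhs, rhs, transactions):
    """Step 2: interestingness measures for the rule lhs -> rhs.

    The four counts below form the rule's 2x2 contingency table.
    Assumes the counts used as denominators are nonzero.
    """
    lhs, rhs = set(lhs), set(rhs)
    n11 = sum(lhs <= t and rhs <= t for t in transactions)          # both sides
    n10 = sum(lhs <= t and not rhs <= t for t in transactions)      # lhs only
    n01 = sum(not lhs <= t and rhs <= t for t in transactions)      # rhs only
    n00 = sum(not lhs <= t and not rhs <= t for t in transactions)  # neither
    n = n11 + n10 + n01 + n00
    confidence = n11 / (n11 + n10)
    return {
        "support": n11 / n,
        "confidence": confidence,
        "lift": confidence / ((n11 + n01) / n),
        "odds_ratio": (n11 * n00) / (n10 * n01),
    }

freq = frequent_itemsets(transactions, min_support=0.5)
m = rule_measures({"bread"}, {"milk"}, transactions)

# Repeating every transaction ten times rescales the 2x2 counts
# but leaves their relative proportions unchanged.
m_scaled = rule_measures({"bread"}, {"milk"}, transactions * 10)
```

In this sketch `m` and `m_scaled` agree because all four measures depend only on ratios of the 2x2 cell counts, which is the kind of scale independence the abstract asks about.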

Keywords: Aitchison geometry, text analysis, association rules, isometric logratio coordinates, itemsets, measures of interestingness, support, lift, confidence, odds ratio, relative linkage disequilibrium

Suggested Citation

Kenett, Ron S. and Martín-Fernández, Josep and Thió-Henestrosa, Santiago and Vives-Mestres, Marina, Association Rules and Compositional Data Analysis: Implications to Big Data (September 7, 2017). Available at SSRN: https://ssrn.com/abstract=3033588 or http://dx.doi.org/10.2139/ssrn.3033588

Ron S. Kenett (Contact Author)

KPA Ltd.

Raanana
Israel
+97297408442 (Phone)
+97297408443 (Fax)

HOME PAGE: http://www.kpa-group.com

Technion-Israel Institute of Technology

Technion City
Haifa 32000
Israel

HOME PAGE: http://www.neaman.org.il/EN/Ron-Kenett

Hebrew University of Jerusalem - Faculty of Medicine

Jerusalem
Israel

Josep Martín-Fernández

University of Girona - Department of Computer Science, Applied Mathematics and Statistics

Girona
Spain

Santiago Thió-Henestrosa

University of Girona - Department of Computer Science, Applied Mathematics and Statistics

Girona
Spain

Marina Vives-Mestres

University of Girona - Department of Computer Science, Applied Mathematics and Statistics

Girona
Spain
