A Randomized Exhaustive Propositionalization Approach for Molecule Classification

INFORMS Journal on Computing, Volume 23, Issue 3, Summer 2011, pp. 331-345

University of Alberta School of Business Research Paper No. 2013-1099

27 Pages. Posted: 2 Jul 2013. Last revised: 22 Jan 2014.


Michele Samorani

Santa Clara University - Information Systems and Analytics

Manuel Laguna

University of Colorado at Boulder - Leeds School of Business

Robert DeLisle

Array BioPharma, Inc.

Daniel Weaver

Independent

Date Written: May 26, 2010

Abstract

Drug discovery is the process of designing compounds that have desirable properties, such as activity and non-toxicity. Molecule classification techniques are used throughout this process to predict the properties of compounds and thereby expedite their testing. Ideally, the classification rules found should be accurate and reveal novel chemical properties, but current molecule representation techniques yield inadequate accuracy and knowledge discovery. This work extends the propositionalization approach recently proposed for multi-relational data mining in two ways: it generates expressive attributes exhaustively, and it uses randomization to sample a limited set of complex (“deep”) attributes. Our experimental tests show that the procedure generates meaningful and interpretable attributes from molecular structural data, and that these attributes are effective for classification.
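
The two ideas named in the abstract, exhaustive generation of simple attributes and random sampling of a limited set of deeper ones, can be illustrated with a minimal sketch. This is not the authors' procedure: the toy molecules, the choice of bonded element paths as attributes, and every name below (`MOLECULES`, `count_paths`, `generate_attributes`, `propositionalize`, `shallow_depth`, `deep_depth`, `n_deep`) are assumptions made purely for illustration.

```python
import itertools
import random

# Hypothetical toy data: atoms are element symbols, bonds are index pairs.
MOLECULES = {
    "ethanol": {"atoms": ["C", "C", "O"], "bonds": [(0, 1), (1, 2)]},
    "methylamine": {"atoms": ["C", "N"], "bonds": [(0, 1)]},
    "dimethyl_ether": {"atoms": ["C", "O", "C"], "bonds": [(0, 1), (1, 2)]},
}


def neighbors(mol):
    """Build an adjacency map from the undirected bond list."""
    adj = {i: set() for i in range(len(mol["atoms"]))}
    for i, j in mol["bonds"]:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def count_paths(mol, pattern):
    """Count directed simple paths whose atom elements match `pattern` in order.

    Because paths are directed, a symmetric pattern such as ("C", "C")
    is counted once in each direction.
    """
    adj = neighbors(mol)

    def extend(path, rest):
        if not rest:
            return 1
        return sum(
            extend(path + [n], rest[1:])
            for n in adj[path[-1]]
            if n not in path and mol["atoms"][n] == rest[0]
        )

    return sum(
        extend([i], pattern[1:])
        for i, element in enumerate(mol["atoms"])
        if element == pattern[0]
    )


def generate_attributes(molecules, shallow_depth=2, deep_depth=3, n_deep=4, seed=0):
    """Return element-path patterns: all shallow ones plus a random sample of deep ones."""
    elements = sorted({el for m in molecules.values() for el in m["atoms"]})
    # Exhaustive part: every pattern of length 1..shallow_depth.
    patterns = [
        p
        for depth in range(1, shallow_depth + 1)
        for p in itertools.product(elements, repeat=depth)
    ]
    # Randomized part: a limited sample of longer ("deep") patterns.
    rng = random.Random(seed)
    deep = list(itertools.product(elements, repeat=deep_depth))
    patterns += rng.sample(deep, min(n_deep, len(deep)))
    return patterns


def propositionalize(molecules, patterns):
    """Flatten each molecule into one row of attribute values (path counts)."""
    return {
        name: [count_paths(mol, p) for p in patterns]
        for name, mol in molecules.items()
    }


patterns = generate_attributes(MOLECULES)
table = propositionalize(MOLECULES, patterns)
```

Each row of `table` is a fixed-length attribute vector that a standard classifier could consume. The sketch samples deep patterns uniformly at random, which is only one of many possible sampling schemes.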

Suggested Citation

Samorani, Michele and Laguna, Manuel and DeLisle, Robert and Weaver, Daniel, A Randomized Exhaustive Propositionalization Approach for Molecule Classification (May 26, 2010). INFORMS Journal on Computing, Volume 23, Issue 3, Summer 2011, pp. 331-345; University of Alberta School of Business Research Paper No. 2013-1099. Available at SSRN: https://ssrn.com/abstract=2284380

Michele Samorani (Contact Author)

Santa Clara University - Information Systems and Analytics

500, El Camino Real
Santa Clara, CA 95053-0382
United States

Manuel Laguna

University of Colorado at Boulder - Leeds School of Business

Boulder, CO 80309-0419
United States

Robert DeLisle

Array BioPharma, Inc.

3200 Walnut Street
Boulder, CO 80540
United States

Daniel Weaver

Independent

No Address Available
