Prompt Engineering for Transformer-Based Chemical Similarity Search Identifies Structurally Distinct Functional Analogues

42 Pages

Posted: 25 May 2023

Publication Status: Published

Clayton Walter Kosonocky

University of Texas at Austin - Department of Molecular Biosciences

Aaron L. Feller

University of Texas at Austin - Department of Molecular Biosciences

Claus O. Wilke

University of Texas at Austin - Department of Integrative Biology

Andrew D. Ellington

University of Texas at Austin - Department of Molecular Biosciences

Abstract

Chemical similarity searches are widely used in silico methods for identifying new drug-like molecules. These methods have historically relied on structure-based comparisons to compute molecular similarity. Here, we use a chemical language model to create a vector-based chemical search. We extend previous implementations with a prompt engineering strategy that uses two different chemical string representation algorithms: one for the query and the other for the database. We explore this method by reviewing the search results for five drug-like query molecules (penicillin G, nirmatrelvir, zidovudine, lysergic acid diethylamide, and fentanyl) and three dye-like query molecules (acid blue 25, avobenzone, and 2-diphenylaminocarbazole). We find that this novel method identifies molecules that are functionally similar to the query, as indicated by the associated patent literature, and that many of these molecules are structurally distinct from the query, making them unlikely to be found with traditional chemical similarity search methods. This method may aid in the discovery of novel structural classes of molecules that achieve target functionality.
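
As a rough illustration of the vector-based search described in the abstract, the sketch below embeds a query string and a set of database strings, then ranks database entries by cosine similarity. It is a minimal sketch under loud assumptions: `toy_embed` is a hypothetical stand-in (character bigram counts) for the chemical language model, the function names are invented for this example, and the SMILES strings were written by hand in two different atom orderings to mimic using one string representation for the query and another for the database. The abstract does not name the model or the two representation algorithms, so none of this should be read as the authors' implementation.

```python
import math
from collections import Counter


def toy_embed(smiles: str, n: int = 2) -> Counter:
    """Hypothetical stand-in for a chemical language model embedding.

    Returns sparse character n-gram counts of the string. A real system
    would return a dense vector produced by a trained transformer.
    """
    padded = f"^{smiles}$"  # mark the start and end of the string
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_string: str, database: dict, top_k: int = 3) -> list:
    """Rank database entries by similarity to the query, highest first."""
    query_vec = toy_embed(query_string)
    scored = [
        (name, cosine(query_vec, toy_embed(db_string)))
        for name, db_string in database.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Database strings in one hand-written form (assumption, not the paper's).
    database = {
        "ethanol": "CCO",
        "acetic acid": "CC(=O)O",
        "benzene": "c1ccccc1",
    }
    # Query written with a different atom ordering for the same molecule
    # (ethanol), standing in for the second string representation.
    for name, score in search("OCC", database):
        print(f"{name}: {score:.3f}")
```

The point of the sketch is only the data flow: because the query and database strings are embedded independently, each side can use a different string representation without changing the ranking step.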

Keywords: drug discovery, machine learning, chemical similarity search, prompt engineering, SMILES, transformer, unsupervised, chemical language model, BERT, SARS-CoV-2

Suggested Citation

Kosonocky, Clayton Walter and Feller, Aaron L. and Wilke, Claus O. and Ellington, Andrew D., Prompt Engineering for Transformer-Based Chemical Similarity Search Identifies Structurally Distinct Functional Analogues. Available at SSRN: https://ssrn.com/abstract=4458489 or http://dx.doi.org/10.2139/ssrn.4458489
This version of the paper has not been formally peer reviewed.

Clayton Walter Kosonocky

University of Texas at Austin - Department of Molecular Biosciences

Aaron L. Feller

University of Texas at Austin - Department of Molecular Biosciences

Claus O. Wilke (Contact Author)

University of Texas at Austin - Department of Integrative Biology

Andrew D. Ellington

University of Texas at Austin - Department of Molecular Biosciences
