Comparative Robustness Analysis of the High Dimensional Consensus Score Versus Machine Learning Strategies for Mass Spectra

9 Pages Posted: 20 Mar 2025


Wencheng Zhong

Johns Hopkins University

Amudhan Usha

affiliation not provided to SSRN

Briana Capistran

affiliation not provided to SSRN

Anthony Kearsley

National Institute of Standards and Technology (NIST) - Applied and Computational Mathematics Division

Abstract

Mass spectrometry is widely used for compound identification, yet the complexity of fragmentation and measurement variability pose challenges in distinguishing compounds from a single mass spectrum alone. Most forensic laboratories employ gas chromatography electron ionization mass spectrometry (GC-EI-MS) for at least part of their identification process when analyzing seized substances suspected of being illegal drugs. Typically, a single measurement (one mass spectrum) is produced when investigating an analyte; here we examine the situation in which multiple measurements are acquired, and we employ the High Dimensional Consensus (HDC) approach to compound discrimination. Using a dataset of replicate GC-EI-MS measurements of several compounds of forensic interest, we compare the HDC algorithm with several traditional machine learning and simple deep learning classification models (logistic regression, random forest, XGBoost, and multilayer perceptron). The HDC algorithm effectively distinguishes structurally similar compounds, such as methamphetamine and phentermine, making it a valuable tool for measuring the similarity between mass spectra of closely related compounds. We further evaluate the performance of these classification methods under two noise models, representing measurement errors in the measured intensities and in the detected peaks, respectively. For both noise models, we find that the HDC algorithm outperforms most machine learning models in the low-to-moderate noise range (i.e., when the variance of the noise is < 2% of the base peak intensity), with a sharp drop in performance at higher noise levels for the intensity-varied spectra.
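
The abstract does not specify how the two noise models are parameterized. As a rough illustration only, the following Python sketch shows one plausible way to perturb a spectrum stored as a vector of intensities per m/z bin. The function names, the use of Gaussian noise with a standard deviation scaled to the base peak, and the random drop/insert scheme for peaks are all assumptions made for this example, not the authors' method.

```python
import numpy as np


def add_intensity_noise(spectrum, noise_fraction, rng):
    """Intensity noise (assumed form): add zero-mean Gaussian noise to every bin.

    Assumption: the noise standard deviation equals `noise_fraction`
    times the base peak (maximum) intensity of the spectrum.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    base_peak = spectrum.max()
    noise = rng.normal(0.0, noise_fraction * base_peak, size=spectrum.shape)
    # Measured intensities cannot be negative, so clip at zero.
    return np.clip(spectrum + noise, 0.0, None)


def add_peak_noise(spectrum, flip_prob, noise_fraction, rng):
    """Peak-detection noise (assumed form): change which peaks are present.

    Each bin is selected independently with probability `flip_prob`.
    A selected bin that holds a peak is dropped (set to zero); a selected
    empty bin receives a small spurious peak drawn uniformly from
    [0, noise_fraction * base peak).
    """
    spectrum = np.asarray(spectrum, dtype=float)
    base_peak = spectrum.max()
    selected = rng.random(spectrum.shape) < flip_prob
    present = spectrum > 0
    noisy = spectrum.copy()
    noisy[selected & present] = 0.0
    spurious = selected & ~present
    noisy[spurious] = rng.uniform(
        0.0, noise_fraction * base_peak, size=int(spurious.sum())
    )
    return noisy


# Hypothetical toy spectrum: intensities in eight m/z bins, base peak = 100.
rng = np.random.default_rng(0)
spectrum = np.array([0.0, 12.0, 0.0, 100.0, 35.0, 0.0, 5.0, 0.0])
noisy_intensity = add_intensity_noise(spectrum, noise_fraction=0.02, rng=rng)
noisy_peaks = add_peak_noise(spectrum, flip_prob=0.1, noise_fraction=0.02, rng=rng)
```

Sweeping `noise_fraction` or `flip_prob` over a range and re-scoring each classifier on the perturbed replicates would give a robustness curve of the kind the abstract describes, under these assumed noise forms.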

Keywords: High dimensional consensus, Machine learning, Robustness analysis, Mass spectrometry

Suggested Citation

Zhong, Wencheng and Usha, Amudhan and Capistran, Briana and Kearsley, Anthony, Comparative Robustness Analysis of the High Dimensional Consensus Score Versus Machine Learning Strategies for Mass Spectra. Available at SSRN: https://ssrn.com/abstract=5186809 or http://dx.doi.org/10.2139/ssrn.5186809

Wencheng Zhong

Johns Hopkins University

Baltimore, MD 20036-1984
United States

Amudhan Usha

affiliation not provided to SSRN

No Address Available

Briana Capistran

affiliation not provided to SSRN

No Address Available

Anthony Kearsley (Contact Author)

National Institute of Standards and Technology (NIST) - Applied and Computational Mathematics Division

Gaithersburg, MD 20899-8910
United States
