Comparative Robustness Analysis of the High Dimensional Consensus Score Versus Machine Learning Strategies for Mass Spectra

Zhong, Wencheng; Usha, Amudhan; Capistran, Briana; Kearsley, Anthony

doi:10.2139/ssrn.5186809

9 pages. Posted: 20 Mar 2025

Anthony Kearsley

National Institute of Standards and Technology (NIST) - Applied and Computational Mathematics Division

Abstract

Mass spectrometry is widely used for compound identification, yet the complexity of fragmentation and measurement variability make it challenging to distinguish compounds from a single mass spectrum alone. Most forensic laboratories employ gas chromatography electron ionization mass spectrometry (GC-EI-MS) for at least part of their identification process when analyzing seized substances suspected of being illegal drugs. Typically, a single measurement (or mass spectrum) is produced when investigating an analyte; we instead examine the situation in which multiple measurements are acquired, and we employ the High Dimensional Consensus (HDC) approach to compound discrimination. Using a dataset of replicate GC-EI-MS measurements of several compounds of forensic interest, we compare the HDC algorithm with several traditional machine learning and simple deep learning classification models (logistic regression, random forest, XGBoost, multilayer perceptron). The HDC effectively distinguishes structurally similar compounds, such as methamphetamine and phentermine, making it a valuable tool for measuring the similarity between mass spectra of closely related compounds. We further evaluate the performance of these classification methods under two noise models, representing measurement errors in the measured intensities and in the detected peaks, respectively. For both noise models, we find that the HDC algorithm outperforms most machine learning models in the low-to-moderate noise range (i.e., when the variance of the noise is < 2% of base peak intensity), with a sharp drop in performance at higher noise levels for the intensity-varied spectra.
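The abstract does not specify how the two noise models are constructed, so the following is only an illustrative sketch of what perturbing a spectrum in intensity and in detected peaks could look like. The function names, the choice of Gaussian intensity noise scaled by the base peak, and the random peak-dropping rule are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def add_intensity_noise(spectrum, noise_fraction, rng):
    """Perturb every intensity with zero-mean Gaussian noise.

    Assumption (not from the paper): the noise standard deviation is
    noise_fraction times the base peak (maximum) intensity.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    sigma = noise_fraction * spectrum.max()
    noisy = spectrum + rng.normal(0.0, sigma, size=spectrum.shape)
    # Intensities cannot be negative, so clip at zero.
    return np.clip(noisy, 0.0, None)


def add_peak_noise(spectrum, drop_prob, rng):
    """Randomly remove detected peaks by zeroing them.

    Assumption (not from the paper): each peak is dropped independently
    with probability drop_prob, and the base peak is always kept.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    keep = rng.random(spectrum.shape) >= drop_prob
    keep[np.argmax(spectrum)] = True  # never drop the base peak
    return spectrum * keep


# Hypothetical toy spectrum: intensities on a fixed m/z grid.
rng = np.random.default_rng(0)
clean = np.array([0.0, 5.0, 100.0, 12.0, 0.0, 40.0])

noisy_intensity = add_intensity_noise(clean, noise_fraction=0.02, rng=rng)
noisy_peaks = add_peak_noise(clean, drop_prob=0.3, rng=rng)
```

In a robustness study of this kind, perturbed copies of the replicate spectra would be passed to each classifier at increasing noise levels and the accuracy recorded at each level; the sketch covers only the perturbation step.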

Keywords: High dimensional consensus, Machine learning, Robustness analysis, Mass spectrometry

Suggested Citation:

Zhong, Wencheng and Usha, Amudhan and Capistran, Briana and Kearsley, Anthony, Comparative Robustness Analysis of the High Dimensional Consensus Score Versus Machine Learning Strategies for Mass Spectra. Available at SSRN: https://ssrn.com/abstract=5186809 or http://dx.doi.org/10.2139/ssrn.5186809