A Knowledge-Informed Spectral Variable Interval Identification and Combination Based On the Hierarchical Clustering for Robust and Interpretable Analysis

21 Pages Posted: 29 Apr 2024

Pengcheng Wu

Jiangsu University

Manshang Wang

Jiangsu University

Tao Chen

University of Surrey

Lei Xing

University of Surrey

Xiaobo Zou

Jiangsu University

Haoran Li

Jiangsu University

Abstract

Variable selection in regression problems can be treated as an optimization process. However, purely data-driven methods may ignore physically relevant variables or feature structures that could be exploited to make variable selection results more robust and interpretable. In this paper, we propose a knowledge-informed strategy, spectral variable hierarchical clustering and optimal interval combination (HCIC), to capture and exploit the underlying correlations among spectral wavelengths. In the first step, spectral variable hierarchical clustering (SVHC) measures the correlation between adjacent variables and generates a number of non-uniform intervals. These intervals are designed to distinguish patterns or structural regions arising from the interaction between infrared light and chemical bonds, so that physically relevant characteristics can be exploited. In the second step, a Bayesian linear regression based optimal interval combination (BLR-OIC) strategy, coupled with weighted bootstrap sampling (WBS), searches for the most effective combinations. This strategy aims to emulate the synergistic effect among functional bands or functional groups in the spectral data. We conduct extensive experiments on publicly available and private datasets acquired with various spectroscopic techniques to verify the efficacy of the proposed algorithm. Compared with benchmark methods, the results show improved prediction performance and robustness, as well as interpretable and consistent selection results.

Keywords: Variable Interval Selection, Bayesian Linear Regression, Hierarchical Clustering, Multivariate Calibration, Chemometrics

Suggested Citation

Wu, Pengcheng and Wang, Manshang and Chen, Tao and Xing, Lei and Zou, Xiaobo and Li, Haoran, A Knowledge-Informed Spectral Variable Interval Identification and Combination Based On the Hierarchical Clustering for Robust and Interpretable Analysis. Available at SSRN: https://ssrn.com/abstract=4811165 or http://dx.doi.org/10.2139/ssrn.4811165

Pengcheng Wu

Jiangsu University

Xuefu Rd. 301
Zhenjiang, 212013
China

Manshang Wang

Jiangsu University

Xuefu Rd. 301
Zhenjiang, 212013
China

Tao Chen

University of Surrey

Guildford
Guildford, GU2 5XH
United Kingdom

Lei Xing

University of Surrey

Guildford
Guildford, GU2 5XH
United Kingdom

Xiaobo Zou

Jiangsu University

Xuefu Rd. 301
Zhenjiang, 212013
China

Haoran Li (Contact Author)

Jiangsu University

Xuefu Rd. 301
Zhenjiang, 212013
China
