Improved Accuracy Metrics for Classification with Imbalanced Data and Where Distance from the Truth Matters, with the Wconf R Package
18 Pages. Posted: 29 Apr 2024
Date Written: April 21, 2024
Abstract
The increasingly widespread use of machine learning techniques in data analysis workflows highlights the need for accuracy metrics that capture specific features of the dataset or the task, such as imbalances between categories or classification problems where distance from the correct class is an important criterion for assessing model performance. The “wconf” package applies different weighting schemes to a confusion matrix, based on an arithmetic sequence, a geometric progression, a normal distribution, and the sine and hyperbolic tangent functions, as well as custom user-specified weight matrices. The package allows for a richer analysis of model performance, particularly in multi-class classification problems, by letting users customize accuracy scores so that predictions situated near the matrix diagonal receive partial credit. Although designed with confusion matrices in mind, the included functions can be used to weight any type of matrix where the user wishes to assign maximal importance to values on the diagonal and decaying degrees of significance to off-diagonal elements. Furthermore, the package provides functions to generate metrics that assess classification performance on imbalanced data, providing robust and efficient alternatives to the standard accuracy indicator.
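As an illustration of the idea described in the abstract, the following minimal Python sketch weights a confusion matrix with an arithmetic (linear) decay away from the diagonal and computes a weighted accuracy score. It does not use or reproduce the wconf package: the function names (`linear_weights`, `weighted_accuracy`, `balanced_accuracy`), the exact form of the decay, the example matrix, and the choice of balanced accuracy as an imbalance-aware metric are all assumptions made for illustration only.

```python
def linear_weights(n):
    """Hypothetical arithmetic-sequence scheme: weight 1 on the diagonal,
    decreasing in equal steps to 0 for the cells farthest from it."""
    if n < 2:
        raise ValueError("need at least two classes")
    return [[1 - abs(i - j) / (n - 1) for j in range(n)] for i in range(n)]


def weighted_accuracy(cm, weights):
    """Sum of weight * count over all cells, divided by the total count."""
    total = sum(sum(row) for row in cm)
    credited = sum(
        w * c
        for w_row, c_row in zip(weights, cm)
        for w, c in zip(w_row, c_row)
    )
    return credited / total


def balanced_accuracy(cm):
    """Mean per-class recall, assuming rows hold the true classes.
    A common imbalance-aware metric, chosen here as an assumption."""
    recalls = [row[i] / sum(row) for i, row in enumerate(cm) if sum(row) > 0]
    return sum(recalls) / len(recalls)


# Made-up 3-class confusion matrix with ordered, imbalanced classes.
cm = [
    [50, 5, 0],
    [3, 10, 2],
    [0, 4, 1],
]

# Identity weights give credit only to exact matches (standard accuracy).
identity = [[1 if i == j else 0 for j in range(3)] for i in range(3)]

standard = weighted_accuracy(cm, identity)
weighted = weighted_accuracy(cm, linear_weights(3))
balanced = balanced_accuracy(cm)
```

With identity weights the score reduces to standard accuracy, so the weighted variant can only add partial credit for near misses. In this sketch the balanced score is lower than both, because the smallest class is mostly misclassified.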
Keywords: weighted confusion matrix, model performance evaluation, penalty, distance measure, data visualization, imbalanced data, accuracy metrics, R package
JEL Classification: C51, C63, C87, E27