A Decision-making Rule to Detect Insufficient Data Quality: An Application of Statistical Learning Techniques to the Non-performing Loans Banking Data
29 Pages. Posted: 14 Feb 2022
Date Written: February 2, 2022
The paper presents a decision-making rule, based on statistical learning techniques, to evaluate and monitor the overall quality of the granular dataset from the Non-Performing Loans data collection carried out by the Bank of Italy. The datasets submitted by reporting agents must reach a sufficiently high level of quality before they are released to users. The rule distinguishes cases in which the corrections applied to the original dataset improve its overall quality from those in which the revisions (unexpectedly) make it worse. It relies on a new synthetic data quality indicator, built on past evidence accumulated through data quality management activity, which makes it possible to assess and monitor the overall quality of the Non-Performing Loans dataset. The proposed indicator takes into account several metrics that influence the overall quality of the dataset: the number of remarks (potential outliers) detected by the Bank of Italy's internal procedures, their degree of severity, and the expected number of confirmations of the underlying data, the latter estimated with a logistic regression model.
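The abstract does not state how the indicator combines its three metrics. The following is a purely illustrative sketch under hypothetical assumptions: each remark carries a severity weight and a covariate vector, a logistic regression has already been fitted to past confirmation outcomes (its coefficients are simply passed in), and the indicator is taken to be the severity-weighted expected number of remarks that will not be confirmed. The names `Remark`, `quality_indicator`, and `corrections_improve_quality`, the severity scale, and the aggregation formula are all assumptions for illustration, not the paper's specification.

```python
import math
from dataclasses import dataclass


@dataclass
class Remark:
    """A potential outlier flagged by an internal check (hypothetical structure)."""
    severity: int          # assumed scale, e.g. 1 (low) to 3 (high)
    features: list[float]  # covariates fed to the logistic model


def confirmation_probability(features, coef, intercept):
    """Logistic model: estimated probability that the flagged value is confirmed as correct."""
    z = intercept + sum(b * x for b, x in zip(coef, features))
    return 1.0 / (1.0 + math.exp(-z))


def quality_indicator(remarks, coef, intercept):
    """Hypothetical synthetic indicator (lower is better): each remark contributes
    its severity times the probability that it will NOT be confirmed."""
    score = 0.0
    for r in remarks:
        p = confirmation_probability(r.features, coef, intercept)
        score += r.severity * (1.0 - p)
    return score


def corrections_improve_quality(before, after, coef, intercept):
    """Assumed decision rule: accept the revised dataset only if the indicator does not worsen."""
    return quality_indicator(after, coef, intercept) <= quality_indicator(before, coef, intercept)


if __name__ == "__main__":
    coef, intercept = [1.5], -0.5  # illustrative coefficients, not estimates from the paper
    before = [Remark(3, [0.0]), Remark(1, [1.0])]
    after = [Remark(1, [1.0])]  # the corrections resolved the high-severity remark
    print(corrections_improve_quality(before, after, coef, intercept))
```

In this toy run, removing the high-severity remark lowers the indicator, so the rule accepts the revision. A weighted sum is used only because it is the simplest way to combine count, severity, and expected confirmations; the paper's actual indicator may differ.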
Keywords: potential outliers, non-performing loans, data quality, supervised machine learning, logistic regression
JEL Classification: C18, C81, G21