Application of Classification Algorithms for the Assessment of Confirmation to Quality Remarks
27 Pages. Posted: 30 Jul 2021
Date Written: July 29, 2021
In the context of the data quality management of supervisory banking data, the Bank of Italy receives a significant number of data reports at various intervals from Italian banks. If any anomalies are found, a quality remark is sent back, questioning the data submitted. This process can lead to the bank in question confirming or revising the data it previously transmitted. We propose an innovative methodology, based on text mining and machine learning techniques, for the automatic processing of the data confirmations received from banks. A classification model is employed to predict whether these confirmations should be accepted or rejected based on the reasons provided by the reporting banks, the characteristics of the validation quality checks, and reporting behaviour across the banking system. The model was trained on past cases already labelled by data managers and its performance was assessed against a set of cross-checked cases that were used as a gold standard. The empirical findings show that the methodology predicts the correct decisions on recurrent data confirmations and that the performance of the proposed model is comparable to that of data managers currently engaged in data analysis.
Keywords: supervisory banking data, data quality management, machine learning, text mining, latent Dirichlet allocation, gradient boosting
JEL Classification: C18, C81, G21
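The abstract and keywords name the building blocks (text mining, latent Dirichlet allocation, gradient boosting) but do not say how they are combined. The sketch below shows one plausible arrangement, not the authors' implementation: topic shares extracted from the banks' free-text reasons are joined with a numeric feature and passed to a gradient boosting classifier. The use of scikit-learn, the toy sentences, the labels, and the feature `share_confirming` (a stand-in for "reporting behaviour across the banking system") are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Invented toy data: reasons given by banks, one hypothetical numeric
# feature per case, and the data manager's decision (1 = accept, 0 = reject).
reasons = [
    "the increase is due to a merger with another institution",
    "value confirmed after internal check of the loan portfolio",
    "typing error in the previous report, data will be resent",
    "merger completed in the quarter explains the change in deposits",
    "confirmed, large new loan granted to a corporate client",
    "error in the reporting system, figures are under review",
    "change due to acquisition of a branch network",
    "system error, the amount was reported in the wrong currency",
]
share_confirming = np.array([[0.8], [0.6], [0.1], [0.9], [0.7], [0.2], [0.8], [0.1]])
labels = np.array([1, 1, 0, 1, 1, 0, 1, 0])


def fit_model(texts, extra, y, n_topics=3):
    """Fit word counts -> LDA topic shares -> gradient boosting."""
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    topics = lda.fit_transform(counts)  # one row of topic shares per text
    features = np.hstack([topics, extra])  # text features + numeric feature
    clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
    clf.fit(features, y)
    return vectorizer, lda, clf


def predict(model, texts, extra):
    """Apply the fitted steps to new confirmations and return 0/1 decisions."""
    vectorizer, lda, clf = model
    topics = lda.transform(vectorizer.transform(texts))
    return clf.predict(np.hstack([topics, extra]))


model = fit_model(reasons, share_confirming, labels)
new_reason = ["merger with another bank explains the increase"]
print(predict(model, new_reason, np.array([[0.85]])))
```

With only eight invented cases the prediction itself carries no meaning; the point is the shape of the pipeline. A realistic setup would train on past cases labelled by data managers and evaluate against a separately cross-checked set, as the abstract describes.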