Institutional Sector Classifier, a Machine Learning Approach

32 Pages. Posted: 28 May 2020

Date Written: March 18, 2020

Abstract

We implement machine learning techniques to automatically classify, by sector of economic activity, the Italian companies recorded in the Bank of Italy Entities Register. To this end, we first extract a sample of correctly classified corporations from the universe of Italian companies. Second, we select a set of features that are related to the sector of economic activity code and use them in supervised approaches to predict that code. We choose a multi-step approach based on the hierarchical structure of the sector classification. Because the target classes are imbalanced, at each step we first apply two resampling procedures (random oversampling and the Synthetic Minority Over-sampling Technique) to obtain a more balanced training set. Then, we fit Gradient Boosting and Support Vector Machine models. Overall, our multi-step classifier yields very reliable predictions of the sector code. This approach can be employed to make the whole classification process more efficient by reducing the area of manual intervention.
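
The two resampling procedures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`random_oversample`, `smote_like`), the toy data, and the class labels "A" and "B" are assumptions made for illustration, and the second function is a simplified interpolation in the spirit of the Synthetic Minority Over-sampling Technique rather than a full reproduction of it. The model-fitting step (Gradient Boosting, Support Vector Machine) is not shown.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows for the minority class by interpolating
    between a minority row and one of its k nearest minority neighbours.
    Assumes the minority class has at least two rows."""
    rng = random.Random(seed)
    rows = [x for x, lab in zip(X, y) if lab == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        others = [r for r in rows if r is not base]
        # Rank the other minority rows by squared Euclidean distance to base.
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy training set: four rows of class "A", two of class "B".
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [1.0, 1.0], [1.2, 0.9]]
y = ["A", "A", "A", "A", "B", "B"]

X_bal, y_bal = random_oversample(X, y)
new_rows = smote_like(X, y, minority="B", n_new=2, k=1)
```

Random oversampling only repeats existing rows, whereas the interpolation step produces new points lying between existing minority rows. In the multi-step setting described above, a rebalancing of this kind would be applied to the training set at each level of the sector hierarchy before fitting the models.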

Keywords: machine learning, entities register, classification by institutional sector

JEL Classification: C18, C81, G21

Suggested Citation

Massaro, Paolo and Vannini, Ilaria and Giudice, Oliver, Institutional Sector Classifier, a Machine Learning Approach (March 18, 2020). Bank of Italy Occasional Paper No. 548, Available at SSRN: https://ssrn.com/abstract=3612710 or http://dx.doi.org/10.2139/ssrn.3612710

Paolo Massaro (Contact Author)

Bank of Italy

Via Nazionale 91
Rome, 00184
Italy

Ilaria Vannini

Bank of Italy

Via Nazionale 91
Rome, 00184
Italy

Oliver Giudice

Bank of Italy

Via Nazionale 91
Rome, 00184
Italy
