Optimizing Tax Administration Policies with Machine Learning
27 Pages. Posted: 11 Mar 2020
Date Written: March 11, 2020
Tax authorities around the world are increasingly employing data mining and machine learning algorithms to predict individual behaviour. Although the traditional literature on optimal tax administration provides useful tools for ex-post evaluation of policies, it disregards the problem of which taxpayers to target. This study identifies and characterises a loss function that assigns a social cost to any prediction-based policy. We define this measure as the difference between the social welfare of a given policy and that of an ideal policy unaffected by prediction errors. We show how this loss function relates to the receiver operating characteristic curve, a standard statistical tool used to evaluate prediction performance. Subsequently, we apply our measure to predict inaccurate tax returns filed by self-employed individuals and sole proprietorships in Italy. In our application, a random forest model provides the best prediction: we show how it can be interpreted using measures of variable importance developed in the machine learning literature.
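To make the link between a prediction-based loss and the receiver operating characteristic curve concrete, the following is a minimal sketch and not the paper's actual loss function. It assumes a simple linear cost structure: each missed inaccurate return costs `cost_missed` and each unnecessary audit costs `cost_false_audit`, so that an error-free policy has zero loss. The function names, the cost weights, and the toy scores and labels are all hypothetical and chosen only for illustration.

```python
def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) for each distinct score used as a cutoff."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        # A taxpayer is targeted when the predicted score is at least t.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def policy_loss(fpr, tpr, prevalence, cost_missed, cost_false_audit):
    """Hypothetical expected loss per taxpayer relative to an error-free policy.

    An ideal policy (fpr = 0, tpr = 1) has loss 0 by construction.
    """
    missed = prevalence * (1.0 - tpr) * cost_missed
    wasted = (1.0 - prevalence) * fpr * cost_false_audit
    return missed + wasted


# Toy data: predicted probabilities and true labels (1 = inaccurate return).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
prevalence = sum(labels) / len(labels)

# Assumed cost weights, for illustration only.
COST_MISSED = 1.0
COST_FALSE_AUDIT = 0.5

# Pick the point on the ROC curve that minimises the assumed loss.
best = min(
    roc_points(scores, labels),
    key=lambda p: policy_loss(p[1], p[2], prevalence, COST_MISSED, COST_FALSE_AUDIT),
)
best_threshold, best_fpr, best_tpr = best
best_loss = policy_loss(best_fpr, best_tpr, prevalence, COST_MISSED, COST_FALSE_AUDIT)
print(best_threshold, best_fpr, best_tpr, best_loss)
```

Under these assumptions, every candidate threshold corresponds to one point on the ROC curve, and the loss is a weighted combination of that point's distance from the ideal corner (false positive rate 0, true positive rate 1). Changing the relative cost weights moves the loss-minimising point along the curve.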
Keywords: policy prediction problems, tax behavior, big data, machine learning
JEL Classification: H26, H32, C53