Firm Failure Prediction Using Genetic Programming Generated Features

21 Pages Posted: 17 May 2023

Yuriy Zelenkov

National Research University Higher School of Economics (Moscow)


Predicting firm failure attracts sustained research attention because the problem matters for sustainable development. Many studies in this area search for new features that increase predictive accuracy. In this paper, genetic programming (GP) is used for this purpose. The main difficulty in GP is specifying a function that evaluates the fitness of a generated feature. Directly optimizing a machine learning (ML) model that uses the generated feature is in most cases computationally expensive: evolving a population of N programs over G generations while evaluating each model with K-fold cross-validation requires N*G*K model training cycles. Many researchers therefore use scores that measure the relationship between the generated feature and the class label. However, our empirical analysis shows that most such scores correlate poorly with ML model performance. We consider several ways of linearly combining scores. Experimental results on data from Hungarian firms (7167 observations, class imbalance of 9.37) using five ML models (Logistic Regression, Random Forest, Gradient Boosting, Histogram Boosting, and AdaBoost) show that the proposed fitness function increases the ROC AUC of these models by 6.6%, 5.2%, 6.8%, 5.5%, and 5.2%, respectively. Moreover, applying the discovered formula to data from Czech firms (3872 observations, class imbalance of 74.92), which were not used in the feature search, increased ROC AUC by 13.1%, 11.8%, 14.9%, 11.3%, and 8.2%, respectively. This indicates that the proposed method can find universal features, which opens the way to building effective models when data are "insufficient" (small number of observations, extreme imbalance, etc.).
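The cost argument and the idea of a linearly combined score can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not name the scores or the weights, so the two scores used here (absolute Pearson correlation with the label and the absolute Gini coefficient, |2*AUC - 1|), the equal weights, the helper names, and the toy data are all assumptions chosen for illustration.

```python
import math


def training_runs(n_programs, n_generations, k_folds):
    """Model fits needed if every program is scored by K-fold CV: N*G*K."""
    return n_programs * n_generations * k_folds


def abs_correlation(feature, labels):
    """Absolute Pearson correlation between a feature and a 0/1 label."""
    n = len(feature)
    mean_x = sum(feature) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, labels))
    var_x = sum((x - mean_x) ** 2 for x in feature)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no information
    return abs(cov / math.sqrt(var_x * var_y))


def abs_gini(feature, labels):
    """|2*AUC - 1| of the raw feature used as a ranking score."""
    pos = [x for x, y in zip(feature, labels) if y == 1]
    neg = [x for x, y in zip(feature, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    # Pairwise AUC: fraction of (positive, negative) pairs ranked correctly,
    # counting ties as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))
    return abs(2.0 * auc - 1.0)


def combined_fitness(feature, labels, weights=(0.5, 0.5)):
    """Linear combination of cheap scores (weights are hypothetical)."""
    scores = (abs_correlation(feature, labels), abs_gini(feature, labels))
    return sum(w * s for w, s in zip(weights, scores))


# Assumed GP settings: 100 programs, 50 generations, 5-fold CV.
print(training_runs(100, 50, 5))  # 25000

# Toy data: a candidate generated feature debt / assets, label 1 = failed.
labels = [0, 0, 0, 0, 1, 1]
debt = [20.0, 35.0, 30.0, 50.0, 90.0, 80.0]
assets = [100.0, 120.0, 90.0, 110.0, 100.0, 85.0]
candidate = [d / a for d, a in zip(debt, assets)]
print(combined_fitness(candidate, labels))
```

A score of this kind needs no model training, so evaluating a candidate costs a single pass over the data instead of K model fits. That saving is why score-based fitness functions are attractive, provided the score actually tracks model performance.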

Keywords: firm failure prediction, genetic programming generated feature, fitness function, score of generated features

Suggested Citation

Zelenkov, Yuriy, Firm Failure Prediction Using Genetic Programming Generated Features. Available at SSRN.

Yuriy Zelenkov (Contact Author)

National Research University Higher School of Economics (Moscow)

Myasnitskaya street, 20
Moscow, Moscow 119017
