50 Pages. Posted: 21 May 2018. Last revised: 6 Aug 2018.
Date Written: May 9, 2018
The extensive literature on financing innovation pinpoints numerous variables that influence patent activity. Yet the simultaneous discovery of these covariates makes this body of research difficult to interpret. We use machine learning techniques to assess the incremental explanatory power of previously identified innovation covariates, seeking to identify the characteristics that provide material, independent information about patents and citations. We find that only seven of thirty-five previously documented variables independently explain patents or their citations; managerial and governance characteristics rarely survive the selection process. Cross-validation demonstrates the stability of these covariates across different variable selection methods and highlights that stock liquidity consistently provides robust explanatory power. We also examine whether commonly used econometric techniques, specifically the inclusion of industry or firm fixed effects, mitigate the need to include the key identified variables. Remarkably, relying on industry or firm fixed effects does not eliminate the need to incorporate these key innovation covariates. Furthermore, because the appropriate covariates are difficult to identify, evaluating the exclusion restriction is quite challenging in studies that rely on exogenous shocks to provide causal evidence on corporate innovation. Our analysis offers guidance for testing the exclusion restriction in studies that propose or evaluate the causal features of innovation.
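To illustrate the kind of variable selection the abstract refers to, the following is a minimal sketch of the lasso, one of the keyword-listed methods, fitted by cyclic coordinate descent. It is not the authors' implementation. The data are synthetic, and the function names, the six hypothetical covariates, the penalty value, and the true coefficients are all assumptions chosen for illustration. The point is only that an L1 penalty sets the coefficients of uninformative covariates exactly to zero, so the surviving covariates are the ones that carry independent explanatory power.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, sweeps=200):
    """Minimize (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - Xb, since b starts at zero
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Covariance of column j with the residual that excludes j's own fit.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, penalty) / col_scale[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Synthetic data (an assumption for illustration): six candidate covariates,
# of which only the first two actually drive the outcome.
rng = random.Random(0)
n_obs, n_vars = 200, 6
true_beta = [2.0, -1.5, 0.0, 0.0, 0.0, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_obs)]
y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0.0, 0.5) for row in X]

beta = lasso_coordinate_descent(X, y, penalty=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected covariates:", selected)
```

In practice the penalty would be chosen by cross-validation rather than fixed by hand, and the covariates would be standardized before fitting so that the penalty treats them symmetrically.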
Keywords: variable selection, machine learning, lasso, innovation, patents, citations
JEL Classification: O30, G30, G32, O34