How Informative Is the Text of Securities Complaints?

Journal of Law, Economics, & Organization (Forthcoming)

44 Pages
Posted: 24 May 2021
Last revised: 9 May 2022


Adam B. Badawi

University of California, Berkeley - School of Law

Date Written: November 30, 2021


Much of the research in law and finance reduces long, complex texts to a small number of variables. Examples include coding corporate charters as an entrenchment index or characterizing dense securities complaints with variables that capture the amount at issue, the statutes alleged to have been violated, and the presence of an SEC investigation. Legal scholars have often voiced concerns that this type of dimensionality reduction loses much of the nuance and detail embedded in legal text. This paper assesses that critique by asking whether methods that analyze text directly can capture meaningful information, and perhaps even more information than traditional low-dimension studies that rely on non-textual inputs. It does so by applying text analysis and machine learning to a corpus of more than five thousand complaints filed in private securities class actions, which collectively contain over 90 million words. The analysis shows that there is significant information embedded in the text of these complaints, albeit with substantial limits on how much of that information text analysis can extract.
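The paper does not reproduce its code here, but the kind of text model it describes can be sketched with a hand-rolled bag-of-words Naive Bayes classifier. Everything below is illustrative: the toy "complaints," the labels, and the model choice are assumptions, not the author's actual pipeline.

```python
# Illustrative sketch only: a minimal bag-of-words Naive Bayes classifier
# standing in for the text models the paper describes. The training
# "complaints" and outcome labels below are invented for the example.
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(docs, labels):
    """Fit unigram Naive Bayes: per-class word counts plus class priors."""
    counts = {0: Counter(), 1: Counter()}
    class_totals = Counter(labels)
    for doc, label in zip(docs, labels):
        counts[label].update(tokenize(doc))
    vocab = set(counts[0]) | set(counts[1])
    return counts, class_totals, vocab

def predict(text, counts, class_totals, vocab):
    """Return the class with the higher posterior log-probability."""
    n = sum(class_totals.values())
    scores = {}
    for label in (0, 1):
        total_words = sum(counts[label].values())
        score = math.log(class_totals[label] / n)
        for word in tokenize(text):
            # Laplace smoothing so unseen words do not zero out a class.
            score += math.log((counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data: complaint text labeled 1 = settled, 0 = dismissed.
complaints = [
    "defendants made materially false statements about revenue",
    "the company concealed an sec investigation from investors",
    "plaintiff alleges generalized mismanagement without particular facts",
    "the statements were optimistic opinions and mere puffery",
]
outcomes = [1, 1, 0, 0]

model = train(complaints, outcomes)
print(predict("false statements about revenue and an sec investigation", *model))  # → 1
```

A production pipeline would of course use far richer features and regularized models, but the structure is the same: reduce each complaint to word statistics, then learn a mapping from those statistics to the settle/dismiss outcome.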

The analysis proceeds in three parts. The first asks whether the text provides indications about the eventual outcomes of the cases. The best-performing models predict whether cases will settle or be dismissed with an accuracy of about 70 percent. That is substantially better than baseline rates but still leaves significant room for improvement. The second part compares text-based models to non-text models and assesses their relative performance in predicting outcomes. While the best-performing text-based models are more accurate than the best-performing non-text models, a hybrid model that uses both text and non-text components outperforms either component alone. These results suggest that the non-text models may omit some information and that augmenting them with textual information may improve them. Finally, the analysis uses abnormal returns as an additional means of validation. Previous research shows substantial differences between the abnormal returns of cases that will be dismissed and those that will settle in the days following the filing of a securities lawsuit. This section replicates that result and then shows that the predictions made by the machine learning models are associated with substantial abnormal returns. While market participants take about three or four days to impound the likely outcome of a case into the stock price, the machine learning models can make these predictions more or less instantaneously. In addition to validating the predictions against human judgment, these results suggest that there is some drift in stock price reactions to the complexities of securities lawsuits.
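The abnormal-return validation step can be sketched under a simple market-adjusted model, where each day's abnormal return is the stock return minus the market return and the event-window values are summed. The daily returns below are invented; the paper's own event-study methodology may use a different benchmark model.

```python
# Illustrative sketch of an event-study calculation, assuming a simple
# market-adjusted model (abnormal return = stock return - market return).
# The return series below are hypothetical, not data from the paper.
def cumulative_abnormal_return(stock_returns, market_returns):
    """Sum daily abnormal returns over the event window."""
    return sum(s - m for s, m in zip(stock_returns, market_returns))

# Hypothetical daily returns over the four trading days after a complaint
# is filed, for a case the model predicts will settle.
stock = [-0.031, -0.012, -0.008, -0.004]
market = [0.002, -0.001, 0.003, 0.000]

car = cumulative_abnormal_return(stock, market)
print(round(car, 4))  # → -0.059
```

A negative cumulative abnormal return of this kind, concentrated in the days after filing, is the pattern the paper associates with cases headed for settlement rather than dismissal.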

Keywords: Securities, Machine Learning, Text Analysis, Event Studies

JEL Classification: K22, G14

Suggested Citation

Badawi, Adam B., How Informative Is the Text of Securities Complaints? (November 30, 2021). Journal of Law, Economics, & Organization (Forthcoming). Available at SSRN.

Adam B. Badawi (Contact Author)

University of California, Berkeley - School of Law ( email )

215 Boalt Hall
Berkeley, CA 94720-7200
United States
