Using Natural Language Processing to Assess Text Usefulness to Readers: The Case of Conference Calls and Earnings Prediction
53 Pages · Posted: 9 Jan 2018 · Last revised: 17 Jan 2018
Date Written: January 17, 2017
We examine whether support vector regression (SVR), supervised LDA (sLDA), random forest regression trees (RF), and ‘tone’ extract narrative content from conference calls that correlates with the useful information a human reader would identify. We find that each narrative-content measure (along with a composite measure) explains a portion of analyst-forecast revisions for quarter q+1 issued after the conference call in quarter q. Correlation with analyst-forecast revisions improves when the composite measure adapts to context (positive/negative returns; high/low variance) and ignores sparse words. The correlation is comparable and incremental to that of financial signals (cash-flow changes, earnings surprises, and management forecasts), which suggests that the narrative content of conference calls as extracted by readers is economically significant. Our results suggest that models of narrative content have reasonable construct validity and that this validity is likely to improve with further attention to the unique characteristics of text.
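Of the four measures, ‘tone’ is the simplest to illustrate. A minimal sketch of a dictionary-based tone score follows; the word lists here are tiny invented stand-ins for the financial-sentiment dictionaries a real study would use, and the function name and normalization are assumptions, not the paper's implementation.

```python
# Hypothetical tone measure: net share of positive minus negative
# words in a transcript. POSITIVE/NEGATIVE are toy stand-ins for a
# real financial-sentiment dictionary (e.g. Loughran-McDonald style).
POSITIVE = {"strong", "growth", "record", "expansion"}
NEGATIVE = {"weak", "decline", "loss", "restructuring"}

def tone(text: str) -> float:
    """Return (positive count - negative count) / total word count."""
    words = text.lower().split()
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

# 2 positive words, 1 negative word, 6 words total -> 1/6
print(tone("strong growth offset by restructuring costs"))
```

The SVR, sLDA, and RF measures in the paper instead learn word weights from data rather than from a fixed dictionary, which is why the authors can also adapt the composite measure to context and drop sparse words.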
Keywords: Textual Analysis, Machine Learning, Disclosure, Conference Calls
JEL Classification: M40, M41