From Words to Syntax: Identifying Context-specific Information in Textual Analysis
59 Pages. Posted: 13 Apr 2020. Last revised: 20 Oct 2022
Date Written: April 3, 2020
When quantifying information from unstructured textual data, the traditional approach in the literature captures only the semantic features of single words or phrases. The context, the sequence of words, and the relationships between words are ignored. This paper introduces a novel approach that incorporates complex syntactic features into textual analysis using two applications of machine learning: a neural-network-based natural language parser and word embeddings. We demonstrate the usefulness of this approach by analyzing the tone of financial narratives in earnings conference calls. We construct a new measure of sentiment that is specific to performance discussions and is adjusted for complex contextual negations. We find that this performance-specific sentiment explains cross-sectional returns and future operating performance better than the umbrella sentiment proxy and the simple rule-based measures used in the literature. An analysis of earnings-related forward-looking statements in conference calls confirms the value of this new approach in identifying context-specific information.
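To make the idea of negation-adjusted tone concrete, the following is a minimal sketch and not the paper's implementation. It assumes each sentence has already been dependency-parsed into tokens carrying a head index and a relation label. The `Token` class, the toy word lists, and the use of a `neg` relation label are all illustrative assumptions. A sentiment word has its polarity flipped when a negation attaches either to the word itself or to its syntactic head, which is the kind of case a plain word count misses.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    head: int  # index of the syntactic head (a root points to itself)
    dep: str   # dependency label; "neg" marks a negation (assumed label)

# Toy lexicons for illustration only; not the paper's word lists.
POSITIVE = {"improve", "improved", "strong", "growth"}
NEGATIVE = {"decline", "declined", "weak", "loss"}

def is_negated(tokens, i):
    """True if token i, or its syntactic head, has a 'neg' dependent."""
    targets = {i, tokens[i].head}
    return any(t.dep == "neg" and t.head in targets for t in tokens)

def sentence_tone(tokens):
    """Sum word polarities, flipping any word under a syntactic negation."""
    score = 0
    for i, tok in enumerate(tokens):
        word = tok.text.lower()
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity and is_negated(tokens, i):
            polarity = -polarity
        score += polarity
    return score

# Hand-written parse of "We do not expect margins to decline".
# "not" attaches to "expect", which is the head of "decline".
sentence = [
    Token("We", 3, "nsubj"),
    Token("do", 3, "aux"),
    Token("not", 3, "neg"),
    Token("expect", 3, "ROOT"),
    Token("margins", 6, "nsubj"),
    Token("to", 6, "aux"),
    Token("decline", 3, "xcomp"),
]
print(sentence_tone(sentence))  # 1, whereas a plain word count gives -1
```

In practice the parse would come from a trained parser rather than being written by hand, and the rule for which negations apply would need to be considerably richer than this one-step check.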
Keywords: textual analysis, machine learning, neural networks, artificial intelligence, natural language processing, sentiment analysis, conference calls