How News Organizations Paraphrase Their Stories on Social Media? Computational Text Analysis Approach
32 Pages. Posted: 3 Apr 2017. Last revised: 16 Aug 2017
Date Written: August 15, 2017
Social media has become one of the major sources of news. As information overload prevails, news organizations need to form social media strategies to capture news readers’ limited attention (Lanham, 2006; Anderson & de Palma, 2013). This study investigates one such potential strategy: paraphrasing a news story in a social media post.
Previous literature on the choice of news headlines found that commercial news media that rely on advertising revenue tend to frame their stories sensationally in headlines (Reah, 1998; Molek-Kozakowska, 2013). Similarly, recent studies on search engine optimization (SEO) show that news media monitor their competitors and carefully choose titles and the keywords tagged in URLs and HTML to maximize the chances that their stories are found through online search (Dick, 2011). If these strategies are effective, news organizations are likely to adopt similar strategies on social media. In particular, they may paraphrase their news information to make it:
(a) concise enough to fit within the text limit of a social media platform,
(b) informative enough to signal news content, and
(c) appealing to news demand.
This strategy can influence news readers’ perception of a news topic because news consumption via social media tends to be relatively instantaneous and short-lived (Mitchell, Jurkowitz & Olmstead, 2014). Many news readers may learn about a news topic from a social media post rather than from the original news story, as they used to do with the headlines and leads of traditional news (Andrew, 2007). This implies that how news organizations paraphrase news for social media may frame news readers’ perception of a public issue.
To reveal news organizations’ paraphrasing strategies, I apply computational information retrieval and text analysis methods. Previous studies based on hand-coding often analyze only social media posts (Newman, 2011) because of the large volume of data involved: the posts themselves, the original news articles, and the relationships between the two. That approach captures only the information that remains after paraphrasing and does not show how the paraphrase relates to the original text. Instead, I crawled news articles from the websites of 117 news organizations, together with tweets from their official Twitter accounts, for one week (Feb 27, 2017–Mar 5, 2017), which yielded 13,773 news stories and 61,219 tweets. I then identified the news article shared in each social media post by matching the URL of the article with the URLs in the tweet.
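The URL-matching step can be illustrated with a minimal Python sketch. It is not the study's actual pipeline: the record layout (`id`, `url`, `urls`), the function names, and the normalization rules are assumptions made for illustration, and the sketch assumes that shortened links in tweets have already been expanded to their final addresses.

```python
from urllib.parse import urlsplit

def normalize_url(url):
    """Reduce a URL to host + path so trivial variants compare equal.

    Assumption: scheme, a leading "www.", query strings (e.g. tracking
    parameters), fragments, and trailing slashes do not identify a
    different article. Sites that identify articles by query parameter
    would need a different rule.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def match_tweets_to_articles(articles, tweets):
    """Map each tweet id to the ids of the crawled articles it links to."""
    index = {normalize_url(a["url"]): a["id"] for a in articles}
    matches = {}
    for tweet in tweets:
        for url in tweet["urls"]:
            article_id = index.get(normalize_url(url))
            if article_id is not None:
                matches.setdefault(tweet["id"], []).append(article_id)
    return matches

# Hypothetical records; the real crawl would supply these.
articles = [
    {"id": "a1", "url": "https://www.example.com/politics/senate-vote/"},
    {"id": "a2", "url": "http://example.com/weather/storm"},
]
tweets = [
    {"id": "t1", "urls": ["http://example.com/politics/senate-vote?utm_source=twitter"]},
    {"id": "t2", "urls": ["https://example.com/sports/final"]},
    {"id": "t3", "urls": []},
]

matches = match_tweets_to_articles(articles, tweets)
print(matches)
```

Tweets that link to no crawled article, or contain no link at all, are simply left out of the result, so only tweet-article pairs with a matching URL enter the later text comparison.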
I analyze how news organizations paraphrase their news articles by looking at which words in an original text are likely to make it into a social media post. This task can be carried out with discriminating-words algorithms such as logistic lasso regression (Mitra & Gilbert, 2014) or multinomial inverse regression (Taddy, 2013), both recently developed in statistics and machine learning. Unlike popular dictionary methods such as LIWC, these algorithms let the words that are likely to appear in a social media post emerge from the empirical analysis, without pre-assigning psychological meanings to dictionary words.
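To make the idea of a discriminating-words algorithm concrete, the following is a toy Python sketch of L1-penalized (lasso) logistic regression fitted by proximal gradient descent. It is an illustration under stated assumptions, not the specification used in the study: the setup (label 1 for tweet text, 0 for article text, binary word-presence features), the four-document corpus, the penalty `lam`, the step size, and all function names are hypothetical.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_logistic_lasso(docs, labels, lam=0.05, step=0.5, iters=2000):
    """Return {word: weight} for words with a nonzero lasso coefficient."""
    vocab = sorted({word for doc in docs for word in doc})
    # Binary presence features: 1.0 if the word occurs in the document.
    X = [[1.0 if word in doc else 0.0 for word in vocab] for doc in docs]
    n, p = len(X), len(vocab)
    weights, bias = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, labels):
            z = bias + sum(w * x for w, x in zip(weights, row))
            error = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += error / n
            for j in range(p):
                grad_w[j] += error * row[j] / n
        bias -= step * grad_b  # the intercept is not penalized
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        weights = [soft_threshold(w - step * g, step * lam)
                   for w, g in zip(weights, grad_w)]
    return {word: w for word, w in zip(vocab, weights) if w != 0.0}

# Hypothetical toy corpus: two tweets (label 1) and two article snippets (label 0).
texts = [
    "breaking senate vote",
    "breaking court ruling",
    "said senate vote",
    "said court ruling",
]
labels = [1, 1, 0, 0]
docs = [text.split() for text in texts]

selected = fit_logistic_lasso(docs, labels)
for word, weight in sorted(selected.items(), key=lambda item: -item[1]):
    print(f"{word}: {weight:+.3f}")
```

In this toy corpus, topic words such as "senate" and "court" occur equally often in both classes, so the L1 penalty keeps their coefficients at zero, while "breaking" receives a positive weight (associated with tweets) and "said" a negative one (associated with article text). No dictionary of word meanings is supplied in advance; the selected words come out of the fit.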
Keywords: News, social media, paraphrasing, discriminating words algorithm