How do Twitter Conversations Differ based on Geography, Time, and Subject? A Framework and Analysis of Topical Conversations in Microblogging
2013 ASE/IEEE International Conference on Social Computing
11 Pages. Posted: 12 Mar 2013. Last revised: 10 Oct 2013
Date Written: August 15, 2013
Automatically discovering how social media users discuss particular topics would provide unique insight into how people perceive those topics. However, identifying trending terms within a topical conversation is difficult. We take an information retrieval approach and use tf-idf (term frequency-inverse document frequency) to identify words that are more frequent in a focal conversation than in other conversations on Twitter. This requires a query set of tweets on a particular topic (used for term frequency) and a control set of conversations for comparison (used for inverse document frequency). The terms identified as most important within a topical conversation are strongly affected by the choice of control set. There is no clear metric for whether one control set is better than another, since that depends on the needs of the user, but we can investigate how stable topics are under different control sets. We propose a method for doing this and show that some topics of conversation are more stable than others, and that this stability also depends on whether only the most frequent terms are of interest (top-50) or all words are examined (full vocabulary). We end with a set of guidelines for building better topic analysis tools based on these results.
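The following is a minimal sketch of the setup described above, not the authors' implementation. It assumes whitespace tokenization, term frequency normalized by the query set's token count, an idf of log(N / df) in which each control conversation is one document and the query conversation counts as one more (so every query term has df >= 1), and two illustrative stability measures that the abstract does not specify: Jaccard overlap of the top-k terms (for the top-50 view) and Spearman rank correlation over the full query vocabulary. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_scores(query_tweets, control_conversations):
    """Score every term in the query conversation.

    query_tweets: tweets on the focal topic (used for term frequency).
    control_conversations: list of conversations, each a list of tweets
    (used for inverse document frequency).
    """
    query_tokens = [tok for tweet in query_tweets for tok in tokenize(tweet)]
    tf = Counter(query_tokens)
    total = len(query_tokens)
    # Assumption: one document per conversation, query conversation included.
    docs = [set(query_tokens)] + [
        {tok for tweet in conv for tok in tokenize(tweet)}
        for conv in control_conversations
    ]
    n_docs = len(docs)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for doc in docs if term in doc)
        scores[term] = (count / total) * math.log(n_docs / df)
    return scores


def rank_terms(scores):
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


def top_k_jaccard(scores_a, scores_b, k=50):
    # Illustrative top-k stability: overlap of the two top-k term sets.
    top_a = set(rank_terms(scores_a)[:k])
    top_b = set(rank_terms(scores_b)[:k])
    return len(top_a & top_b) / len(top_a | top_b)


def average_ranks(values):
    # 1-based ranks in ascending order, averaging ranks over tied values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def spearman(scores_a, scores_b):
    # Illustrative full-vocabulary stability: Pearson correlation of
    # tie-averaged ranks (Spearman's rho) over the shared terms.
    terms = sorted(set(scores_a) & set(scores_b))
    ra = average_ranks([scores_a[t] for t in terms])
    rb = average_ranks([scores_b[t] for t in terms])
    n = len(terms)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when one ranking is all ties
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    query = ["snow storm closes schools", "big snow storm tonight",
             "schools closed snow day"]
    control_a = [["game tonight was great", "great game"],
                 ["schools reopen after break"]]
    control_b = [["snow in the mountains", "fresh snow"],
                 ["storm warning issued"]]
    a = tfidf_scores(query, control_a)
    b = tfidf_scores(query, control_b)
    print(rank_terms(a)[:2], rank_terms(b)[:2])
    print(top_k_jaccard(a, b, k=2))  # 1/3 for this toy data
    print(spearman(a, b))
```

In the toy run, control set A contains the word "schools" while control set B contains "snow" and "storm", so the top-ranked query terms shift between the two rankings even though the query set is unchanged, which is the kind of control-set sensitivity the abstract describes.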
Keywords: social media, microblogging, trend identification, topic stability, language usage, ranking, Twitter