What Phishing E-mails Reveal: An Exploratory Analysis of Phishing Attempts Using Text Analysis
36 Pages Posted: 2 Aug 2019
Date Written: July 26, 2019
Phishers appear particularly interested in accounting and tax, with accountants and auditing firms as frequent targets because of their proximity to organizational resources. Since phishing is typically conducted by email, we use text analysis to explore differences between phishing emails and other emails. Comparing a database of phishing messages with a database of the Enron emails, we find that the phishing data differ statistically significantly across a large number of univariate text variable categories. Further, we generate a model of phishing as “power.” With power as the dependent variable, we use the independent variables friend (who the phishers pretend to be), achievement (of their goal), money (what they aim to take) and work (where phishing typically occurs) to estimate power in both the phishing and non-phishing messages, and we find differences in the signs of the independent variables' coefficients. Finally, using the output of a text analysis, we examine the ability of neural network models to differentiate between phishing emails and Enron emails, using size-matched samples.
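As an illustration only, the "power" model described above can be sketched as an ordinary least squares regression fitted separately to each corpus, followed by a comparison of coefficient signs. The word lists, function names, and the choice of plain OLS are all assumptions made for this sketch: the LIWC dictionaries are proprietary and far larger, and the abstract does not specify the estimation procedure.

```python
# Hypothetical placeholder word lists; these are NOT the LIWC dictionaries.
CATEGORIES = {
    "power": {"urgent", "must", "required", "authority"},
    "friend": {"friend", "colleague", "buddy"},
    "achieve": {"win", "success", "complete"},
    "money": {"account", "payment", "bank", "invoice"},
    "work": {"office", "meeting", "report"},
}
PREDICTORS = ("friend", "achieve", "money", "work")


def category_rates(text):
    """Return the percentage of words in `text` that fall in each category."""
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    total = len(words) or 1  # avoid division by zero on empty messages
    return {
        name: 100.0 * sum(w in vocab for w in words) / total
        for name, vocab in CATEGORIES.items()
    }


def ols(rows, y):
    """Least squares fit with an intercept, solved via the normal equations.

    Returns [intercept, b1, ..., bk]. Assumes X'X is non-singular.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def coefficient_signs(rows, y, names=PREDICTORS):
    """Map each predictor name to the sign of its fitted coefficient."""
    beta = ols(rows, y)
    return {n: ("+" if b > 0 else "-") for n, b in zip(names, beta[1:])}
```

In use, one would compute `category_rates` for every message, take the `power` rate as the dependent variable and the four `PREDICTORS` rates as a row of independent variables, then call `coefficient_signs` once on the phishing messages and once on the non-phishing messages and compare the two results.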
Keywords: Phishing, Sentiment Analysis, Python NLTK, Text Analysis, LIWC
JEL Classification: M, Y