Balance Sheet Outlier Detection Using a Graph Similarity Algorithm

8 Pages. Posted: 14 Oct 2011. Last revised: 20 Jul 2014

Steve Y. Yang

Stevens Institute of Technology

Randy Cogill

University of Virginia - Systems Engineering

Date Written: February 5, 2013


Graph similarity measurement has been used in many applications, such as computational biology, text mining, pattern recognition, and computer vision. In this paper, we apply graph similarity measurement to quantify structural differences between financial statements. An unconventional financial statement structure may reveal an intention to hide certain information while keeping the statement technically "correct". Unconventional financial statements may also point to investment opportunities when their legitimacy is not in question.

We construct an algorithm that uses string edit distance as an approximation of graph similarity, and we compute that distance with the Levenshtein algorithm under modified string edit costs. We demonstrate that the algorithm captures subtle changes in balance sheet structure in two experiments. The first experiment shows that the algorithm is sensitive to all three basic edits (deletion, insertion, and substitution) on a particular balance sheet, and the second shows more than 90% clustering accuracy on real balance sheets.
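As a rough illustration of the idea, the sketch below computes a Levenshtein distance with separate costs for insertion, deletion, and substitution over sequences of balance sheet line-item labels. It is not the authors' implementation: the abstract does not say how a balance sheet graph is serialized into a string or which edit costs are used, so the flattened label lists, the function name `weighted_edit_distance`, and the cost values (1.0, 1.0, 1.5) are all assumptions chosen for the example.

```python
from typing import Sequence


def weighted_edit_distance(
    a: Sequence[str],
    b: Sequence[str],
    ins_cost: float = 1.0,   # assumed cost, not taken from the paper
    del_cost: float = 1.0,   # assumed cost, not taken from the paper
    sub_cost: float = 1.5,   # assumed cost, not taken from the paper
) -> float:
    """Levenshtein distance from a to b with per-operation costs."""
    # prev[j] holds the cost of turning the first i-1 items of a
    # into the first j items of b; start with the empty prefix of a.
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * del_cost]  # delete the first i items of a
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                          # drop x
                curr[j - 1] + ins_cost,                      # add y
                prev[j - 1] + (0.0 if x == y else sub_cost)  # keep or swap
            ))
        prev = curr
    return prev[-1]


# Hypothetical balance sheet, flattened into an ordered list of labels.
base = ["Assets", "CurrentAssets", "Cash", "Liabilities", "Equity"]
deleted = ["Assets", "CurrentAssets", "Liabilities", "Equity"]
inserted = ["Assets", "CurrentAssets", "Cash", "Inventory",
            "Liabilities", "Equity"]
substituted = ["Assets", "CurrentAssets", "Goodwill", "Liabilities", "Equity"]

d_same = weighted_edit_distance(base, base)
d_del = weighted_edit_distance(base, deleted)
d_ins = weighted_edit_distance(base, inserted)
d_sub = weighted_edit_distance(base, substituted)
```

With these assumed costs, each single edit to the base sheet yields a nonzero distance, and a substitution is priced below a deletion followed by an insertion. Pairwise distances of this kind could then feed a hierarchical clustering step, which the keywords mention but the abstract does not detail.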

Keywords: Graph similarity metric, String edit distance, Hierarchical clustering, XBRL, Balance sheet, Outliers detection

Suggested Citation

Yang, Steve Y. and Cogill, Randy, Balance Sheet Outlier Detection Using a Graph Similarity Algorithm (February 5, 2013). Available at SSRN: or

Steve Y. Yang (Contact Author)

Stevens Institute of Technology

Hoboken, NJ 07030
United States

Randy Cogill

University of Virginia - Systems Engineering

United States
