The Impact of Data Mining on Information Disclosure by Regulatory Agencies: With an Application to Redlining
56 Pages. Posted: 20 Jun 2018. Last revised: 18 Oct 2018
Date Written: August 20, 2018
Data mining techniques can be used to locate statistical outliers that are incorrectly characterized as evidence of unlawful conduct. Using home mortgage loan data made publicly available by financial regulators, a simple data mining exercise finds that approximately three percent of all lender-MSA pairs (or approximately seven to nine percent of all lending institutions) flagged as having redlined minority neighborhoods are attributable to a failure to correct for the multiple hypothesis testing problem. This false positive rate does not, however, fully explain the estimated high frequency of statistical redlining. Three possible models of information disclosure by regulatory agencies are considered: (1) full information, (2) no information, and (3) limited information. Under a limited information model, litigation serves to correctly implement statistical hypothesis testing: a plaintiff must formulate a hypothesis before examining the data and obtains the information necessary to test that hypothesis only through discovery.
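The multiple hypothesis testing problem named in the abstract can be illustrated with a minimal sketch. The abstract does not say which correction the paper applies, so the two procedures below (Bonferroni and Benjamini-Hochberg) are standard textbook choices used here only as assumptions, and the function names, the simulated p-values, and the 0.05 significance level are all hypothetical. The sketch treats each lender-MSA pair as one test and shows that, when no pair has redlined, screening every pair at the uncorrected level still flags roughly five percent of them.

```python
import random


def bonferroni_reject(p_values, alpha=0.05):
    """Flag test i only if p_i <= alpha / m, where m is the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Step-up procedure: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject


def simulate_null_p_values(n_tests, seed=0):
    """Hypothetical data: under a true null, p-values are uniform on [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_tests)]


if __name__ == "__main__":
    # Assume 10,000 lender-MSA pairs, none of which actually redlined.
    p_null = simulate_null_p_values(10_000)
    naive = sum(p <= 0.05 for p in p_null)
    corrected = sum(bonferroni_reject(p_null))
    print(f"flagged without correction: {naive}")
    print(f"flagged with Bonferroni:    {corrected}")
```

Bonferroni controls the chance of any false flag and is conservative; Benjamini-Hochberg instead controls the expected share of false flags among all flags, so it usually rejects more. Which criterion suits a regulatory screen is a design choice that this sketch does not settle.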
Keywords: Data Mining, Multiple Hypothesis Testing, Redlining, Information Disclosure
JEL Classification: C55, K11