Developing a Data Science Approach to Detecting Income Fraud for the Peer to Peer Loan Industry
13 Pages Posted: 9 Sep 2016
Date Written: September 7, 2016
Personal loans can be obtained by borrowers from several types of lending institutions. The most common are traditional loan institutions, payday lenders, and Peer to Peer (P2P) lending brokers. P2P lending companies do not lend money directly. They link the borrower to a lender and provide the lender with the borrower's reported income, which is not usually verified. P2P lenders collect fees on each transaction and therefore benefit financially from a higher number of introductions between borrowers and lenders. P2P lending is becoming more popular among borrowers because its highest interest rates are much lower than those of payday lenders, and its loans require less verification of income and assets than those of traditional loan institutions. A higher reported income with a P2P lender can result in a larger loan for the borrower and thus more profit and fees for the P2P lender. If the loan defaults because the borrower overstated or fraudulently reported income, the P2P lender does not suffer; the lender that the P2P lender matched to the borrower incurs the financial loss. This paper proposes a data science approach to detecting loan applicants who provide fraudulent income data to P2P lenders. The data obtained for this study contained 887,379 observations and 74 variables describing loan applicants of the P2P loan company Lending Club. Initial observations of this data showed that unverified loans make up 23% of the defaulted loans, while verified and source-verified loans make up about 77%. This paper describes how the data set was obtained and prepared for analysis, the initial findings, the proposed data science approach to fully analyzing this data, and the significance, for both the traditional and P2P lending industries, of a method for detecting fraudulent income reported on loan applications.
Lenders could incorporate models generated from this analysis into their loan application processes. Together with further research in this area, such models should improve the P2P lending industry by increasing the detection of fraudulent income reported on loan applications.
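The 23% / 77% split above is a share of defaulted loans grouped by verification status. The sketch below shows one way such a breakdown could be computed. It is not the author's code, and it rests on several assumptions. The column names `loan_status` and `verification_status` and the group labels `"Not Verified"`, `"Verified"`, and `"Source Verified"` are assumed to match the Lending Club export. Which statuses count as "defaulted" is a hypothetical choice (`DEFAULT_STATUSES`), because the abstract does not define it. The sample rows are toy data for illustration, not data from the study.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Assumption (hypothetical): these loan_status values count as "defaulted".
# The abstract does not state which statuses the study treated as defaults.
DEFAULT_STATUSES = frozenset({"Default", "Charged Off"})


def default_share_by_verification(
    rows: Iterable[Mapping[str, str]],
    default_statuses: frozenset = DEFAULT_STATUSES,
) -> Dict[str, float]:
    """Return each verification group's fraction of all defaulted loans.

    Assumes every row has 'loan_status' and 'verification_status' keys
    (column names assumed to match the Lending Club data).
    """
    counts: Counter = Counter()
    for row in rows:
        # Only defaulted loans contribute to the denominator.
        if row["loan_status"] in default_statuses:
            counts[row["verification_status"]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: n / total for status, n in counts.items()}


if __name__ == "__main__":
    # Toy rows for illustration only; these are not data from the study.
    sample = [
        {"loan_status": "Charged Off", "verification_status": "Not Verified"},
        {"loan_status": "Default", "verification_status": "Verified"},
        {"loan_status": "Charged Off", "verification_status": "Source Verified"},
        {"loan_status": "Charged Off", "verification_status": "Verified"},
        {"loan_status": "Fully Paid", "verification_status": "Not Verified"},
    ]
    shares = default_share_by_verification(sample)
    print(f"unverified share of defaults: {shares.get('Not Verified', 0.0):.0%}")
    # prints "unverified share of defaults: 25%"
```

Because the function accepts any iterable of mappings, a CSV export could be streamed straight into it. For example, `csv.DictReader(f)` over an opened file yields one dict per row, so the full data set would not need to be loaded into memory at once. The file name and layout of that export are assumptions.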
Keywords: Detect, Financial, Fraud Detection, Housing Crisis, Kaggle.com, Lending, Linearted
JEL Classification: G2, G20, G21, G23, G24, G28