Information Value of Property Description: A Machine Learning Approach
41 Pages Posted: 20 Oct 2020 Last revised: 10 Nov 2020
Date Written: September 1, 2020
This paper employs machine learning to quantify the value of "soft" information contained in real estate property descriptions. Textual descriptions contain information that traditional hedonic attributes cannot capture. A one-standard-deviation increase in the uniqueness of a property, as measured by this "soft" information, leads to a 15% increase in sale price in a hedonic price model and a 10% increase in a repeat sales price model. The effects in the hedonic model appear to arise through two channels: the unobserved quality of the housing unit and the market power of the unit relative to competing properties. The effects in the repeat sales model appear to be driven entirely by market power. Further, an annual hedonic price index that ignores our measure of unobserved quality overstates real estate prices by 10% to 23% and mistimes the stabilization of housing prices following the Great Recession. Similar but smaller effects are observed for the repeat sales price index.
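The abstract does not state the functional form of either price model. Assuming the common log-linear hedonic specification ln P = α + βU + X'γ + ε, with the uniqueness measure U standardized to unit standard deviation, a one-SD increase in U multiplies price by exp(β), so the percentage effect is 100·(exp(β) − 1). The minimal sketch below converts between a log-price coefficient and that percentage effect; the helper names `pct_effect` and `implied_beta` are hypothetical and this is not the authors' code.

```python
import math

def pct_effect(beta: float) -> float:
    """Percent price change implied by a log-price coefficient `beta` on a
    standardized regressor (assumption: ln(P) is linear in that regressor)."""
    return 100.0 * (math.exp(beta) - 1.0)

def implied_beta(pct: float) -> float:
    """Log-price coefficient that reproduces a given percent price change."""
    return math.log1p(pct / 100.0)

# Hypothetical usage: back out the coefficients consistent with the
# headline 15% (hedonic) and 10% (repeat sales) effects, then round-trip them.
for label, pct in [("hedonic", 15.0), ("repeat sales", 10.0)]:
    b = implied_beta(pct)
    print(f"{label}: beta = {b:.4f}, round-trip effect = {pct_effect(b):.1f}%")
```

If a paper instead reports β itself as the approximate percentage effect, the two readings diverge slightly for effects of this size, which is why the conversion is worth making explicit.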
Keywords: Natural Language Processing, Unsupervised Machine Learning, Soft Information, Housing Prices, Price indexes, Property Descriptions
JEL Classification: R31, G12, G14, C45