Scope 3 Emissions: Data Quality and Machine Learning Prediction Accuracy
46 Pages. Posted: 17 Aug 2022. Last revised: 18 Aug 2022
Date Written: August 16, 2022
This paper examines the quality of Scope 3 emission data, in terms of divergence and composition, and the performance of machine learning models in predicting Scope 3 emissions. We use the Scope 3 emission datasets of three of the largest data providers (Bloomberg, Refinitiv Eikon, and ISS). We find considerable divergence between third-party providers, making it difficult for investors to know their exposure to Scope 3 emissions. Surprisingly, divergence exists between the datasets even for emission values that have been reported by firms (identical data points between Bloomberg and Refinitiv Eikon). The divergence is even larger for ISS when it adjusts reported values using its proprietary models (identical data points). With respect to the composition of Scope 3 emissions, firms generally report incomplete compositions, yet they are reporting more categories over time. There is a persistent contrast between relevance and completeness in the composition of Scope 3 emissions across sectors: irrelevant categories, such as travel emissions, are reported more frequently than relevant ones, such as the use of sold products and the processing of sold products. We also find that machine learning algorithms can improve the prediction accuracy of aggregated Scope 3 emissions (by up to 6%) and of their components, especially when each category is estimated individually and the estimates are then aggregated into total Scope 3 emission values (by up to 25%). Upstream emissions are easier to predict than downstream emissions. Prediction performance is limited primarily by the small number of observations in particular categories, and predictor importance varies by category. We conclude that users of Scope 3 emission datasets should consider data source, data quality, and prediction errors when using data from third-party providers in their risk analyses.
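To make the "identical data points" comparison concrete, the following is a minimal sketch of how the share of identical observations between two providers could be computed. It is an illustration under stated assumptions, not the paper's procedure: the function name `share_identical`, the firm-year key, the optional tolerance, and all sample values are hypothetical and do not come from any provider's dataset.

```python
import math
from typing import Dict, Tuple

# Hypothetical key: (firm identifier, fiscal year). The paper does not
# specify how observations are matched across providers.
Key = Tuple[str, int]


def share_identical(a: Dict[Key, float], b: Dict[Key, float],
                    rel_tol: float = 0.0) -> float:
    """Return the fraction of overlapping firm-year observations whose
    Scope 3 values agree between two providers.

    With rel_tol=0.0 only exactly equal values count as identical;
    a positive rel_tol treats values within that relative distance
    as identical (an assumption added for illustration).
    """
    common = a.keys() & b.keys()
    if not common:
        raise ValueError("no overlapping firm-year observations")
    matches = sum(
        1 for k in common
        if math.isclose(a[k], b[k], rel_tol=rel_tol, abs_tol=0.0)
    )
    return matches / len(common)


# Made-up values (tonnes CO2e) for two hypothetical providers.
provider_a = {("FIRM_A", 2020): 100.0, ("FIRM_B", 2020): 250.0,
              ("FIRM_C", 2020): 40.0}
provider_b = {("FIRM_A", 2020): 100.0, ("FIRM_B", 2020): 300.0,
              ("FIRM_D", 2020): 10.0}

exact_share = share_identical(provider_a, provider_b)
loose_share = share_identical(provider_a, provider_b, rel_tol=0.2)
print(exact_share, loose_share)
```

Only firm-years present in both datasets enter the denominator, so the measure reflects disagreement on shared observations rather than differences in coverage.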
Keywords: Scope 3 emissions, Carbon footprint, Climate finance, Machine learning, Transition risk, Errors in variables
JEL Classification: C89, G17, Q51, Q54