Explaining Interpretable Machine Learning: Theory, Methods and Applications

Benk, Michaela; Ferrario, Andrea

doi:10.2139/ssrn.3748268

87 pages. Posted: 21 Jan 2021

Michaela Benk

ETH Zürich - Department of Management, Technology, and Economics (D-MTEC); ETH Zürich - Mobiliar Lab for Analytics

Andrea Ferrario

University of Zurich; ETH Zürich

Date Written: December 11, 2020

Abstract

This working paper aims to provide a structured and accessible introduction to interpretable machine learning. We begin with an overview of the research literature, then analyze selected methods for explaining machine learning model outcomes and apply them in two case studies. The theory of machine learning interpretability is discussed together with the concepts of explanation, interpretation, and trust from philosophy and the social sciences. We choose counterfactual explanations and Local Interpretable Model-agnostic Explanations (LIME) as prominent examples of machine learning interpretability methods and discuss their Python implementations in detail. The first case study uses the Boston Housing dataset to classify census tracts in the Boston metropolitan area. The second focuses on natural language processing and the classification of YouTube comments. The results of the first case study show that the existing Python implementation of counterfactual explanations does not allow for controlling the sparsity and feasibility of the explanations; moreover, it does not properly handle datasets with categorical variables. The results of the second case study show that the understandability of LIME explanations depends on, among other factors, the structure of the text instance to be explained. Practitioners therefore have to rely on domain knowledge to identify and share only the informative explanations. These limitations have to be addressed to ensure that counterfactual explanations and LIME are applicable in real-world settings.

Keywords: interpretable machine learning, explanation, interpretation, trust, trust in human-machine interactions, counterfactual explanations, Local Interpretable Model-agnostic Explanations (LIME), Python, TensorFlow 2.0

JEL Classification: C45, C51, C52, G22

Suggested Citation:

Benk, Michaela and Ferrario, Andrea, Explaining Interpretable Machine Learning: Theory, Methods and Applications (December 11, 2020). Available at SSRN: https://ssrn.com/abstract=3748268 or http://dx.doi.org/10.2139/ssrn.3748268