Deep Learning in Computer Vision: Methods, Interpretation, Causation and Fairness
32 Pages. Posted: 12 Jun 2019. Last revised: 9 Oct 2019
Date Written: May 28, 2019
Abstract
Deep learning models have succeeded at a variety of human intelligence tasks and are already being used at commercial scale. These models largely rely on standard gradient descent to optimize parameters W, which map an input X to an output ŷ = f(X; W). The optimization procedure minimizes the loss (difference) between the model output ŷ and the actual output y. In cancer detection, for example, X is an MRI image, while y is the presence or absence of cancer. Three key ingredients hint at the reason behind deep learning's power: (1) deep architectures are better suited to breaking down complex functions into compositions of simpler, abstract parts; (2) standard gradient descent methods attain local minima of the nonconvex function Loss(y, ŷ) that are close enough to the global minimum; and (3) the architectures are suited to execution on parallel computing hardware (e.g., GPUs), which makes optimization viable over hundreds of millions of observations (X, y). Computer vision tasks, where the input X is a high-dimensional image or video, are particularly well suited to deep learning. Recent advances in deep architectures, e.g., inception modules, attention networks, adversarial networks and DeepRL, have opened up entirely new applications. However, the breakneck progress in replacing human tasks with deep learning comes with caveats. These deep models tend to evade interpretation, fail to capture causal relationships between input X and output y, and may inadvertently mimic not just human actions but also human biases and stereotypes. In this tutorial, we provide an intuitive explanation of deep learning methods in computer vision, as well as their limitations in practice.
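The optimization loop described above (choose parameters W so that ŷ = f(X; W) minimizes Loss(y, ŷ) via gradient descent) can be illustrated with a minimal sketch. It assumes the simplest possible f, a single-layer logistic model for a binary "present / absent" label, and a mean binary cross-entropy loss; every function name and the toy dataset are hypothetical and are not taken from the tutorial. Note that this loss is convex in W, unlike the nonconvex loss of a deep network, so the sketch shows only the mechanics of gradient descent, not the deep-architecture behavior discussed in the abstract.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x, w, b):
    """Hypothetical model f(X; W): a single-layer logistic model, y_hat in (0, 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(data, w, b):
    """Mean binary cross-entropy Loss(y, y_hat) over the dataset."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in data:
        p = predict(x, w, b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def gradient_descent(data, lr=0.5, steps=500):
    """Full-batch gradient descent on the cross-entropy loss."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            # For sigmoid + cross-entropy, dLoss/dz simplifies to (y_hat - y).
            err = predict(x, w, b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        # Step against the average gradient.
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_toy_data(n, seed=0):
    """Hypothetical 2-D stand-in for (X, y): label 1 when x0 + x1 > 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, 1 if x[0] + x[1] > 0 else 0))
    return data
```

A typical use is `data = make_toy_data(200)` followed by `w, b = gradient_descent(data)`; comparing `loss(data, [0.0, 0.0], 0.0)` (which equals ln 2 at the all-zero starting point) with `loss(data, w, b)` shows the loss falling as the parameters are updated. A deep network replaces `predict` with a composition of many such layers and computes the gradients by backpropagation, typically on mini-batches and GPUs rather than full passes over the data.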
Keywords: Computer Vision, Deep Learning, Optimization, Generative Models, Interpretability, Causality, Fairness
JEL Classification: C45, C44, C61, C55, M31, M1