Deep Learning in Computer Vision: Methods, Interpretation, Causation and Fairness

32 Pages. Posted: 12 Jun 2019. Last revised: 9 Oct 2019

Nikhil Malik

Marshall School of Business, USC

Param Vir Singh

Carnegie Mellon University - David A. Tepper School of Business

Date Written: May 28, 2019

Abstract

Deep learning models have succeeded at a variety of human intelligence tasks and are already being used at commercial scale. These models largely rely on standard gradient descent optimization of the parameters W of a function f that maps an input X to an output ŷ = f(X; W). The optimization procedure minimizes the loss (difference) between the model output ŷ and the actual output y. As an example, in the cancer detection setting, X is an MRI image, while y is the presence or absence of cancer. Three key ingredients hint at the reason behind deep learning’s power: (1) deep architectures that are better adapted to breaking down complex functions into a composition of simpler abstract parts; (2) standard gradient descent methods that attain local minima of a nonconvex Loss(y, ŷ) function that are close enough to the global minimum; and (3) architectures suited for execution on parallel computing hardware (e.g., GPUs), which makes the optimization viable over hundreds of millions of observations (X, y). Computer vision tasks, where the input X is a high-dimensional image or video, are particularly suited to deep learning. Recent advances in deep architectures, such as inception modules, attention networks, adversarial networks and DeepRL, have opened up applications that were previously unexplored. However, the breakneck progress in replacing human tasks with deep learning comes with caveats. These deep models tend to evade interpretation, do not establish causal relationships between input X and output y, and may inadvertently mimic not just human actions but also human biases and stereotypes. In this tutorial, we provide an intuitive explanation of deep learning methods in computer vision as well as their limitations in practice.
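The optimization loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: a single logistic unit stands in for a deep network f(X; W), the six two-feature observations are made up for illustration, and every name (`predict`, `loss`, `gradient_descent`, `DATA`, the learning rate and step count) is a hypothetical choice. The example only shows the mechanics of computing ŷ = f(X; W), measuring a loss against y, and stepping W against the gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """y_hat = f(x; W): one logistic unit, a stand-in for a deep network."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def loss(data, w, b):
    """Mean binary cross-entropy between model output y_hat and actual y."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in data:
        p = predict(x, w, b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def gradient_descent(data, lr=0.5, steps=500):
    """Full-batch gradient descent on the parameters W = (w, b)."""
    n = len(data)
    w = [0.0] * len(data[0][0])
    b = 0.0
    history = []
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            # For a sigmoid output with cross-entropy loss, the gradient
            # with respect to the pre-activation is simply (y_hat - y).
            err = predict(x, w, b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
        history.append(loss(data, w, b))
    return w, b, history

# Made-up observations (X, y); in the abstract's example X would be an
# MRI image and y the presence (1) or absence (0) of cancer.
DATA = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.3, 0.3], 0),
        ([0.8, 0.9], 1), ([0.9, 0.7], 1), ([0.7, 0.8], 1)]

w, b, history = gradient_descent(DATA)
```

Because this toy model is convex, gradient descent here reaches the global minimum region directly; the abstract's point (2) concerns the harder nonconvex case of deep networks, where the same update rule settles in a local minimum that is often good enough.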

Keywords: Computer Vision, Deep Learning, Optimization, Generative Models, Interpretability, Causality, Fairness

JEL Classification: C45, C44, C61, C55, M31, M1

Suggested Citation

Malik, Nikhil and Singh, Param Vir, Deep Learning in Computer Vision: Methods, Interpretation, Causation and Fairness (May 28, 2019). Available at SSRN: https://ssrn.com/abstract=3395476 or http://dx.doi.org/10.2139/ssrn.3395476

Nikhil Malik (Contact Author)

Marshall School of Business, USC

701 Exposition Blvd
Los Angeles, CA 90089
United States

Param Vir Singh

Carnegie Mellon University - David A. Tepper School of Business

5000 Forbes Avenue
Pittsburgh, PA 15213-3890
United States
412-268-3585 (Phone)

