Deep Learning in Computer Vision: Methods, Interpretation, Causation and Fairness

34 Pages Posted: 12 Jun 2019

Nikhil Malik

Carnegie Mellon University - David A. Tepper School of Business

Param Vir Singh

Carnegie Mellon University - David A. Tepper School of Business

Date Written: May 28, 2019

Abstract

Deep learning models have succeeded at a variety of human intelligence tasks and are already being used at commercial scale. These models largely rely on standard gradient descent optimization of the parameters W of a function that maps an input X to an output ŷ = f(X; W). The optimization procedure minimizes the loss (difference) between the model output ŷ and the actual output y. As an example, in the cancer detection setting, X is an MRI image, while y is the presence or absence of cancer. Three key ingredients hint at the reason behind deep learning’s power: (1) deep architectures that are better adapted to breaking down complex functions into a composition of simpler abstract parts; (2) standard gradient descent methods that attain local minima of a nonconvex Loss(y, ŷ) function that are close enough to the global minimum; and (3) architectures suited for execution on parallel computing hardware (e.g., GPUs), which makes the optimization viable over hundreds of millions of observations (X, y). Computer vision tasks, where the input X is a high-dimensional image or video, are particularly suited to deep learning. Recent advances in deep architectures, such as inception modules, attention networks, adversarial networks and DeepRL, have opened up applications that were previously unexplored. However, the breakneck progress to replace human tasks with deep learning comes with caveats. These deep models tend to evade interpretation, lack causal relationships between input X and output y, and may inadvertently mimic not just human actions but also human biases and stereotypes. In this tutorial, we provide an intuitive explanation of deep learning methods in computer vision as well as their limitations in practice.
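
The optimization loop described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a single-layer logistic model in place of a deep network, a cross-entropy loss, and a hypothetical four-point toy dataset; the names predict, loss, train and DATA, the learning rate, and the step count are all illustrative choices. The structure, however, is the same: compute ŷ = f(X; W), measure Loss(y, ŷ), and move W against the gradient.

```python
import math

def predict(x, w, b):
    """y_hat = f(x; W): a sigmoid over a weighted sum of the inputs."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def loss(data, w, b):
    """Mean binary cross-entropy between the prediction y_hat and the label y."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in data:
        p = predict(x, w, b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def train(data, lr=0.5, steps=500):
    """Plain (full-batch) gradient descent on the parameters W = (w, b)."""
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            # For sigmoid + cross-entropy, d(loss)/dz simplifies to (y_hat - y).
            err = predict(x, w, b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the mean gradient.
        w = [wj - lr * gj / len(data) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

# Hypothetical toy data: two features per observation, binary label y.
DATA = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.8, 0.9], 1), ([0.9, 0.7], 1)]

initial_loss = loss(DATA, [0.0, 0.0], 0.0)
w, b = train(DATA)
final_loss = loss(DATA, w, b)
```

Because this toy loss is convex, gradient descent reaches its global minimum; the nonconvex losses of deep networks mentioned in the abstract only guarantee a local one, which is the point of ingredient (2).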

Keywords: Computer Vision, Deep Learning, Optimization, Generative Models, Interpretability, Causality, Fairness

JEL Classification: C45, C44, C61, C55, M31, M1

Suggested Citation

Malik, Nikhil and Singh, Param Vir, Deep Learning in Computer Vision: Methods, Interpretation, Causation and Fairness (May 28, 2019). Available at SSRN: https://ssrn.com/abstract=3395476

Nikhil Malik (Contact Author)

Carnegie Mellon University - David A. Tepper School of Business

5000 Forbes Avenue
Pittsburgh, PA 15213-3890
United States

Param Vir Singh

Carnegie Mellon University - David A. Tepper School of Business

5000 Forbes Avenue
Pittsburgh, PA 15213-3890
United States
412-268-3585 (Phone)
