Using Variational Multi-View Learning for Classification of Grocery Items

Klasson, Marcus; Zhang, Cheng; Kjellström, Hedvig

doi:10.2139/ssrn.3588894

Cell Press

29 Pages. Posted: 19 May 2020. Publication Status: Published.

Marcus Klasson

Royal Institute of Technology (KTH) - Department of Robotics, Perception and Learning

Hedvig Kjellström

Royal Institute of Technology (KTH) - Department of Robotics, Perception and Learning

Abstract

An essential task for computer vision-based assistive technologies is to help visually impaired people recognize objects in constrained environments, for instance, food items in a grocery store. In this paper, we introduce a novel dataset with natural images of grocery items (fruits, vegetables, and packaged products), where all images have been taken inside grocery stores to resemble an actual shopping scenario. In addition to the natural images, we download an iconic image and a text description of each item that can be utilized for constructing better representations of the grocery items. We select a multi-view generative model called Variational Canonical Correlation Analysis (VCCA), which efficiently combines the different information about the items into a single lower-dimensional representation. In the experiments, we show that utilizing the additional information with VCCA yields higher accuracy in classifying grocery items than standard image classifiers that use only the natural images. We observe from visualizing the latent representations that the iconic images help to construct representations that are separated by the visual differences of the items, while the text descriptions enable the model to distinguish between visually similar items by their different ingredients and flavors. Moreover, we investigate a variant of VCCA called VCCA-private that separates shared and private information of the different data views. We verify that VCCA-private can separate variations in image backgrounds and structures of text sentences from the shared representation to enable a more accurate classification of grocery items in their natural environment.
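
To make the idea of a single shared lower-dimensional representation concrete, the following is a minimal sketch of a two-view VCCA-style objective. It is not the authors' implementation. It assumes the standard VCCA setup in which a latent variable z is inferred from one view x and is then used to reconstruct both x and a second view y, and it further assumes linear encoders and decoders with unit-variance Gaussian likelihoods. All function and parameter names (`init_params`, `vcca_elbo`, `W_mu`, `V_x`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def gaussian_log_lik(target, mean):
    """log N(target | mean, I), summed over feature dims (unit variance assumed)."""
    d = target.shape[-1]
    sq_err = np.sum((target - mean) ** 2, axis=-1)
    return -0.5 * sq_err - 0.5 * d * np.log(2.0 * np.pi)


def init_params(dim_x, dim_y, dim_z, rng, scale=0.1):
    """Random linear encoder/decoder weights (hypothetical toy parameterization)."""
    return {
        "W_mu": scale * rng.standard_normal((dim_x, dim_z)),      # encoder mean
        "W_logvar": scale * rng.standard_normal((dim_x, dim_z)),  # encoder log-variance
        "V_x": scale * rng.standard_normal((dim_z, dim_x)),       # decoder for view x
        "V_y": scale * rng.standard_normal((dim_z, dim_y)),       # decoder for view y
    }


def vcca_elbo(x, y, params, rng):
    """Single-sample ELBO estimate per example:
    E_q(z|x)[log p(x|z) + log p(y|z)] - KL(q(z|x) || p(z)).
    """
    mu = x @ params["W_mu"]
    log_var = x @ params["W_logvar"]
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    # Both views are reconstructed from the same shared latent z.
    recon_x = gaussian_log_lik(x, z @ params["V_x"])
    recon_y = gaussian_log_lik(y, z @ params["V_y"])
    return recon_x + recon_y - gaussian_kl(mu, log_var)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(dim_x=8, dim_y=5, dim_z=3, rng=rng)
    x = rng.standard_normal((4, 8))  # stand-in for natural-image features
    y = rng.standard_normal((4, 5))  # stand-in for a second view, e.g. text features
    print(vcca_elbo(x, y, params, rng))
```

In this sketch the second view is needed only to evaluate the objective during training. Computing the latent mean `x @ params["W_mu"]` requires the first view alone, which is the property that would let a classifier operate on natural images only at test time.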

Keywords: Assistive Vision, Image Classification, Variational Autoencoders, Multi-View Learning

Suggested Citation

Klasson, Marcus and Zhang, Cheng and Kjellström, Hedvig, Using Variational Multi-View Learning for Classification of Grocery Items. Available at SSRN: https://ssrn.com/abstract=3588894 or http://dx.doi.org/10.2139/ssrn.3588894

This version of the paper has not been formally peer reviewed.