Sparse to Dense and Back Again: Image Search on Lucene

Posted: 26 Nov 2019

Date Written: November 25, 2019

Abstract

The ImageNet challenge has produced a number of pre-trained Deep Learning models that have surpassed human performance for image classification. These models are used to generate semantically expressive vector representations from images in the form of dense vectors. Because these vectors have been shown to retain so much of the image semantics, they are good candidates for use in Image Search.

The Lucene inverted index architecture is optimized for searching against a sparse, high-dimensional vector space, as commonly seen in text corpora. Since its release in 1999, countless person-hours of effort have gone into making Lucene the most powerful and usable open source platform for text search in the world. It would, therefore, be desirable to leverage the Lucene platform for image search as well. However, Lucene was not designed to work well with the dense, low-dimensional vectors produced from images.

This work explores some Approximate Nearest Neighbor (ANN) techniques by which image vectors can be projected back into a sparse, high-dimensional space without significant loss of information, thus making it possible to use Lucene (and Lucene-derived platforms such as Solr or Elasticsearch) for Image Search. It also presents the results of some experiments using Solr and a medium-sized image corpus.
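As a rough illustration of the dense-to-sparse idea, the sketch below uses random hyperplane hashing, one common ANN technique. The abstract does not say which techniques the paper evaluates, so this choice is an assumption, and the names `make_hyperplanes`, `to_sparse_tokens`, and `token_overlap` are hypothetical. Each dense vector is turned into a bag of discrete tokens that could be indexed like words in a text field, so that vectors pointing in similar directions share many tokens.

```python
import random


def make_hyperplanes(num_bits, dim, seed=42):
    """Draw num_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def to_sparse_tokens(vector, hyperplanes):
    """Map a dense vector to one token per hyperplane.

    Token "h<i>_1" means the vector lies on the non-negative side of
    hyperplane i, and "h<i>_0" means the negative side. The vocabulary
    has 2 * num_bits terms, of which each vector uses num_bits.
    """
    tokens = []
    for i, plane in enumerate(hyperplanes):
        dot = sum(p * v for p, v in zip(plane, vector))
        bit = 1 if dot >= 0 else 0
        tokens.append(f"h{i}_{bit}")
    return tokens


def token_overlap(tokens_a, tokens_b):
    """Count shared tokens, a crude stand-in for a term-matching score."""
    return len(set(tokens_a) & set(tokens_b))


# Toy 8-dimensional vectors (illustrative values, not from the paper).
planes = make_hyperplanes(num_bits=64, dim=8)
query = [0.9, 0.1, -0.3, 0.5, 0.0, 0.2, -0.7, 0.4]
near = [v + 0.01 for v in query]   # slightly perturbed copy of the query
far = [-v for v in query]          # points in the opposite direction

query_tokens = to_sparse_tokens(query, planes)
near_tokens = to_sparse_tokens(near, planes)
far_tokens = to_sparse_tokens(far, planes)

print("overlap with near:", token_overlap(query_tokens, near_tokens))
print("overlap with far: ", token_overlap(query_tokens, far_tokens))
```

In this sketch the token list could be joined with spaces and stored in an ordinary text field, so that retrieval reduces to term matching. How closely this resembles the projections used in the paper is not stated in the abstract.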

Keywords: Image Search, Vector Search, Approximate Nearest Neighbor (ANN) techniques

Suggested Citation

Pal, Sujit, Sparse to Dense and Back Again: Image Search on Lucene (November 25, 2019). Proceedings of the 3rd Annual RELX Search Summit, Available at SSRN: https://ssrn.com/abstract=3493002
