Cross-Modal Hashing Retrieval with Compatible Triplet Representation
Abstract
Cross-modal hashing retrieval has emerged as a promising approach to handling diverse multimodal data, owing to its storage efficiency and fast query speed. However, existing cross-modal hashing retrieval methods often oversimplify cross-modal similarity by considering only identical labels across modalities, and they are sensitive to noise in the original multimodal data. To address these limitations, we propose a cross-modal hashing retrieval approach with compatible triplet representation. The proposed approach integrates the essential feature representations and semantic information of texts and images into their corresponding multi-label feature representations, and introduces a fusion attention module that extracts channel attention features from the text modality and spatial attention features from the image modality, thereby enriching the compatible triplet-based semantic information used in cross-modal hashing learning. Comprehensive experiments on three public datasets demonstrate that the proposed approach outperforms state-of-the-art methods in retrieval accuracy.
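The abstract does not include an implementation, so the following PyTorch sketch only illustrates what a fusion attention module of this general shape might look like: channel attention (squeeze-and-excitation style) reweighting text features, spatial attention reweighting image feature maps, and both branches projected to relaxed hash codes. All class names, parameter names, and dimensions (ChannelAttention, SpatialAttention, FusionAttention, reduction, hash_bits, etc.) are hypothetical assumptions, not the authors' architecture.

```python
# Hypothetical sketch (not the paper's implementation): channel attention
# for text features, spatial attention for image feature maps, fused into
# relaxed binary codes of a shared length.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention over text features."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) text feature; reweight each feature channel.
        return x * self.fc(x)


class SpatialAttention(nn.Module):
    """Spatial attention over an image feature map."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W); pool over channels, learn a 2D map.
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn


class FusionAttention(nn.Module):
    """Fuse attended text and image features into a common hash space."""
    def __init__(self, text_dim: int, img_channels: int, hash_bits: int):
        super().__init__()
        self.text_att = ChannelAttention(text_dim)
        self.img_att = SpatialAttention()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.text_hash = nn.Linear(text_dim, hash_bits)
        self.img_hash = nn.Linear(img_channels, hash_bits)

    def forward(self, text: torch.Tensor, image: torch.Tensor):
        t = self.text_att(text)                        # (B, text_dim)
        v = self.pool(self.img_att(image)).flatten(1)  # (B, img_channels)
        # tanh relaxes the binary codes for end-to-end training.
        return torch.tanh(self.text_hash(t)), torch.tanh(self.img_hash(v))


if __name__ == "__main__":
    fuse = FusionAttention(text_dim=512, img_channels=256, hash_bits=64)
    codes_t, codes_v = fuse(torch.randn(8, 512), torch.randn(8, 256, 14, 14))
    print(codes_t.shape, codes_v.shape)  # torch.Size([8, 64]) each
```

In practice, the relaxed codes from both branches would then be trained with a triplet-style objective, for instance torch.nn.TripletMarginLoss over anchor-positive-negative tuples drawn across modalities, which is one plausible reading of the "compatible triplet" learning the abstract describes.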
Keywords: Cross-modal hashing retrieval, Compatible triplet, Label network, Fusion attention