An Evolutionary Network Architecture Search Framework with Adaptive Multimodal Fusion for Hand Gesture Recognition
18 Pages | Posted: 12 Nov 2024
Abstract
Hand gesture recognition (HGR) based on multimodal data has attracted considerable attention due to its potential in a wide range of practical applications. Although manually designed deep networks for multimodal HGR (MHGR) have shown promising results, they typically demand extensive expertise and time-consuming manual tuning. To overcome these challenges, we propose an evolutionary network architecture search framework with adaptive multimodal fusion (AMF-ENAS), which automates the design of multimodal deep networks. Our framework incorporates a novel multimodal fusion strategy that jointly optimizes the positions of fusion nodes and the fusion ratios between network branches, capturing data characteristics at both shallow and deep layers while accounting for the varying importance of different data streams. Furthermore, we introduce an encoding strategy tailored to multimodal data, structuring the encoding space into three functional components: fusion points, fusion ratios, and block selection. This encoding enables flexible customization of the network architecture, with the evolutionary algorithm iteratively searching for optimal configurations across different multimodal datasets. To our knowledge, this is the first application of ENAS to MHGR that directly addresses the problem of determining fusion positions and ratios for multimodal data. Experimental results confirm that AMF-ENAS achieves state-of-the-art performance on the Ninapro DB2, DB3, and DB7 datasets.
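The three-part encoding described above (fusion points, fusion ratios, and block selection) can be illustrated with a minimal genotype sketch. Everything here is an assumption for illustration only: the layer count, block library size, dictionary field names, and the point-mutation operator are hypothetical stand-ins, not the paper's actual encoding or search operators.

```python
import random

# Illustrative sizes (assumed, not from the paper)
NUM_LAYERS = 6        # depth of each modality branch
NUM_BLOCK_TYPES = 4   # size of the candidate block library

def random_genotype(rng):
    """Sample one candidate architecture encoding with the three
    components named in the abstract (hypothetical representation)."""
    return {
        # layer indices at which the modality branches fuse
        "fusion_points": sorted(rng.sample(range(1, NUM_LAYERS), 2)),
        # weight given to the first modality at each fusion node
        "fusion_ratios": [round(rng.uniform(0.0, 1.0), 2) for _ in range(2)],
        # block type chosen for each layer of the backbone
        "blocks": [rng.randrange(NUM_BLOCK_TYPES) for _ in range(NUM_LAYERS)],
    }

def mutate(geno, rng):
    """Return a copy with one gene perturbed (simple point mutation,
    a generic evolutionary operator, not the paper's specific one)."""
    child = {k: list(v) for k, v in geno.items()}
    gene = rng.choice(list(child))
    i = rng.randrange(len(child[gene]))
    if gene == "fusion_points":
        child[gene][i] = rng.randrange(1, NUM_LAYERS)
        child[gene] = sorted(set(child[gene]))
    elif gene == "fusion_ratios":
        child[gene][i] = round(rng.uniform(0.0, 1.0), 2)
    else:
        child[gene][i] = rng.randrange(NUM_BLOCK_TYPES)
    return child

rng = random.Random(0)
parent = random_genotype(rng)
offspring = mutate(parent, rng)
print(parent)
print(offspring)
```

In a full search loop, genotypes like these would be decoded into trainable networks, evaluated on a validation set, and selected over generations; the sketch only shows the representation and variation step.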
Keywords: Multimodal Data, Neural Network Architecture Search, Evolutionary Algorithm, Gesture Recognition, Human-Computer Interface, Deep Learning