An Evolutionary Network Architecture Search Framework with Adaptive Multimodal Fusion for Hand Gesture Recognition

18 Pages. Posted: 12 Nov 2024

Yizhang Xia

Xiangtan University

Junwen Xu

Xiangtan University

Zhanglu Hou

Xiangtan University

Shihao Song

Xiangtan University

Juan Zou

Xiangtan University

Yuan Liu

Xiangtan University

Shengxiang Yang

De Montfort University

Abstract

Hand gesture recognition (HGR) based on multimodal data has attracted considerable attention due to its potential in various practical applications. Although manually designed multimodal deep networks for multimodal HGR (MHGR) have shown promising results, they often require extensive expertise and time-consuming manual adjustments. To overcome these challenges, we propose an evolutionary network architecture search framework with adaptive multimodal fusion (AMF-ENAS), which automates the design process of multimodal deep networks. Our framework incorporates a novel multimodal fusion strategy that optimizes both the positioning of fusion nodes and the fusion ratios between different branches of the network. It effectively captures the characteristics of data at shallow and deep layers while also considering the varying importance of different data streams. Furthermore, we introduce an encoding strategy for multimodal data adaptation, structuring the encoding space into three functional components: fusion points, fusion ratios, and block selection. This encoding strategy enables flexible customization of the network architecture, with the evolutionary algorithm iteratively searching for optimal configurations across different multimodal datasets. To our knowledge, this is the first application of ENAS in MHGR to directly address the challenges of determining fusion positions and ratios in multimodal data. Experimental results confirm that AMF-ENAS achieves state-of-the-art performance on the Ninapro DB2, DB3, and DB7 datasets.
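The three-part encoding described in the abstract (fusion points, fusion ratios, block selection) can be pictured with a minimal sketch. This is not the authors' implementation: the `Genome` layout, the `BLOCK_TYPES` list, the mutation operator, the elitist selection loop, and `toy_fitness` are all hypothetical stand-ins chosen for illustration. In a real search, the fitness would presumably be the validation accuracy of the decoded multimodal network rather than a closed-form score.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical candidate blocks; the paper's actual search space is not given here.
BLOCK_TYPES = ["conv3", "conv5", "residual", "identity"]


@dataclass(frozen=True)
class Genome:
    fusion_points: Tuple[int, ...]    # 1 = fuse the branches after this layer
    fusion_ratios: Tuple[float, ...]  # weight of one branch at each layer, in [0, 1]
    block_choices: Tuple[int, ...]    # index into BLOCK_TYPES for each layer


def random_genome(depth: int, rng: random.Random) -> Genome:
    """Sample one individual uniformly from the (assumed) encoding space."""
    return Genome(
        fusion_points=tuple(rng.randint(0, 1) for _ in range(depth)),
        fusion_ratios=tuple(rng.random() for _ in range(depth)),
        block_choices=tuple(rng.randrange(len(BLOCK_TYPES)) for _ in range(depth)),
    )


def mutate(g: Genome, rng: random.Random, rate: float = 0.2) -> Genome:
    """Perturb each gene independently with probability `rate`."""
    points = tuple(1 - p if rng.random() < rate else p for p in g.fusion_points)
    ratios = tuple(
        min(1.0, max(0.0, r + rng.gauss(0.0, 0.1))) if rng.random() < rate else r
        for r in g.fusion_ratios
    )
    blocks = tuple(
        rng.randrange(len(BLOCK_TYPES)) if rng.random() < rate else b
        for b in g.block_choices
    )
    return Genome(points, ratios, blocks)


def evolve(
    fitness: Callable[[Genome], float],
    depth: int = 6,
    pop_size: int = 12,
    generations: int = 20,
    seed: int = 0,
) -> Genome:
    """Simple elitist loop: keep the top half, refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_genome(depth, rng) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = [
            mutate(rng.choice(parents), rng) for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(g: Genome) -> float:
    """Placeholder score: rewards late fusion and balanced ratios (illustrative only)."""
    return sum(g.fusion_points[-2:]) - sum(abs(r - 0.5) for r in g.fusion_ratios)


best = evolve(toy_fitness)
print(best)
```

Because the top half of each generation is carried over unchanged, the best fitness in this sketch never decreases between generations. Keeping the three components as separate fields also lets each one use its own mutation rule (bit flip, Gaussian perturbation, categorical resampling).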

Keywords: Multimodal Data, Neural Network Architecture Search, Evolutionary Algorithm, Gesture Recognition, Human-Computer Interface, Deep Learning

Suggested Citation

Xia, Yizhang and Xu, Junwen and Hou, Zhanglu and Song, Shihao and Zou, Juan and Liu, Yuan and Yang, Shengxiang, An Evolutionary Network Architecture Search Framework with Adaptive Multimodal Fusion for Hand Gesture Recognition. Available at SSRN: https://ssrn.com/abstract=5018342 or http://dx.doi.org/10.2139/ssrn.5018342

Yizhang Xia

Xiangtan University

International Exchange Center
Hunan, 411105
China

Junwen Xu

Xiangtan University

International Exchange Center
Hunan, 411105
China

Zhanglu Hou (Contact Author)

Xiangtan University

International Exchange Center
Hunan, 411105
China

Shihao Song

Xiangtan University

International Exchange Center
Hunan, 411105
China

Juan Zou

Xiangtan University

Yuan Liu

Xiangtan University

Shengxiang Yang

De Montfort University
