A Novel Multimodal Hand Gesture Recognition Model Using Combined Approach of Inter-Frame Motion and Shared Attention Weights

19 Pages Posted: 27 Dec 2024

Xiaorui Zhang

Nanjing University of Information Science and Technology

Peisen Lu

Nanjing University of Information Science and Technology

Xianglong Zeng

Nanjing University of Aeronautics and Astronautics

Wei Sun

Nanjing University of Information Science and Technology

Abstract

Dynamic hand gesture recognition based on computer vision aims at enabling computers to understand the semantic meaning conveyed by hand gestures in videos. Existing methods predominantly rely on spatiotemporal attention mechanisms to extract hand motion features over a large spatiotemporal scope. However, they cannot accurately focus on the moving hand region for hand feature extraction because frame sequences contain a substantial amount of redundant information. Although multimodal techniques can extract a wider variety of hand features, they are less successful at exploiting information interactions between modalities for accurate feature extraction. To address these challenges, this study proposes a multimodal hand gesture recognition model combining inter-frame motion and shared attention weights. By jointly using an inter-frame motion attention mechanism and adaptive down-sampling, the spatiotemporal search scope can be effectively narrowed down to hand-related regions, exploiting the characteristic that hands exhibit pronounced movement. The proposed inter-modal attention weights loss, meanwhile, allows the depth modality and the RGB modality to share attention weights, so that each modality can use the attention weights of the other modality to adjust its own. Experimental results on the EgoGesture, NVGesture, and Jester datasets demonstrate the superiority of our proposed model over existing state-of-the-art methods in terms of hand motion feature extraction and hand gesture recognition accuracy.
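The two ideas named in the abstract can be sketched in a few lines. The following is a minimal illustration only, not the paper's actual formulation: `motion_attention` and `shared_attention_loss` are hypothetical function names, and the specific forms (frame differencing for motion attention, a mean-squared distance between the two modalities' attention maps for the sharing loss) are assumptions made for clarity.

```python
import numpy as np

def motion_attention(frames):
    """Hypothetical inter-frame motion attention: weight each spatial
    location by the magnitude of change between consecutive frames,
    so that static background is suppressed and moving (hand) regions
    dominate. frames: array of shape (T, H, W)."""
    diff = np.abs(np.diff(frames, axis=0))  # (T-1, H, W) motion magnitude
    # Normalize each frame's map into a distribution over spatial positions.
    attn = diff / (diff.sum(axis=(1, 2), keepdims=True) + 1e-8)
    return attn

def shared_attention_loss(attn_rgb, attn_depth):
    """Hypothetical inter-modal attention-weights loss: penalize the
    distance between the RGB and depth attention maps, pulling each
    modality's attention toward the other's during training."""
    return float(np.mean((attn_rgb - attn_depth) ** 2))
```

Minimizing such a loss alongside the recognition objective would let a modality with cleaner motion cues (often depth) guide where the other modality attends, which is the interaction the abstract attributes to the shared attention weights.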

Keywords: Hand gesture recognition, attention mechanisms, spatiotemporal scope, multimodal techniques

Suggested Citation

Zhang, Xiaorui and Lu, Peisen and Zeng, Xianglong and Sun, Wei, A Novel Multimodal Hand Gesture Recognition Model Using Combined Approach of Inter-Frame Motion and Shared Attention Weights. Available at SSRN: https://ssrn.com/abstract=5073623 or http://dx.doi.org/10.2139/ssrn.5073623

Xiaorui Zhang (Contact Author)

Nanjing University of Information Science and Technology ( email )

Nanjing
China

Peisen Lu

Nanjing University of Information Science and Technology ( email )

Nanjing
China

Xianglong Zeng

Nanjing University of Aeronautics and Astronautics ( email )

Yudao Street
Nanjing, 210016
China

Wei Sun

Nanjing University of Information Science and Technology ( email )

