Joint CNN and Vision Transformer for Indoor Scene Recognition

7 Pages Posted: 6 Mar 2025

Chen Wang

Wuhan Textile University

Xiong Pan

Wuhan Textile University

Junjie Wei

Wuhan Textile University

There are 3 versions of this paper

Abstract

Indoor scene recognition is a growing field with applications in smart homes, robot navigation, and related areas. Convolutional Neural Networks (CNNs) are effective at extracting low-level features and modeling local relationships, but their indoor scene recognition accuracy is limited because they cannot readily establish long-range dependencies. Vision Transformers, by contrast, can establish such dependencies. Motivated by this, we propose a Joint Convolutional Neural Networks and Vision Transformer method (JCVT) that combines the advantages of both. First, we design a Local Enhancement Vision Transformer Module (LEVTM) that enhances the performance of the Agent-CSWin Transformer by capturing rich local features. Second, to explore the semantic information contained in indoor scenes, we construct a Semantic Enhancement Convolutional Neural Networks Module (SECNNM), which employs ResNet50 (with the final classification layer removed) as an encoder and convolutional layers as a decoder. Third, to fully leverage the advantages of both architectures, we integrate the LEVTM, the SECNNM, and the original ResNet50 model to generate the final indoor scene representation. Extensive experiments on three benchmark indoor scene datasets demonstrate the superiority of the proposed method over state-of-the-art approaches.
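
The abstract does not say how the three branches (LEVTM, SECNNM, and the original ResNet50) are integrated. As a rough illustration only, the sketch below assumes one common choice, late fusion by concatenating L2-normalized feature vectors and feeding the result to a linear classifier. The function names, feature dimensions, class count, and the random vectors standing in for real branch outputs are all hypothetical and are not taken from the paper.

```python
import math
import random


def l2_normalize(vec):
    """Scale a feature vector to unit length so no branch dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_by_concatenation(branch_features):
    """Late fusion (an assumption): normalize each branch vector, then concatenate."""
    fused = []
    for vec in branch_features:
        fused.extend(l2_normalize(vec))
    return fused


def linear_softmax(features, weights, biases):
    """Linear classifier followed by a numerically stable softmax."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical feature sizes per branch; real models would produce far larger vectors.
rng = random.Random(0)
DIMS = {"levtm": 8, "secnnm": 6, "resnet50": 10}
NUM_CLASSES = 4  # hypothetical number of scene categories

# Random vectors stand in for the outputs of the three branches.
features = [[rng.gauss(0, 1) for _ in range(d)] for d in DIMS.values()]
fused = fuse_by_concatenation(features)

# Untrained, randomly initialized classifier weights, for shape illustration only.
weights = [[rng.gauss(0, 0.1) for _ in range(len(fused))] for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES

probs = linear_softmax(fused, weights, biases)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
```

Concatenation keeps each branch's features intact and leaves the weighting to the classifier; other fusion schemes (summation, attention-based weighting) would fit the same description in the abstract equally well.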

Keywords: Convolutional neural networks, vision transformer, semantic information, indoor scene recognition.

Suggested Citation

Wang, Chen and Pan, Xiong and Wei, Junjie, Joint CNN and Vision Transformer for Indoor Scene Recognition. Available at SSRN: https://ssrn.com/abstract=5168031 or http://dx.doi.org/10.2139/ssrn.5168031

Chen Wang

Wuhan Textile University

Wuhan, 430073
China

Xiong Pan

Wuhan Textile University

Wuhan, 430073
China

Junjie Wei (Contact Author)

Wuhan Textile University

Wuhan, 430073
China
