Neighbor Patches Merging Reduces Spatial Redundancy of Nature Images

29 Pages Posted: 13 Dec 2023

Kai Jiang

Tongji University

Peng Peng

Tongji University

Youzao Lian

Tongji University

Weihui Shao

Tongji University

Weisheng Xu

Tongji University

Abstract

The introduction of the Transformer architecture to computer vision has unified the processing of image and text data. However, the computational cost of Transformer networks grows quadratically with sequence length. To mitigate this, the Vision Transformer (ViT) splits an image into patches and embeds them as tokens for network input, thereby reducing the sequence length. This study exploits the spatial redundancy of natural images by using adaptive patch sizes within an image. We propose the Neighbor Patch Merging (NEPAM) method, which merges image patches at the very start of the network. NEPAM reduces sequence length and accelerates inference without requiring alterations to the network. We further observe that merging patches causes a loss of position embeddings and of accuracy. To address this, we propose Multi-Scale Relative Position Embeddings (MS-RPE), which model the positional relationships between patches of adaptive size. Both NEPAM and MS-RPE can be seamlessly integrated into the network, enabling more flexible model deployment. Experiments demonstrate that applying NEPAM and MS-RPE to DeiT-Small models yields a 2.26x speedup with an accuracy loss of 2.44%, without the necessity of retraining for a fixed pruning rate.
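The idea of merging neighboring patches before the network can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the merging criterion, so the variance test, the `threshold` parameter, the 2x2 block layout, and the function names `patchify`, `merge_neighbor_patches`, and `attention_cost_ratio` are all assumptions made for illustration. The sketch keeps a scale value with each token, since a multi-scale position embedding would presumably need to know each patch's size.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into a (H//patch, W//patch, patch*patch*C) grid."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    grid = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(gh, gw, patch * patch * c)

def merge_neighbor_patches(grid, threshold):
    """Merge each 2x2 block of neighboring patches when the block is nearly uniform.

    Assumed criterion (hypothetical, not from the paper): merge when the mean
    per-dimension variance across the four patches is below `threshold`.
    Returns a list of (row, col, scale, token) tuples; scale is 2 for merged tokens.
    """
    gh, gw, d = grid.shape
    assert gh % 2 == 0 and gw % 2 == 0, "sketch assumes an even patch grid"
    tokens = []
    for r in range(0, gh, 2):
        for c in range(0, gw, 2):
            block = grid[r:r + 2, c:c + 2].reshape(4, d)
            if block.var(axis=0).mean() < threshold:
                # Redundant region: one averaged token stands in for four patches.
                tokens.append((r, c, 2, block.mean(axis=0)))
            else:
                # Detailed region: keep the four original patches.
                for i in range(4):
                    tokens.append((r + i // 2, c + i % 2, 1, block[i]))
    return tokens

def attention_cost_ratio(n_before, n_after):
    """Relative self-attention cost, which scales with the square of sequence length."""
    return (n_after / n_before) ** 2

# Toy image: flat background with one textured 32x32 corner.
rng = np.random.default_rng(0)
image = np.zeros((64, 64, 3))
image[:32, :32] = rng.random((32, 32, 3))

grid = patchify(image, patch=16)                 # 4x4 grid of patches
tokens = merge_neighbor_patches(grid, threshold=1e-3)
n_before = grid.shape[0] * grid.shape[1]
print(n_before, len(tokens), attention_cost_ratio(n_before, len(tokens)))
```

In this toy case the three flat blocks collapse to one token each while the textured block keeps its four patches, so the sequence shrinks from 16 to 7 tokens without any change to the downstream network.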

Keywords: Vision Transformer, Token Merging, Position Embeddings, Spatial Redundancy

Suggested Citation

Jiang, Kai and Peng, Peng and Lian, Youzao and Shao, Weihui and Xu, Weisheng, Neighbor Patches Merging Reduces Spatial Redundancy of Nature Images. Available at SSRN: https://ssrn.com/abstract=4663091 or http://dx.doi.org/10.2139/ssrn.4663091

Kai Jiang

Tongji University

1239 Siping Road
Shanghai, 200092
China

Peng Peng

Tongji University

1239 Siping Road
Shanghai, 200092
China

Youzao Lian

Tongji University

1239 Siping Road
Shanghai, 200092
China

Weihui Shao

Tongji University

1239 Siping Road
Shanghai, 200092
China

Weisheng Xu (Contact Author)

Tongji University

1239 Siping Road
Shanghai, 200092
China
