Neighbor Patches Merging Reduces Spatial Redundancy of Nature Images

Jiang, Kai; Peng, Peng; Lian, Youzao; Shao, Weihui; Xu, Weisheng

doi:10.2139/ssrn.4663091

29 Pages. Posted: 13 Dec 2023

The introduction of the Transformer architecture in computer vision has unified the processing of image and text data. However, the computational cost of Transformer networks grows quadratically with the sequence length. To mitigate this, the Vision Transformer (ViT) splits images into patches and embeds them as tokens for network input, thereby reducing the sequence length. This study leverages the spatial redundancy in natural images and incorporates adaptive patch sizes within images. We introduce the Neighbor Patch Merging (NEPAM) method, which merges image patches at the very start of the network. NEPAM effectively reduces the sequence length and accelerates inference without requiring alterations to the networks. Furthermore, we observe that merging patches leads to a loss of position embeddings and of accuracy. To address this, we propose Multi-Scale Relative Position Embeddings (MS-RPE) to model the positional relationship between patches with adaptive sizes. Both NEPAM and MS-RPE can be seamlessly integrated into the network, enabling more flexible model deployment. Experiments demonstrate that applying NEPAM and MS-RPE to DeiT-Small models results in a 2.26x speedup with an accuracy loss of 2.44%, without the need to retrain for a fixed pruning rate.
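
The idea of merging redundant neighboring patches before they enter the network can be illustrated with a minimal sketch. This is not the authors' NEPAM implementation: the abstract does not state the merging criterion, so the 2x2 grouping, the value-range threshold, the use of one scalar per patch in place of a real patch embedding, and the names `merge_neighbor_patches` and `attention_cost_ratio` are all assumptions made for illustration.

```python
def merge_neighbor_patches(grid, threshold):
    """Merge each 2x2 block of patches into one token when the block is nearly uniform.

    ASSUMPTION: each patch is summarized by a single float (e.g. mean intensity),
    and "redundant" means max - min within the block is <= threshold.
    Returns a list of tokens (row, col, size, value), where size is 1 or 2
    in patch units; the size is what a multi-scale position scheme would need.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % 2 or cols % 2:
        raise ValueError("grid height and width must be even")
    tokens = []
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            # Positions and values of the four neighbors in this block.
            cells = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
            values = [grid[i][j] for i, j in cells]
            if max(values) - min(values) <= threshold:
                # Redundant block: keep one larger token holding the average.
                tokens.append((r, c, 2, sum(values) / 4))
            else:
                # Detailed block: keep all four original patches.
                tokens.extend((i, j, 1, grid[i][j]) for i, j in cells)
    return tokens


def attention_cost_ratio(n_before, n_after):
    """Relative self-attention cost after merging, assuming cost scales with n**2."""
    return (n_after / n_before) ** 2


# Toy 4x4 patch grid: three flat 2x2 regions and one textured region.
example_grid = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.5, 0.5, 0.1, 0.9],
    [0.5, 0.5, 0.8, 0.2],
]
example_tokens = merge_neighbor_patches(example_grid, threshold=0.05)
```

In this toy grid the three flat regions collapse into one token each while the textured region keeps its four patches, so 16 patches become 7 tokens. Under the quadratic-cost assumption, the attention cost drops to (7/16)^2 of the original, roughly 19%. The actual speedup reported in the abstract depends on the full network and the authors' own merging rule.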

Keywords: Vision Transformer, Token Merging, Position Embeddings, Spatial Redundancy

Suggested Citation:

Jiang, Kai and Peng, Peng and Lian, Youzao and Shao, Weihui and Xu, Weisheng, Neighbor Patches Merging Reduces Spatial Redundancy of Nature Images. Available at SSRN: https://ssrn.com/abstract=4663091 or http://dx.doi.org/10.2139/ssrn.4663091