Empowering Physical Attacks with Jacobian Matrix Regularization on ViT-Based Detectors
33 Pages Posted: 10 Jan 2024
Abstract
Vision Transformers (ViTs) have achieved great success in detection tasks, but they remain vulnerable to adversarial samples. Research on adversarial attacks against ViT-based detectors is still at an early stage: existing attack methods mainly target classification models, and the adversarial samples they generate cannot account for physical realizability and attack transferability at the same time. To overcome this limitation, we focus on transferable attacks against ViT-based detectors and generate adversarial samples that can be realized in the physical world. Concretely, we design perturbation patches that are deployed both within and beyond the target object, rather than requiring the patches to be aligned with image tokens. To narrow the gap between limited digital samples and complex physical scenarios, we apply data augmentation to the training images at both global and local levels. In addition, we propose a novel transferable attack method dubbed Jacobian Matrix Regularization, which consists of Feature Variance Regularization (FVR) and Attention Weight Regularization (AWR). Specifically, FVR computes the feature variance of each channel within specific layers and then sets the features to zero for the channels with the highest variances. AWR, in turn, masks the largest self-attention weights. To verify the effectiveness of our method, we conduct extensive transferability experiments with typical detectors in both digital and physical space. The results indicate that our method achieves competitive transferability compared with state-of-the-art methods.
Keywords: physical attacks, transferability, ViTs, Object Detection, Jacobian matrix regularization
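The two masking steps named in the abstract, FVR and AWR, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `top_k` parameter, the per-row selection in the attention mask, and the use of plain nested lists are all assumptions made for illustration. The abstract does not state which layers are used, how many channels or weights are masked, or where in the attack pipeline the masking is applied.

```python
from statistics import pvariance


def feature_variance_mask(features, top_k):
    """Zero out the top_k channels with the highest variance.

    features: list of channels, each a list of floats (hypothetical layout).
    Returns a new list; the input is not modified.
    """
    variances = [pvariance(channel) for channel in features]
    # Channel indices sorted from highest to lowest variance.
    ranked = sorted(range(len(features)), key=lambda c: variances[c], reverse=True)
    drop = set(ranked[:top_k])
    return [
        [0.0] * len(channel) if c in drop else list(channel)
        for c, channel in enumerate(features)
    ]


def attention_weight_mask(attn, top_k):
    """Zero out the top_k largest weights in each row of an attention matrix.

    attn: list of rows, one per query token (hypothetical layout).
    Per-row selection is an assumption; the abstract only says that the
    largest self-attention weights are masked.
    """
    masked = []
    for row in attn:
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        drop = set(ranked[:top_k])
        masked.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return masked


# Toy inputs: three channels of three values, and a 2x3 attention matrix.
features = [[1.0, 1.0, 1.0], [0.0, 10.0, 20.0], [1.0, 2.0, 3.0]]
attn = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
fvr_out = feature_variance_mask(features, top_k=1)
awr_out = attention_weight_mask(attn, top_k=1)
```

In this toy input the second channel has the largest variance, so it is the one zeroed, and the largest weight in each attention row is replaced with zero. A real attack would operate on tensors inside the detector, which this sketch does not attempt to model.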