Spatiotemporal Action Detection Based on Fine-Grained

Cheng, Yan; mike, jack; Fang, Chengxing; Huang, Jiawen

doi:10.2139/ssrn.4995638

18 Pages. Posted: 22 Oct 2024

Chengxing Fang

Jiangsu University of Science and Technology

Spatiotemporal action detection is a challenging task in computer vision. Existing methods such as YOWO achieve reasonable real-time performance and accuracy, but they still fall short in preserving detail information, capturing long-range dependencies, and handling overlapping targets. To address these issues, this paper proposes a spatiotemporal action detection method based on fine-grained enhancement. First, a dual-path fine-grained enhancement module is designed to strengthen the extraction of fine-grained features. Second, a self-attention and convolution module is introduced to better capture long-range dependencies and cross-level feature relationships. Finally, to address target overlap, the SIoU loss function is introduced to evaluate the similarity between predicted boxes and ground-truth boxes more comprehensively. Experimental results show that the proposed method achieves 86.4% F-mAP and 52.6% V-mAP on the UCF101-24 dataset and 20.5% F-mAP on the AVA dataset, while requiring only 44.0 GFLOPs and running at 34 FPS, so it retains real-time performance with higher accuracy than state-of-the-art (SOTA) models. In addition, ablation experiments demonstrate the effectiveness of each module. Overall, the method significantly improves the accuracy of spatiotemporal action detection while maintaining high efficiency, providing an effective solution for real-time action detection.
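As a point of reference for the box-similarity term mentioned in the abstract, the following is a minimal Python sketch of plain intersection over union (IoU) between a predicted box and a ground-truth box. It is not the SIoU loss used in the paper; SIoU-style losses are generally described as adding angle, distance, and shape penalties on top of IoU, and those terms are omitted here. The function name `box_iou` and the `(x1, y1, x2, y2)` corner format are assumptions made for illustration.

```python
def box_iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes in (x1, y1, x2, y2) format.

    Illustrative only: the box format and function name are assumptions,
    and this is the base overlap measure, not the SIoU loss itself.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero
    # so that disjoint boxes yield an empty intersection.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) inputs.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(box_iou(predicted, ground_truth))
```

Plain IoU is zero for any pair of non-overlapping boxes regardless of how far apart they are, which is one commonly cited motivation for losses that add distance and shape terms.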

Keywords: Spatio-temporal action detection, Fine-grained, Attention mechanism, Feature extraction

Suggested Citation:

Cheng, Yan and mike, jack and Fang, Chengxing and Huang, Jiawen, Spatiotemporal Action Detection Based on Fine-Grained. Available at SSRN: https://ssrn.com/abstract=4995638 or http://dx.doi.org/10.2139/ssrn.4995638