Spatiotemporal Action Detection Based on Fine-Grained
18 Pages, Posted: 22 Oct 2024
Abstract
Spatiotemporal action detection is a challenging task in computer vision. Existing methods such as YOWO have achieved good results in real-time performance and accuracy, but they still struggle with missing detail information, long-distance dependencies, and overlapping targets. To address these issues, this article proposes a spatiotemporal action detection method based on fine-grained enhancement. First, a dual-path fine-grained enhancement module is designed to strengthen the extraction of fine-grained features. Second, a self-attention and convolution module is introduced to better capture long-distance dependencies and cross-level feature relationships. Finally, to address target overlap, the SIoU loss function is introduced to evaluate the similarity between predicted boxes and ground-truth boxes more comprehensively. Experimental results show that the proposed method achieves 86.4% F-mAP and 52.6% V-mAP on the UCF101-24 dataset and 20.5% F-mAP on the AVA dataset, while requiring only 44.0 GFLOPs and running at 34 FPS, so it retains real-time performance with higher accuracy than state-of-the-art models. Ablation experiments further demonstrate the effectiveness of each module. Overall, the method significantly improves the accuracy of spatiotemporal action detection while maintaining high efficiency, providing an effective solution for real-time action detection.
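As a point of reference for the box-similarity measure mentioned in the abstract, the following is a minimal sketch of plain intersection over union (IoU) between a predicted box and a ground-truth box. It is not the paper's SIoU loss; SIoU, as generally described, builds on this IoU term and adds angle, distance, and shape penalties. The function name `box_iou` and the `(x1, y1, x2, y2)` corner format are assumptions made for illustration.

```python
def box_iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative only: this is the basic overlap ratio, not SIoU.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area = sum of the two areas minus the overlap counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(box_iou(predicted, ground_truth))
```

Plain IoU is zero for any pair of non-overlapping boxes regardless of how far apart they are, which is one reason IoU-based losses are commonly extended with additional geometric terms.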
Keywords: Spatio-temporal action detection, Fine-grained, Attention mechanism, Feature extraction