Video Alignment Using Unsupervised Learning of Local and Global Features
18 pages. Posted: 25 Mar 2025
There are 2 versions of this paper
Abstract
This paper presents an unsupervised method for video alignment. It addresses the challenge of matching frames from videos that depict similar actions despite variations in execution and appearance. Our approach extracts global and local features from video frames using person detection, pose estimation, and a VGG network. These features are combined into a multidimensional time series representing each video, and the series are aligned using a novel Diagonalized Dynamic Time Warping (DDTW) algorithm. A key advantage is that our method does not require training, so it can adapt to new actions without action-specific training data. It also enables action phase labeling with only a few labeled examples. We evaluate the method on video synchronization and phase classification tasks using the Penn Action dataset and a subset of the UCF101 dataset. We also propose a new metric, Enclosed Area Error (EAE), for more effective evaluation of synchronization. Results show that our method outperforms existing state-of-the-art approaches, including TCC, as well as other self-supervised and weakly supervised methods.
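The abstract does not spell out how DDTW differs from standard DTW. The sketch below therefore uses classic Dynamic Time Warping as a stand-in to show the general idea: two videos, each represented as a sequence of per-frame feature vectors, are aligned by a monotone warping path. It is not the authors' DDTW. The per-frame vectors are assumed to be some combination (e.g. concatenation) of the global and local features. The helpers `dtw` and `transfer_labels` are hypothetical names. `transfer_labels` illustrates one plausible way to label action phases from a few labeled examples: each frame of an unlabeled query video copies the phase label of the labeled reference frame it is aligned to.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]  # one per-frame feature vector (assumed layout)


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two per-frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(x: Sequence[Vector], y: Sequence[Vector]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW (not DDTW) between two multidimensional time series.

    Returns the cumulative alignment cost and the warping path as
    (frame index in x, frame index in y) pairs.
    """
    n, m = len(x), len(y)
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from (n, m) to (0, 0); ties prefer the diagonal step.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal: advance both videos
            (cost[i - 1][j], i - 1, j),          # advance only x
            (cost[i][j - 1], i, j - 1),          # advance only y
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def transfer_labels(path: List[Tuple[int, int]], ref_labels: Sequence[str],
                    n_query: int) -> List[Optional[str]]:
    """Hypothetical phase labeling: each query frame takes the label of the
    first reference frame it is aligned to along the warping path."""
    out: List[Optional[str]] = [None] * n_query
    for i_ref, j_query in path:
        if out[j_query] is None:
            out[j_query] = ref_labels[i_ref]
    return out


# Toy 1-D "features" standing in for real per-frame descriptors.
reference = [[0.0], [1.0], [2.0], [3.0]]
query = [[0.0], [0.0], [1.0], [2.0], [3.0], [3.0]]  # same motion, slower start/end
total_cost, path = dtw(reference, query)
query_phases = transfer_labels(path, ["A", "A", "B", "B"], len(query))
print(total_cost, path, query_phases)
```

Because a DTW path is monotone and covers every frame of both sequences, every query frame receives a label. This is why alignment alone, with no training, can propagate phase labels from a handful of annotated videos.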
Keywords: Video Alignment, Video Synchronization, Unsupervised Learning, Phase Classification, Dynamic Time Warping (DTW)