Application of Multimodal Generation Model in Short Video Content Personalized Generation
15 Pages Posted: 5 Mar 2025
Abstract
The rise of short video platforms has increased demand for rapidly generated personalized content. Existing systems either struggle to deliver a high degree of customization or require large amounts of data, which limits real-time production. This study investigates a multimodal generation model that produces customized short video content adapted to user preferences and behavioral patterns. The objective is an integrative model that combines text, image, and audio data to create context-specific short videos for personalized entertainment. The model first analyzes user preferences from interaction data and then synthesizes corresponding video content using a novel method, the stochastic paint optimizer with an intelligent convolutional neural network (SPO-IntelliConvNet). User preference data, including interactions such as video views and comments, is collected to drive personalization. The model employs natural language processing (NLP), audio processing, and computer vision to merge the text, image, and audio modalities. Pre-processing consists of tokenization for text, Canny edge detection for images, and Wiener filtering for audio, preparing each modality for analysis. Feature extraction then uses principal component analysis (PCA) to reduce the features from all three modalities to a lower-dimensional representation while preserving essential information. The proposed approach achieved superior personalized content generation, leading to increased user satisfaction and engagement. Performance was measured using the BLEU, ROUGE-L, METEOR, and CIDEr metrics. The system's successful integration of multimodal data resulted in more precise video customization, as demonstrated by interaction metrics and user comments. This multimodal generation model offers an advanced solution for creating personalized short video content and improves the user experience through highly tailored content.
Keywords: Personalized content, multimodal generation, stochastic paint optimizer with intelligent convolutional neural network (SPO-IntelliConvNet), modalities.
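The pre-processing and dimensionality-reduction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not cover SPO-IntelliConvNet. All function names, the hashed bag-of-words features, the histogram and spectrum features, and the synthetic inputs are assumptions made for illustration. To stay dependent on NumPy only, a gradient-magnitude edge map stands in for full Canny edge detection, and the Wiener filter is a simple local-statistics variant.

```python
import re
import numpy as np

def tokenize(text):
    """Lowercase word tokenization (a simple stand-in for an NLP tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def text_features(tokens, dim=16):
    """Hashed bag-of-words counts (hypothetical feature choice)."""
    vec = np.zeros(dim)
    for tok in tokens:
        vec[sum(map(ord, tok)) % dim] += 1.0  # deterministic toy hash
    return vec

def edge_map(image):
    """Gradient-magnitude edges; a simplified stand-in for Canny."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    return np.hypot(gx, gy)

def image_features(image, dim=16):
    """Histogram of edge strengths (hypothetical feature choice)."""
    return np.histogram(edge_map(image), bins=dim)[0].astype(float)

def wiener_1d(signal, window=5, noise_power=None):
    """Local-statistics Wiener filter for a 1-D signal."""
    x = np.asarray(signal, dtype=float)
    kernel = np.ones(window) / window
    local_mean = np.convolve(x, kernel, mode="same")
    local_var = np.convolve(x ** 2, kernel, mode="same") - local_mean ** 2
    local_var = np.maximum(local_var, 0.0)
    if noise_power is None:
        noise_power = local_var.mean()  # estimate noise as the mean local variance
    gain = np.maximum(local_var - noise_power, 0.0) / np.maximum(local_var, 1e-12)
    return local_mean + gain * (x - local_mean)

def audio_features(audio, dim=16):
    """Low-frequency magnitude spectrum of the denoised audio (hypothetical)."""
    return np.abs(np.fft.rfft(wiener_1d(audio)))[:dim]

def sample_features(comment, frame, audio, dim=16):
    """Concatenate text, image, and audio features for one sample."""
    return np.concatenate([
        text_features(tokenize(comment), dim),
        image_features(frame, dim),
        audio_features(audio, dim),
    ])

def pca_reduce(features, n_components):
    """Project rows of `features` onto the top principal components via SVD."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Synthetic demo data (random frames and audio, not real user interactions).
rng = np.random.default_rng(0)
comments = ["Loved this cooking clip", "Too long, skipped it"] * 4
fused = np.stack([
    sample_features(c, rng.random((32, 32)), rng.normal(size=256))
    for c in comments
])
reduced = pca_reduce(fused, n_components=3)
print(reduced.shape)  # (8, 3)
```

In practice the per-modality features would likely be standardized before PCA so that no single modality dominates the principal components; that step is omitted here for brevity.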