Application of Multimodal Generation Model in Short Video Content Personalized Generation

Yang, Minghui

doi:10.2139/ssrn.5166910



15 Pages Posted: 5 Mar 2025


Minghui Yang

affiliation not provided to SSRN

Abstract

The rise of short video platforms has increased demand for rapidly generated personalized content. Existing systems either struggle to deliver high levels of customization or require large amounts of data, which limits real-time production. This study focuses on a multimodal generation model that produces customized short video content adapted to user preferences and behavioral patterns. The objective is an integrative model that combines text, image, and audio data to create context-specific short videos for personalized entertainment. The model first analyzes user preferences from interaction data, such as video views and comments, and then synthesizes corresponding video content using a novel method called a stochastic paint optimizer with an intelligent convolutional neural network (SPO-IntelliConvNet). It employs natural language processing (NLP), audio processing, and computer vision to merge the text, image, and audio modalities. Pre-processing consists of tokenization for text, Canny edge detection for images, and Wiener filtering for audio, which prepares each modality for analysis and feature extraction. Principal component analysis (PCA) then reduces the dimensionality of the features from all three modalities while preserving essential information. The proposed approach achieved superior personalized content generation, leading to increased user satisfaction and engagement. Outcomes were measured using the BLEU, ROUGE-L, METEOR, and CIDEr metrics. The system's ability to incorporate multimodal data resulted in more precise video customization, as demonstrated by interaction metrics and user comments. The multimodal generation model thus offers an advanced solution for creating personalized short video content and improves the user experience with highly tailored content.
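
Of the evaluation metrics named in the abstract, ROUGE-L is the simplest to illustrate: it scores a generated text against a reference by the length of their longest common subsequence (LCS) of tokens. The sketch below is an illustration only and is not taken from the paper. The regex tokenizer, the function names, the example sentences, and the default beta = 1 are all assumptions, and the abstract does not state which texts were compared or how the metric was configured.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-score between a candidate and a reference string.

    beta weights recall against precision; beta=1.0 (an assumption here)
    gives the balanced F1 form.
    """
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


# Hypothetical example: the LCS is "cat on the mat" (4 tokens out of 6 each).
score = rouge_l("A cat sits on the mat", "The cat sat on the mat")
print(round(score, 3))
```

Because the score depends only on token order and overlap, it rewards output that preserves the reference's word sequence even when the matching words are not adjacent.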

Keywords: Personalized content, multimodal generation, stochastic paint optimizer with intelligent convolutional neural network (SPO-IntelliConvNet), modalities.

Suggested Citation

Yang, Minghui, Application of Multimodal Generation Model in Short Video Content Personalized Generation. Available at SSRN: https://ssrn.com/abstract=5166910 or http://dx.doi.org/10.2139/ssrn.5166910