AI Voice in Online Video Platforms: A Multimodal Perspective on Content Creation and Consumption
56 Pages. Posted: 16 Jan 2024. Last revised: 20 Nov 2024
Date Written: November 20, 2024
Abstract
Major user-generated content (UGC) platforms like TikTok have introduced AI-generated voice to assist creators in complex multimodal video creation. AI voice in videos represents a novel form of partial AI assistance, where AI augments one specific modality (audio), whereas creators maintain control over other modalities (text and visuals). This study theorizes and empirically investigates the impacts of AI voice adoption on the creation, content characteristics, and consumption of videos on a video UGC platform. Using a unique dataset of 554,252 TikTok videos, we conduct multimodal analyses to detect AI voice adoption and quantify theoretically important video characteristics in different modalities. Using a stacked difference-in-differences model with propensity score matching, we find that AI voice adoption increases creators’ video production by 21.8%. While reducing audio novelty, it enhances textual and visual novelty by freeing creators’ cognitive resources. Moreover, the heterogeneity analysis reveals that AI voice boosts engagement for less-experienced creators but reduces it for experienced creators and those with established identities. We conduct additional analyses and online randomized experiments to demonstrate two key mechanisms underlying these effects: partial AI process augmentation and partial AI content substitution. This study contributes to the UGC and human-AI collaboration literature and provides practical insights for video creators and UGC platforms.
Keywords: multimodal UGC, AI voice, partial AI assistance, video creation, video consumption, unstructured data analysis