AI Voice in Online Video Platforms: A Multimodal Perspective on Content Creation and Consumption

56 pages. Posted: 16 Jan 2024. Last revised: 20 Nov 2024.

Xiaoke Zhang

University of British Columbia (UBC) - Sauder School of Business

Mi Zhou

University of British Columbia (UBC) - Sauder School of Business

Gene Moo Lee

University of British Columbia (UBC) - Sauder School of Business

Date Written: November 20, 2024

Abstract

Major user-generated content (UGC) platforms like TikTok have introduced AI-generated voice to assist creators in complex multimodal video creation. AI voice in videos represents a novel form of partial AI assistance, in which AI augments one specific modality (audio) while creators maintain control over the other modalities (text and visuals). This study theorizes and empirically investigates the impacts of AI voice adoption on the creation, content characteristics, and consumption of videos on a video UGC platform. Using a unique dataset of 554,252 TikTok videos, we conduct multimodal analyses to detect AI voice adoption and quantify theoretically important video characteristics in different modalities. Using a stacked difference-in-differences model with propensity score matching, we find that AI voice adoption increases creators’ video production by 21.8%. While reducing audio novelty, it enhances textual and visual novelty by freeing creators’ cognitive resources. Moreover, the heterogeneity analysis reveals that AI voice boosts engagement for less-experienced creators but reduces it for experienced creators and those with established identities. We conduct additional analyses and online randomized experiments to demonstrate two key mechanisms underlying these effects: partial AI process augmentation and partial AI content substitution. This study contributes to the UGC and human-AI collaboration literature and provides practical insights for video creators and UGC platforms.
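As a rough illustration of the difference-in-differences logic named in the abstract, the sketch below computes a canonical 2x2 DiD estimate from group means. It is not the authors' specification: their stacked design additionally builds cohort-specific comparisons and applies propensity score matching, neither of which is shown here. The function name `did_estimate`, the observation layout, and the toy numbers are hypothetical and invented for illustration; they are not data or results from the study.

```python
from statistics import mean
from typing import Iterable, Tuple

# Hypothetical observation layout: (treated, post, outcome).
Observation = Tuple[bool, bool, float]


def did_estimate(observations: Iterable[Observation]) -> float:
    """Canonical 2x2 difference-in-differences estimate from group means.

    Returns (treated_post - treated_pre) - (control_post - control_pre).
    """
    rows = list(observations)  # materialize so each cell can be scanned

    def cell_mean(treated: bool, post: bool) -> float:
        values = [y for t, p, y in rows if t == treated and p == post]
        if not values:
            raise ValueError(f"no observations for treated={treated}, post={post}")
        return mean(values)

    # Change over time for adopters, net of the change for non-adopters.
    treated_change = cell_mean(True, True) - cell_mean(True, False)
    control_change = cell_mean(False, True) - cell_mean(False, False)
    return treated_change - control_change


# Invented toy data (hypothetical outcome, e.g. videos posted per period).
toy = [
    (True, False, 4.0), (True, False, 6.0),    # adopters, before
    (True, True, 9.0), (True, True, 11.0),     # adopters, after
    (False, False, 3.0), (False, False, 5.0),  # non-adopters, before
    (False, True, 5.0), (False, True, 7.0),    # non-adopters, after
]

effect = did_estimate(toy)
print(effect)  # 3.0
```

Here adopters rise by 5 while non-adopters rise by 2, so the estimated effect is 3. A regression with unit and period fixed effects generalizes this calculation to many periods; a stacked design, as commonly implemented, runs such comparisons separately for each adoption cohort against clean controls and pools them.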

Keywords: multimodal UGC, AI voice, partial AI assistance, video creation, video consumption, unstructured data analysis

Suggested Citation

Zhang, Xiaoke and Zhou, Mi and Lee, Gene Moo, AI Voice in Online Video Platforms: A Multimodal Perspective on Content Creation and Consumption (November 20, 2024). Available at SSRN: https://ssrn.com/abstract=4676705 or http://dx.doi.org/10.2139/ssrn.4676705

Xiaoke Zhang

University of British Columbia (UBC) - Sauder School of Business

2053 Main Mall
Vancouver, BC V6T 1Z2
Canada

Mi Zhou (Contact Author)

University of British Columbia (UBC) - Sauder School of Business

2053 Main Mall
Vancouver, BC V6T 1Z2
Canada

HOME PAGE: http://sites.google.com/view/mizhou

Gene Moo Lee

University of British Columbia (UBC) - Sauder School of Business

2053 Main Mall
Vancouver, BC V6T 1Z2
Canada
