Cross-Domain Multi-Style Merge for Image Captioning

8 Pages. Posted: 14 Jul 2022

Yiqun Duan

affiliation not provided to SSRN

Zhen Wang

The University of Sydney

Li Yi

The Hong Kong University of Science and Technology

Jingya Wang

ShanghaiTech University

Abstract

Multi-style image captioning has attracted wide attention recently. Existing approaches mainly rely on style synthesis within a single domain. They cannot handle combinations of multiple styles, because diverse styles are not naturally found together in a single uniform dataset. This paper is the first to investigate cross-domain multi-style merge for image captioning. Specifically, we propose a novel image captioning model with a multi-style gated transformer block to fit the cross-domain caption generation task. Conventional generative adversarial learning methods for language may suffer from a distribution distortion problem, since real datasets do not contain captions with style combinations. Therefore, we devise a multi-stage self-learning framework for the proposed image captioning model that gradually exploits real corpora with pseudo styles. Comprehensive experiments and ablation studies demonstrate the effectiveness of the proposed method on multi-style merge for image captioning.
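
The abstract names a multi-style gated block but does not describe its internals. As a rough illustration of the general idea of gating several styles into one representation, the following is a minimal sketch only, not the paper's architecture. The function name gated_style_merge, the additive combination, and the per-style sigmoid gates are all assumptions made for this example.

```python
import math


def sigmoid(x):
    """Standard logistic function, mapping a logit to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_style_merge(content, style_embeddings, gate_logits):
    """Add gate-weighted style vectors to a content feature vector.

    Hypothetical helper (not from the paper):
      content          -- list of floats, the style-free feature
      style_embeddings -- dict mapping style name to a list of floats
      gate_logits      -- dict mapping style name to a scalar logit
    Each style contributes sigmoid(logit) * embedding, so several
    styles can be switched on at once with independent strengths.
    """
    merged = list(content)
    for name, embedding in style_embeddings.items():
        if len(embedding) != len(content):
            raise ValueError(f"dimension mismatch for style {name!r}")
        gate = sigmoid(gate_logits[name])
        for i, value in enumerate(embedding):
            merged[i] += gate * value
    return merged


# Toy usage: two styles, each half-open (logit 0 gives a gate of 0.5).
content = [1.0, 2.0]
styles = {"humorous": [1.0, 0.0], "romantic": [0.0, 1.0]}
half_open = gated_style_merge(content, styles, {"humorous": 0.0, "romantic": 0.0})
```

In a real model the gate logits would be learned and the merge would sit inside a transformer layer; here they are fixed scalars so that the effect of each gate is easy to follow.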

Keywords: Computer Vision, Vision & Language, Image Captioning, Multi-Style Caption Generation

Suggested Citation

Duan, Yiqun and Wang, Zhen and Yi, Li and Wang, Jingya, Cross-Domain Multi-Style Merge for Image Captioning. Available at SSRN: https://ssrn.com/abstract=4162675 or http://dx.doi.org/10.2139/ssrn.4162675

Yiqun Duan

affiliation not provided to SSRN

No Address Available

Zhen Wang

The University of Sydney

University of Sydney
Sydney, 2006
Australia

Li Yi

The Hong Kong University of Science and Technology

Jingya Wang (Contact Author)

ShanghaiTech University

393 Middle Huaxia Road, Pudong
Shanghai, 201210
China
