RD-FGM: A Novel Model for High-Quality and Diverse Food Image Generation and Ingredient Classification
14 Pages Posted: 17 Apr 2024
Abstract
Food image generation plays a crucial role in evaluating multiple food ingredients, predicting dietary preferences, recommending food, and computing dietary nutrition. However, this task is challenging due to the large variation in the appearance of recipe components, the difficulty of aligning multi-modal features, and the lack of diversity in generated data. To address these challenges, we propose a novel RecipeCLIP-Diffusion Food Generation Model (RD-FGM), which generates high-quality, diverse images while aligning multi-modal features. Specifically, the RecipeCLIP model implements a multi-ingredient embedding of image-text pairs to align contextual features. Additionally, we devise a multi-conditional guided diffusion model that learns the data distribution and controls generation. We evaluate RD-FGM on both the large-scale Recipe1M dataset and the VIREO Food-172 Chinese dataset, and our results demonstrate its effectiveness and versatility. Furthermore, we conduct experiments to assess its effectiveness in ingredient classification using the VIREO Food-172 and ETH Food-101 datasets. The multi-ingredient embedding used in RD-FGM aligns contextual features, improving ingredient classification performance compared to baselines. The capability to generate realistic food images from textual recipes opens up new avenues for exploring culinary creations and for food and ingredient classification, with promising applications in the food industry and beyond.
Keywords: Food computing, Food image generation, Diffusion model, Multi-modal joint embedding, Computer vision