Exploring the Potential of Novel Image-to-Text Generators as Prompt Engineers for CivitAI Models

Sophia Song, Joy Song, Junha Lee, Younah Kang, Hoyeon Moon

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Individuals who want to use AI image generation to support creative content production, such as anime or webtoon artists, may struggle with new state-of-the-art text-to-image generators because they are unfamiliar with prompt engineering. This paper explores whether models with inherent image-to-text capabilities, namely CLIP (ViT-L), CLIP (ViT-H), DeepDanbooru, GPT-4, and Gemini 1.5 Pro, can automatically generate prompts that produce high-quality single-character images, which would streamline character image production during the character ideation phase. We used evaluation metrics such as CLIP image-to-image (CLIPI-I), CLIP text-to-image (CLIPT-I), Contrastive Character Image Pretraining (CCIP), Bilingual Evaluation Understudy (BLEU) score, and ImageReward to quantitatively compare images representing CivitAI models with images produced from prompts that were automatically generated by the different image-to-text generators. We found that the CLIPI-I scores of the image-to-text generators were not statistically significantly different from one another, which suggests that the resulting images were visually similar to each other. However, the BLEU scores showed that the textual prompts differed between image-to-text generators. This means that visually similar images can be generated from different but semantically similar tokens. We also found that most existing image evaluation metrics do not satisfactorily reflect the preferences that humans express in their subjective ratings of images.
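The contrast the abstract draws between image similarity and prompt similarity can be illustrated with a minimal sketch. It assumes, as is common, that a CLIP image-to-image score is the cosine similarity between two image embeddings, and it uses clipped unigram precision (the n=1 building block of BLEU, without the brevity penalty) as a simplified stand-in for BLEU. The function names, the toy three-dimensional embeddings, and the example prompts are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def unigram_precision(candidate, reference):
    """Clipped unigram precision: a simplified BLEU-1 without brevity penalty."""
    cand_tokens = candidate.lower().replace(",", " ").split()
    ref_counts = Counter(reference.lower().replace(",", " ").split())
    cand_counts = Counter(cand_tokens)
    # Each candidate token is credited at most as often as it occurs in the reference.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return overlap / len(cand_tokens) if cand_tokens else 0.0


# Hypothetical embeddings standing in for CLIP features of two generated images.
image_a = [0.9, 0.1, 0.4]
image_b = [0.8, 0.2, 0.5]

# Hypothetical prompts in a tag style and a natural-language style.
prompt_a = "1girl, silver hair, red eyes"
prompt_b = "a young woman with silver hair and red eyes"

image_score = cosine_similarity(image_a, image_b)
prompt_score = unigram_precision(prompt_a, prompt_b)
```

In this toy setup the two images score as nearly identical while the prompts only partially overlap word for word, which mirrors the kind of gap between visual and textual similarity that the abstract reports.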

Original language: English
Title of host publication: Proceedings - 2024 16th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 626-631
Number of pages: 6
ISBN (Electronic): 9798350377903
Publication status: Published - 2024
Event: 16th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2024 - Takamatsu, Japan
Duration: 2024 Jul 6 to 2024 Jul 12

Publication series

Name: Proceedings - 2024 16th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2024

Conference

Conference: 16th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2024
Country/Territory: Japan
City: Takamatsu
Period: 24/7/6 to 24/7/12

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
