Curating 62 Practical Scenarios to Test AI Text-to-Image Models

To evaluate the 12 aspects (§3), we curate diverse and practical scenarios. Table 2 presents an overview of all the scenarios and their descriptions. Each scenario is a set of textual inputs (prompts) that can be used to evaluate one or more aspects. For instance, the “MS-COCO” scenario can be used to assess the alignment, quality, and efficiency aspects, and the “Inappropriate Image Prompts (I2P)” scenario [8] can be used to assess the toxicity aspect. Some scenarios include sub-scenarios, which represent finer-grained categories or variations within them, such as “Hate” and “Violence” within I2P. We build these scenarios both by drawing on existing datasets and by writing new prompts ourselves. In total, there are 62 scenarios, counting the sub-scenarios.
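To make the relationship between scenarios, sub-scenarios, and aspects concrete, here is a minimal Python sketch of one way such a registry could be represented and queried by aspect. The `Scenario` class and the `scenarios_for` helper are hypothetical illustrations, not the benchmark's actual code. Only the MS-COCO and I2P entries and their aspects come from the text above, and the prompt lists are deliberately left empty rather than invented.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical record: a named set of prompts plus the aspects it can assess."""
    name: str
    aspects: set[str]
    prompts: list[str] = field(default_factory=list)  # left empty; real prompts not reproduced
    sub_scenarios: list["Scenario"] = field(default_factory=list)


def scenarios_for(aspect: str, scenarios: list[Scenario]) -> list[str]:
    """Return names of all scenarios and sub-scenarios (depth-first) tagged with `aspect`."""
    found: list[str] = []
    for s in scenarios:
        if aspect in s.aspects:
            found.append(s.name)
        # Recurse so that sub-scenarios such as "Hate" within I2P are also returned.
        found.extend(scenarios_for(aspect, s.sub_scenarios))
    return found


# Illustrative registry built only from the examples named in the text.
registry = [
    Scenario("MS-COCO", {"alignment", "quality", "efficiency"}),
    Scenario(
        "Inappropriate Image Prompts (I2P)",
        {"toxicity"},
        sub_scenarios=[
            Scenario("Hate", {"toxicity"}),
            Scenario("Violence", {"toxicity"}),
        ],
    ),
]

print(scenarios_for("alignment", registry))  # ['MS-COCO']
print(scenarios_for("toxicity", registry))   # ['Inappropriate Image Prompts (I2P)', 'Hate', 'Violence']
```

Nesting sub-scenarios inside their parent keeps the grouping visible in Table 2 while still letting an aspect query reach every level.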

Notably, we create new scenarios (indicated with “New” in Table 2) for aspects that were previously underexplored and lacked dedicated datasets: originality, aesthetics, bias, and fairness. For example, to evaluate originality, we develop scenarios that test the models' artistic creativity with prompts asking them to generate landing pages, logos, and magazine covers.

This paper is available on arxiv under CC BY 4.0 DEED license.
