Paving the Way for Better AI Models: Insights from HEIM’s 12-Aspect Benchmark

We introduced Holistic Evaluation of Text-to-Image Models (HEIM), a new benchmark to assess 12 important aspects in text-to-image generation, including alignment, quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. Our evaluation of 26 recent text-to-image models reveals that different models excel in different aspects, opening up research avenues to study whether and how to develop models that excel across multiple aspects. To enhance transparency and reproducibility, we release our evaluation pipeline, along with the generated images and human evaluation results. We encourage the community to consider the different aspects when developing text-to-image models.

This paper is available on arXiv under a CC BY 4.0 DEED license.
