Investigating how the generative AI model Sora depicts human attractiveness and whether it reflects societal stereotypes.
This repository contains the code, analyses, and report for the course 02445: Project in Statistical Evaluation of Artificial Intelligence and Data at DTU (June 2025).
Authors:
- Valdemar Stamm Kristensen (s244742)
- Frederik Lysholm Jønsson (s245362)
- William Hoffmann Hyldig (s245176)
- Gustav Christensen (s246089)
Study line: Artificial Intelligence and Data
We explored potential biases in Sora’s AI-generated images of men and women, focusing on how the model interprets attractiveness based on prompt wording.
By generating a balanced dataset with prompts such as “attractive man/woman”, “unattractive man/woman”, and “man/woman”, we analyzed whether skin tone, hair color, age, and other visual traits systematically varied across groups.
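The balanced prompt design can be sketched as a simple descriptor × subject grid. The exact prompt strings and the per-prompt split below are assumptions inferred from the six categories mentioned above:

```python
from itertools import product

# Descriptor x subject grid; the empty descriptor yields the neutral prompt.
descriptors = ["attractive", "unattractive", ""]
subjects = ["man", "woman"]

# Six prompt variants; with 972 portraits total, a balanced split would give
# 162 faces (18 3x3 grids) per prompt -- an assumed, not documented, split.
prompts = [f"{d} {s}".strip() for d, s in product(descriptors, subjects)]
# → ['attractive man', 'attractive woman', 'unattractive man',
#    'unattractive woman', 'man', 'woman']
```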
Limitations:
- Some labels (e.g., age, hair length, hair health) involved subjective judgment and manual annotation.
- Faces generated in the same 3×3 grid may not be fully independent samples.
- We conducted many statistical tests without multiple-testing correction, increasing the risk of false positives.
- Sample size was estimated assuming ANOVA, but the actual tests were non-parametric because normality was rejected.
Methods:
- Dataset generation: 972 portraits from Sora via a balanced prompt design.
- Feature extraction: skin/hair luminance from RGB values; categorical labels (glasses, beard, hijab, hairstyle, etc.).
- Preprocessing: cropping, luminance calculation, manual + GPT-4.1 annotations for age.
- Statistical analysis:
  - Shapiro–Wilk and Levene tests → normality and homogeneity of variance (both rejected).
  - Kruskal–Wallis and Mann–Whitney U tests for skin/hair luminance.
  - Chi-squared and Fisher's exact tests for categorical attributes.
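The pipeline above can be sketched with SciPy. All data below is illustrative (the real analysis lives in the notebooks and processed CSVs), and the Rec. 709 luminance weights are an assumption about how luminance was computed:

```python
import numpy as np
from scipy import stats

def luminance(r, g, b):
    # Standard Rec. 709 weights (assumed; the report's exact formula may differ).
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

rng = np.random.default_rng(0)
# Illustrative skin-luminance samples for three prompt groups (not real data).
attractive = rng.normal(180, 15, size=30)
neutral = rng.normal(170, 15, size=30)
unattractive = rng.normal(160, 15, size=30)

# Normality and homogeneity of variance (both were rejected in the project).
_, p_shapiro = stats.shapiro(attractive)
_, p_levene = stats.levene(attractive, neutral, unattractive)

# Non-parametric comparisons of luminance across groups.
_, p_kruskal = stats.kruskal(attractive, neutral, unattractive)
_, p_mannwhitney = stats.mannwhitneyu(attractive, unattractive)

# Categorical attribute, e.g. glasses present/absent per group (made-up counts).
table = np.array([[2, 28], [15, 15]])
chi2, p_chi2, dof, _ = stats.chi2_contingency(table)
_, p_fisher = stats.fisher_exact(table)
```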
Skin & Hair Biases:
- Attractive women → lighter skin.
- Attractive men → darker hair, medium-length styles.
- Unattractive categories → older age groups, lighter hair, absence of minority representation.
Categorical Biases:
- Glasses nearly absent in “attractive” groups but common in “unattractive”.
- Beards most frequent in attractive men, least in unattractive men.
- Hijabs underrepresented in “attractive” and entirely absent in “unattractive”.
Overall:
Sora consistently associates attractiveness with youth, lighter female skin, darker male hair, and the absence of accessories or cultural markers.
Repository structure:
Final/
- CheckDataTypes.ipynb → Data type checks.
- DataCleaning.ipynb → Preprocessing and handling missing/subjective data.
- Extract-RGB-Script.ipynb → RGB extraction for skin/hair.
- RGB-to-Luminans.ipynb → Luminance calculations.
- SampleSize.ipynb → ANOVA-based sample size estimation.
- StatisticAnalysis.ipynb → Statistical tests and results.
- Plots/ → Visualizations (violin plots, mosaic plots, correlation plots).
- Csv-files/ → Processed datasets.
- Other/ → Supporting scripts and annotations.
README.md → Project overview.
Conclusions:
- Generative models like Sora risk reproducing and amplifying beauty stereotypes.
- Future research could:
- Treat image grids as dependent units.
- Apply corrections for multiple testing.
- Compare results across other generative models.
- Explore bias mitigation strategies (e.g., prompt engineering, diverse training data).
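As a concrete example of the multiple-testing correction suggested above, a Holm–Bonferroni step-down procedure could be applied to the collected p-values. This is a pure-Python sketch; the p-values below are made up:

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Holm step-down: return a reject/keep decision per p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the rank-th smallest p-value against alpha / (m - rank).
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are kept
    return reject

# Hypothetical p-values from several group comparisons.
decisions = holm_bonferroni([0.001, 0.04, 0.03], alpha=0.05)
# → [True, False, False]
```

Holm's method controls the family-wise error rate while being uniformly more powerful than plain Bonferroni, which makes it a reasonable default here.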
References:
- Introduction to Machine Learning and Data Mining (Herlau, Schmidt & Mørup, DTU, 2023)
- Introduction to Statistics at DTU (Brockhoff et al., 2024)
- DTU 02445 course slides on model evaluation and bias
- scikit-learn documentation
- OpenAI documentation on GPT-4.1 and Sora