
A quantitative investigation of machine psychology
Human judgments of attractiveness are shaped by our evolved psychology and socioecology, and have been well studied since the 1960s. In contrast, the perceptions of attractiveness held by large language models (LLMs) have entirely different origins and are comparatively understudied, with only a handful of investigations in the literature. As LLMs become collaborators in clinical and consumer aesthetics, the question of how these models assess attractiveness, and how their facial evaluations resemble or differ from ours, becomes pressing.
To help answer this question, our scientists are running an empirical study that compares four frontier models against an existing dataset of human attractiveness ratings. The findings will be published openly alongside our methodology.
Two key questions

Human–AI agreement
How do LLMs' perceptions of facial attractiveness resemble or differ from those of humans, and in what ways?

AI–AI agreement
How do LLMs' perceptions of facial attractiveness resemble or differ from one another's, and in what ways?
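
As an illustration of how both kinds of agreement could be quantified, here is a minimal sketch that computes Spearman rank correlations between each model's ratings and mean human ratings, and between each pair of models. The choice of Spearman correlation is an assumption made for this example, not the study's stated methodology, and the ratings, model names (`model_a`, `model_b`), and function names are hypothetical toy values rather than study data.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy ratings for five faces (NOT study data).
human_mean = [3.1, 4.5, 2.2, 5.0, 3.8]
model_ratings = {
    "model_a": [3.0, 4.0, 2.5, 4.8, 3.5],
    "model_b": [2.0, 4.9, 3.0, 4.1, 3.9],
}

# Human-AI agreement: each model against the mean human rating per face.
human_ai = {name: spearman(r, human_mean) for name, r in model_ratings.items()}

# AI-AI agreement: every pair of models against each other.
ai_ai = {
    (a, b): spearman(model_ratings[a], model_ratings[b])
    for a, b in combinations(model_ratings, 2)
}
```

A rank-based measure is used here because it compares how raters order the faces rather than the raw scores they assign, so it tolerates models and humans using a rating scale differently. Other measures, such as intraclass correlation, would be equally reasonable.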
Four models
Each model is queried with the same prompt protocol on the same set of faces.
Claude
Anthropic
ChatGPT
OpenAI
Gemini
Google
Grok
xAI
Status
Our research team

Santiago Grandas Forero
Scientist

Macken Murphy
Chief Scientist

Juan Sebastian Cely
Scientist