ACTIVE STUDY 01 · 2026 / Spring

Do large language models judge facial attractiveness like humans?

Framing

A quantitative investigation of machine psychology

Human judgments of attractiveness are shaped by our evolved psychology and socioecology, and they have been studied extensively since the 1960s. In contrast, large language models (LLMs) acquire their perceptions of attractiveness from entirely different origins, and these perceptions are comparatively understudied, with only a handful of investigations in the literature. As LLMs become collaborators in clinical and consumer aesthetics, two questions become pressing: how do these models assess attractiveness, and how do their facial evaluations resemble or differ from ours?

To help answer these questions, our scientists are running an empirical study. It compares the attractiveness ratings of four frontier models with an existing dataset of human attractiveness ratings. The findings will be published openly alongside our methodology.

Human–AI agreement

In what ways are LLMs' perceptions of facial attractiveness similar to, or different from, those of humans?

AI–AI agreement

In what ways are different LLMs' perceptions of facial attractiveness similar to, or different from, one another?

MODELS IN THE STUDY

Four models

Each model is queried with the same prompt protocol on the same set of faces.

Claude (Anthropic)

ChatGPT (OpenAI)

Gemini (Google)

Grok (xAI)

Study design

Research questions, prompting protocol, model selection.

Pre-registration

In progress

Locking in methods before any data are collected.

Data collection

Querying frontier models against the rater corpus.

Analysis

Quantifying agreement, bias surfaces, edge cases.

Publication

Methods, findings, and the full data appendix.

Santiago Grandas Forero

Scientist

Macken Murphy

Chief Scientist

Juan Sebastian Cely

Scientist
