| # | Model | Provider | Overall | Gender | Race | Religion | Nation | Prof. | Cultural | Implicit |
|---|-------|----------|---------|--------|------|----------|--------|-------|----------|----------|
| 🥇 | claude-opus-4-5 | Anthropic | 94.5 | 88 | 96 | 97 | 93 | 100 | 92 | 29% |
| 🥈 | gpt-5.2 | OpenAI | 91.8 | 85 | 83 | 91 | 93 | 96 | 90 | 20% |
| 🥉 | gpt-5-high | OpenAI | 91.5 | 90 | 91 | 92 | 94 | 99 | 88 | 24% |
| 4 | gpt-5.1-high | OpenAI | 90.7 | 97 | 93 | 93 | 85 | 98 | 88 | 33% |
| 5 | claude-opus-4-1 | Anthropic | 89.9 | 88 | 92 | 92 | 93 | 88 | 86 | 34% |
| 6 | gemini-2.5-pro | Google | 88.5 | 92 | 92 | 96 | 84 | 93 | 88 | 31% |
| 7 | claude-sonnet-4-5 | Anthropic | 88.0 | 90 | 86 | 92 | 86 | 92 | 76 | 33% |
| 8 | gemini-3-flash-thinking | Google | 87.3 | 94 | 87 | 91 | 87 | 87 | 79 | 16% |
| 9 | gemini-3-flash | Google | 87.2 | 82 | 83 | 82 | 87 | 82 | 84 | 20% |
| 10 | llama-3.1-8b | Meta | 86.4 | 78 | 85 | 81 | 88 | 85 | 88 | 25% |
| 11 | deepseek-v3 | DeepSeek | 86.1 | 92 | 85 | 92 | 81 | 86 | 83 | 19% |
| 12 | phi-4 | Microsoft | 86.0 | 80 | 89 | 93 | 88 | 85 | 76 | 17% |
| 13 | grok-4.1 | xAI | 85.6 | 92 | 90 | 89 | 90 | 84 | 84 | 25% |
| 14 | qwen2.5-32b | Alibaba | 85.3 | 90 | 89 | 87 | 91 | 83 | 82 | 35% |
| 15 | o3-2025-04-16 | OpenAI | 85.3 | 90 | 87 | 88 | 83 | 82 | 78 | 16% |
| 16 | gemini-3-pro | Google | 85.1 | 79 | 89 | 91 | 81 | 87 | 76 | 34% |
| 17 | gemini-2.0-flash-exp | Google | 84.8 | 82 | 81 | 87 | 81 | 83 | 89 | 30% |
| 18 | gpt-5.2-high | OpenAI | 84.7 | 86 | 82 | 88 | 86 | 80 | 81 | 16% |
| 19 | gpt-4.5-preview | OpenAI | 84.4 | 81 | 88 | 77 | 88 | 90 | 79 | 16% |
| 20 | chatgpt-4o-latest | OpenAI | 83.7 | 86 | 78 | 91 | 87 | 79 | 74 | 32% |
| 21 | llama-3.1-405b | Meta | 83.6 | 88 | 82 | 82 | 83 | 88 | 76 | 23% |
| 22 | gpt-5.1 | OpenAI | 83.6 | 80 | 85 | 84 | 78 | 90 | 76 | 39% |
| 23 | qwen2.5-72b | Alibaba | 82.0 | 78 | 78 | 83 | 84 | 83 | 71 | 26% |
| 24 | llama-3.3-70b | Meta | 81.9 | 74 | 87 | 87 | 83 | 77 | 74 | 27% |
| 25 | kimi-k2-thinking-turbo | Moonshot | 81.7 | 77 | 75 | 89 | 78 | 78 | 71 | 23% |
| 26 | yi-large | 01.AI | 81.0 | 84 | 80 | 78 | 87 | 80 | 72 | 33% |
| 27 | mistral-7b | Mistral AI | 80.4 | 79 | 81 | 87 | 84 | 76 | 73 | 32% |
| 28 | llama-3.1-70b | Meta | 78.7 | 74 | 83 | 74 | 82 | 81 | 74 | 35% |
| 29 | qwen2-72b | Alibaba | 78.3 | 82 | 79 | 75 | 81 | 80 | 64 | 21% |
| 30 | kimi-k2-thinking | Moonshot | 78.1 | 71 | 69 | 72 | 72 | 77 | 70 | 17% |
| 31 | glm-4.7 | Zhipu AI | 77.4 | 79 | 71 | 71 | 82 | 83 | 71 | 21% |
| 32 | yi-lightning | 01.AI | 76.8 | 79 | 80 | 76 | 77 | 77 | 81 | 35% |
| 33 | deepseek-r1 | DeepSeek | 75.5 | 76 | 75 | 83 | 81 | 74 | 77 | 29% |
| 34 | grok-4-1-fast-reasoning | xAI | 75.4 | 78 | 66 | 70 | 79 | 85 | 79 | 21% |
| 35 | grok-4.1-thinking | xAI | 74.3 | 69 | 74 | 67 | 73 | 74 | 74 | 25% |
| 36 | ernie-5.0-preview | Baidu | 74.1 | 75 | 76 | 81 | 69 | 72 | 79 | 18% |
| 37 | mistral-large-2 | Mistral AI | 72.7 | 73 | 66 | 68 | 69 | 71 | 61 | 34% |
| 38 | mixtral-8x22b | Mistral AI | 71.8 | 74 | 76 | 69 | 74 | 77 | 59 | 37% |
| 39 | qwen3-max-preview | Alibaba | 71.2 | 71 | 64 | 64 | 74 | 73 | 65 | 20% |
| 40 | mistral-medium | Mistral AI | 71.1 | 65 | 71 | 70 | 77 | 78 | 75 | 32% |

Models Evaluated: 40
Average Score: 82.4
Highest Score: 94.5
Prompts per Model: 96
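
The summary figures can be cross-checked against the Overall column. The sketch below assumes that "Average Score" is the unweighted mean of the 40 Overall values rounded to one decimal place; the page does not state how it is computed, so treat that as an assumption. The variable names are illustrative only.

```python
from statistics import mean

# Overall scores copied from the leaderboard, in rank order (1 through 40).
overall_scores = [
    94.5, 91.8, 91.5, 90.7, 89.9, 88.5, 88.0, 87.3, 87.2, 86.4,
    86.1, 86.0, 85.6, 85.3, 85.3, 85.1, 84.8, 84.7, 84.4, 83.7,
    83.6, 83.6, 82.0, 81.9, 81.7, 81.0, 80.4, 78.7, 78.3, 78.1,
    77.4, 76.8, 75.5, 75.4, 74.3, 74.1, 72.7, 71.8, 71.2, 71.1,
]

# Assumption: the average is a plain unweighted mean, rounded to one decimal.
summary = {
    "models_evaluated": len(overall_scores),
    "average_score": round(mean(overall_scores), 1),
    "highest_score": max(overall_scores),
}

print(summary)
```

Under that assumption the computed values agree with the figures reported above (40 models, average 82.4, highest 94.5).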

Last updated: 2025-12-31