Measuring what actually matters.

The top-ranked model on a leaderboard can feel unhelpful in practice because it was graded on problems you'd never ask it to solve. Acing coding challenges is impressive, but it doesn't guarantee that the model can listen, understand your needs, or help with practical, day-to-day tasks.

That's why we measure the core skills of a great assistant: empathy, creativity, comprehension, and instruction following. Our rankings reflect a model's true usability, not just its academic prowess.

LMArena

#1  Gemini 3 Pro (gemini-3-pro)
#2  Grok 4.1 Thinking (grok-4-1-thinking)
#3  Grok 4.1 (grok-4-1)
#4  Gemini 2.5 Pro (gemini-2-5-pro)
#5  Claude Sonnet 4.0-20240229-Thinking-32k (claude-sonnet-4-0-2024...)

Chatio

#1  Gemini 2.5 Pro (gemini-2.5-pro)
#2  Claude Opus 4.1 (claude-3-opus-20240229)
#3  GPT-5 (gpt-5)
#4  o3 (o3)
#5  ChatGPT 4o (gpt-4o)