Measure assistants, not test-takers

Chatio Benchmark is coming soon. Practical tasks, clear rubrics, and a mix of human and automated judging produce scores that actually map to user value.

Created by Alex Wang