2026-05-10

How We Test the AI Judge

Every argument submitted to Invato is scored on four axes: logical soundness, evidence quality, relevance, and persuasiveness.

We benchmark the judge against a panel of human raters across thousands of historical debates, looking for places the model drifts from human judgment.

When the model and the panel disagree, we publish the case in the open archive so anyone can audit our reasoning.
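
To make the comparison concrete, here is a minimal sketch of how a single argument might be checked against the panel. It is illustrative only, not our production pipeline: the 0-10 scale, the axis key names, the `axis_gaps` and `is_disagreement` helpers, and the 2-point threshold are all assumptions made for this example.

```python
from statistics import mean

# The four scoring axes named above; the key spellings are assumed for this sketch.
AXES = ("logical_soundness", "evidence_quality", "relevance", "persuasiveness")


def axis_gaps(judge_scores, panel_scores):
    """Return, per axis, the judge's score minus the mean of the human raters' scores."""
    return {
        axis: judge_scores[axis] - mean(rater[axis] for rater in panel_scores)
        for axis in AXES
    }


def is_disagreement(gaps, threshold=2.0):
    """Flag the case if any axis differs from the panel mean by more than the threshold.

    The threshold value is a hypothetical choice, not a published figure.
    """
    return any(abs(gap) > threshold for gap in gaps.values())


# Made-up scores on an assumed 0-10 scale.
judge = {"logical_soundness": 8, "evidence_quality": 4, "relevance": 7, "persuasiveness": 6}
panel = [
    {"logical_soundness": 7, "evidence_quality": 7, "relevance": 7, "persuasiveness": 6},
    {"logical_soundness": 8, "evidence_quality": 8, "relevance": 6, "persuasiveness": 5},
    {"logical_soundness": 9, "evidence_quality": 6, "relevance": 8, "persuasiveness": 7},
]

gaps = axis_gaps(judge, panel)
flagged = is_disagreement(gaps)
```

In this made-up case the judge scores evidence quality three points below the panel mean while matching it on the other three axes, so the argument would be flagged as a disagreement. Comparing per axis rather than on a single total keeps a large gap on one axis from being hidden by agreement on the others.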