Verification is not correctness. Measure who on your team can catch what legal AI got wrong. Can your lawyers catch the AI's fabricated authority? A court-grounded competency drill.
Counsel Check measures whether a firm's lawyers can catch the ways legal AI fails. Reviewers work through reconstructed filings (a skeleton argument, written submissions, a witness statement, an advice note) drawn from real UK matters where AI-fabricated or misapplied authorities reached a court or tribunal (Ayinde, Harber, Al-Haroun) and the lawyers were referred to their regulators. They highlight, flag, and comment on suspect passages; a live citation grader checks each authority against UK government records (legislation.gov.uk and the National Archives) in real time.
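As a rough illustration of the citation check described above, the sketch below pulls UK neutral citations out of a passage and looks each one up in a set of known citations. It is not Counsel Check's grader: the function names, the regular expression, and the in-memory `known` set are assumptions, and the set merely stands in for a lookup against legislation.gov.uk and the National Archives. The citations in the sample are placeholders, not real cases.

```python
import re

# Assumed pattern for UK neutral citations, e.g. "[2099] EWHC 1 (Admin)"
# or "[2098] EWCA Civ 22". It is illustrative and does not cover every
# court or report series.
NEUTRAL_CITATION = re.compile(
    r"\[(?P<year>\d{4})\]\s+"
    r"(?P<court>[A-Z]{2,6})"
    r"(?:\s+(?P<sub>Civ|Crim))?\s+"
    r"(?P<number>\d+)"
    r"(?:\s+\((?P<division>[A-Za-z]+)\))?"
)


def extract_citations(text: str) -> list[str]:
    """Return neutral citations in order of appearance, whitespace-normalised."""
    return [" ".join(m.group(0).split()) for m in NEUTRAL_CITATION.finditer(text)]


def check_citations(text: str, known: set[str]) -> list[tuple[str, bool]]:
    """Pair each extracted citation with whether it appears in `known`.

    `known` is a hypothetical stand-in for an official-records lookup.
    Only the citation strings are compared, never the surrounding text.
    """
    return [(citation, citation in known) for citation in extract_citations(text)]


# Placeholder passage with made-up citations.
sample = "See Smith v Jones [2099] EWHC 1 (Admin) and Re Example [2098] EWCA Civ 22."
for citation, found in check_citations(sample, known={"[2098] EWCA Civ 22"}):
    print(citation, "found" if found else "NOT FOUND")
```

A citation that resolves can still be misquoted or misapplied, so a check like this could only address the fabricated-authority failure mode, not the others.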
What makes it different is the ground truth. There is no LLM in the grading: every failure and answer is anchored in the published judgments. The engine scores on a failure-mode taxonomy (we call the underlying phenomenon context divergence, not "hallucination"): fabricated authority, plausible-citation camouflage, quotation fabrication, misapplied ratio, jurisdiction bleed, and more. The result is a personal scorecard (calibrated vs. over-trusting vs. over-flagging) and a firm-level gap view that turns "should we widen AI access?" into an evidence-based decision.
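One way a deterministic scorecard of this kind could be computed is sketched below: compare the passages a reviewer flagged with the passages known to contain a failure, then label the result. This is a minimal sketch, not the engine itself. The passage ids, the recall and precision measures, and the 0.8 thresholds are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Scorecard:
    recall: float       # share of known failures the reviewer flagged
    precision: float    # share of the reviewer's flags that were real failures
    profile: str        # "calibrated", "over-trusting", or "over-flagging"
    missed_by_mode: dict[str, int]  # missed failures, counted per failure mode


def score_review(
    planted: dict[str, str],   # hypothetical passage id -> failure mode
    flagged: set[str],         # passage ids the reviewer flagged
    min_recall: float = 0.8,   # assumed threshold
    min_precision: float = 0.8,  # assumed threshold
) -> Scorecard:
    planted_ids = set(planted)
    caught = planted_ids & flagged
    missed = planted_ids - flagged

    recall = len(caught) / len(planted_ids) if planted_ids else 1.0
    precision = len(caught) / len(flagged) if flagged else 1.0

    # Missing real failures is checked first, so a reviewer who both
    # misses failures and flags sound passages is labelled over-trusting.
    if recall < min_recall:
        profile = "over-trusting"
    elif precision < min_precision:
        profile = "over-flagging"
    else:
        profile = "calibrated"

    missed_by_mode = dict(Counter(planted[pid] for pid in missed))
    return Scorecard(recall, precision, profile, missed_by_mode)


# Hypothetical answer key for one document.
answer_key = {
    "p3": "fabricated authority",
    "p7": "misapplied ratio",
    "p9": "jurisdiction bleed",
}
print(score_review(answer_key, flagged={"p3"}))
```

Because the comparison is plain set arithmetic against a fixed answer key, the same inputs always produce the same scorecard. The per-mode tally of misses is the kind of figure a firm-level gap view could aggregate across reviewers.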
This is an assessment, not a certification: it tells a firm where to train, not whether a lawyer is "qualified."
Demo build: no login, nothing stored; annotations live only in the browser tab and the grader never receives document text.