Published, methodology-first benchmark evaluations and engagement case studies of AI systems — the same evidence-based process used in client engagements, applied to questions teams actually face before and after deploying AI. Each study includes full methodology, statistical analysis, and limitations. A new study is published every few months.
An assessment-first engagement took a regulated, on-premise AI assistant from 82.2% to 94.8% answer accuracy in deployment on the same single GPU, and added an auto-accept capability with zero critical errors that handled roughly 1,100 answers in its first production month.
A six-stage evaluation across 1,497 real documents found AI extraction beat traditional OCR by 9 percentage points — and a confidence-based auto-accept layer cut document processing time by more than half.
Four rounds of systematic testing — 16,143 evaluations across 40+ configurations — produced a single evidence-based RAG setup, overturning several "best practice" assumptions along the way.