15 Apr 2026
From One Judge to a Learning System
Growing a single judge into a full eval loop: the layers involved, realistic timelines (useful signal in about 8 to 10 weeks, or 4 to 5 two-week sprints; a mature in-house stack often takes 6+ months), build vs. integrate, and how we measure quality (from a ~71% baseline judge up to 83.6% on our benchmark).
Michael Karotsieris