LLM-as-a-judge. Uses a language model to evaluate report quality, capturing clinical meaning rather than just word overlap.
Open-source and lightweight. Runs locally, no proprietary APIs required.
Single composite score. One number per report, replacing the patchwork of 7+ traditional NLP metrics.
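
To make the three points above concrete, here is a minimal sketch of how an LLM judge could turn a reference report and a candidate report into one composite score. Everything in it is an illustrative assumption rather than this project's actual implementation: the rubric criteria and weights, the prompt wording, and the helper names `build_prompt`, `parse_score`, and `composite_score` are all hypothetical. The judge is passed in as a plain callable so that any locally hosted model can be plugged in; a stub stands in for it here.

```python
import re
from typing import Callable, Dict

# Hypothetical rubric: criterion names and weights are assumptions for
# illustration only, not the criteria used by the actual tool.
RUBRIC: Dict[str, float] = {
    "findings agree with the reference": 0.5,
    "no unsupported findings are added": 0.3,
    "no clinically important findings are omitted": 0.2,
}


def build_prompt(reference: str, candidate: str, criterion: str) -> str:
    """Build a grading prompt for one criterion (wording is an assumption)."""
    return (
        "You are grading a clinical report against a reference report.\n"
        f"Criterion: {criterion}\n"
        f"Reference report:\n{reference}\n"
        f"Candidate report:\n{candidate}\n"
        "Reply with a single number between 0 and 1."
    )


def parse_score(reply: str) -> float:
    """Take the first number in the judge's reply and clamp it to [0, 1]."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no numeric score found in judge reply: {reply!r}")
    return min(1.0, max(0.0, float(match.group())))


def composite_score(
    reference: str, candidate: str, judge: Callable[[str], str]
) -> float:
    """Weighted mean of per-criterion judge scores: one number per report."""
    total = 0.0
    for criterion, weight in RUBRIC.items():
        reply = judge(build_prompt(reference, candidate, criterion))
        total += weight * parse_score(reply)
    return total / sum(RUBRIC.values())


def stub_judge(prompt: str) -> str:
    """Stand-in for a locally hosted model; always returns the same score."""
    return "Score: 0.8"


if __name__ == "__main__":
    score = composite_score(
        reference="No acute abnormality.",
        candidate="No acute findings.",
        judge=stub_judge,
    )
    print(f"composite score: {score:.2f}")
```

Replacing `stub_judge` with a function that sends the prompt to a local model and returns its text reply is the only change needed to try this sketch against a real judge.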