PathAI, Boston, MA, USA.
Absci, Vancouver, WA, USA.
Nat Med. 2024 Oct;30(10):2914-2923. doi: 10.1038/s41591-024-03172-7. Epub 2024 Aug 7.
Clinical trials in metabolic dysfunction-associated steatohepatitis (MASH, formerly known as nonalcoholic steatohepatitis) require histologic scoring for assessment of inclusion criteria and endpoints. However, variability in interpretation has impacted clinical trial outcomes. We developed an artificial intelligence-based measurement (AIM) tool for scoring MASH histology (AIM-MASH). AIM-MASH predictions for MASH Clinical Research Network necroinflammation grades and fibrosis stages were reproducible (κ = 1) and aligned with expert pathologist consensus scores (κ = 0.62-0.74). AIM-MASH versus consensus agreement was comparable to that of the average pathologist for MASH Clinical Research Network scores (82% versus 81%) and fibrosis (97% versus 96%). Continuous scores produced by AIM-MASH for key histological features of MASH correlated with mean pathologist scores and noninvasive biomarkers and strongly predicted progression-free survival in patients with stage 3 (P < 0.0001) and stage 4 (P = 0.03) fibrosis. In a retrospective analysis of the ATLAS trial (NCT03449446), responders receiving study treatment showed a greater continuous change in fibrosis compared with placebo (P = 0.02). Overall, these results suggest that AIM-MASH may assist pathologists in histologic review of MASH clinical trials, reducing inter-rater variability on trial outcomes and offering a more sensitive and reproducible measure of patient responses.
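The abstract reports agreement both as Cohen's κ and as percent agreement. The sketch below shows how these two statistics relate for categorical scores such as fibrosis stages. It assumes unweighted Cohen's κ; the abstract does not say which κ variant was used, and ordinal scores are often analyzed with a weighted κ instead. The function names and the example scores are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of cases on which two raters assign the same score."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: agreement corrected for chance.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected from each rater's marginal frequencies.
    """
    n = len(a)
    p_o = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in categories) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category for every case.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical fibrosis stages (0-4) for eight biopsies.
consensus = [0, 1, 2, 3, 4, 2, 1, 3]
model = [0, 1, 2, 2, 4, 2, 1, 4]

# A deterministic model scored twice gives identical outputs, hence kappa = 1.
print(cohens_kappa(model, list(model)))            # 1.0
print(percent_agreement(consensus, model))         # 0.75
print(round(cohens_kappa(consensus, model), 3))    # 0.686
```

The example shows why both numbers are reported: 75% raw agreement shrinks to κ ≈ 0.69 once the agreement expected by chance from the two raters' stage distributions is removed.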