Abdurrachim Desiree, Lek Serene, Ong Charlene Zhi Lin, Wong Chun Kit, Zhou Yongqi, Wee Aileen, Soon Gwyneth, Kendall Timothy J, Idowu Michael O, Hendra Christopher, Saigal Ashmita, Krishnan Radha, Chng Elaine, Tai Dean, Ho Gideon, Forest Thomas, Raji Annaswamy, Talukdar Saswata, Chin Chih-Liang, Baumgartner Richard, Engel Samuel S, Ali Asad Abu Bakar, Kleiner David E, Sanyal Arun J
Quantitative Biosciences, MSD, Singapore.
HistoIndex Pte. Ltd., Singapore.
J Hepatol. 2025 May;82(5):898-908. doi: 10.1016/j.jhep.2024.11.032. Epub 2024 Nov 28.
BACKGROUND & AIMS: Intra- and inter-pathologist variability poses a significant challenge in metabolic dysfunction-associated steatohepatitis (MASH) biopsy evaluation, leading to suboptimal selection of patients and confounded assessment of histological response in clinical trials. We evaluated the utility of an artificial intelligence (AI) digital pathology (DP) platform to help pathologists improve the reliability of fibrosis staging.
METHODS: A total of 120 digitized histology slides from two trials (NCT03517540, NCT03912532) were analyzed by four expert hepatopathologists, with and without AI assistance, in a randomized crossover design. We utilized an AI DP platform consisting of unstained second harmonic generation/two-photon excitation fluorescence (SHG/TPEF) images and AI quantitative fibrosis (qFibrosis) values.
RESULTS: AI assistance significantly improved inter-pathologist kappa for fibrosis staging, particularly for early fibrosis (F0-F2), with reduced variance around the median reads. Intra-pathologist kappa was unchanged. AI assistance increased pathologist concordance for identifying clinical trial inclusion cases (F2-F3) from 45% to 71%, exclusion cases (F0/F1/F4) from 38% to 55%, and evaluation of fibrosis response to treatment from 49% to 61%. SHG/TPEF images, qFibrosis continuous values, and qFibrosis stage were considered useful by at least three out of four pathologists in 83%, 55%, and 38% of cases, respectively. In the context of a clinical trial, the increase in inter-pathologist concordance was modeled to result in a ∼25% reduction in the potential need for adjudication as well as a ∼45% increase in the study power for a kappa improvement from ∼0.4 to ∼0.7.
CONCLUSIONS: The use of AI DP enhances inter-rater reliability of fibrosis staging for MASH. This indicates that the SHG/TPEF-based AI DP tool is useful for assisting pathologists in assessing fibrosis, thereby enhancing clinical trial efficiency and reliability of fibrosis readouts in response to treatments.
IMPACT AND IMPLICATIONS: Implementing an AI DP platform as a tool for pathologists significantly improved inter-pathologist agreement on fibrosis staging, particularly for early-stage fibrosis (F0-F2), which is critical for clinical trial eligibility. The second harmonic generation imaging technology used in conjunction with AI quantitative scores provided enhanced visualization of fibrosis with an indication of severity along the disease continuum. This led to increased pathologist confidence in fibrosis staging and, therefore, increased pathologist concordance for the classification of clinical trial inclusion/exclusion and evaluation of treatment, compared to a standard scoring method based on traditional stains without AI assistance. Improved pathologist concordance with AI assistance could streamline clinical trial processes, reducing the need for adjudication and enhancing study power, potentially decreasing required sample sizes. Continued exploration of the utility of AI assistance across a broader range of pathologists and in prospective clinical trials will be essential for validating the effectiveness of AI assistance.
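For readers unfamiliar with the inter-pathologist kappa reported above, the following is a minimal sketch of how a chance-corrected agreement statistic can be computed when several raters stage the same slides. The abstract does not state which kappa variant the authors used; Fleiss' kappa is assumed here purely for illustration, and the function name `fleiss_kappa` and the toy readings are hypothetical, not study data.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a fixed number of raters per subject.

    ratings: list of per-slide lists, one fibrosis stage per rater.
    categories: all stages that may be assigned (e.g. 0..4 for F0-F4).
    Note: undefined (division by zero) if every rating is the same stage.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])

    # counts[i][c] = number of raters who assigned stage c to slide i
    counts = [Counter(slide) for slide in ratings]

    # Observed agreement per slide, then averaged over slides
    p_i = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Expected agreement by chance, from overall stage proportions
    p_j = [
        sum(c[k] for c in counts) / (n_subjects * n_raters)
        for k in categories
    ]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy readings: three slides, four pathologists each
toy_reads = [
    [0, 0, 0, 0],  # all four agree on F0
    [1, 1, 1, 1],  # all four agree on F1
    [2, 2, 2, 3],  # three read F2, one reads F3
]
kappa = fleiss_kappa(toy_reads, categories=range(5))
print(f"Fleiss' kappa on toy data: {kappa:.3f}")
```

Fleiss' kappa treats stages as unordered categories, so a one-stage disagreement counts the same as a four-stage one. Because fibrosis stages are ordinal, a weighted kappa may be a closer match to what a study like this would report, which is another reason to treat this sketch as illustrative only.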