Vidal Joan Martínez, Tsiknakis Nikos, Staaf Johan, Bosch Ana, Ehinger Anna, Nimeus Emma, Salgado Roberto, Bai Yalai, Rimm David L, Hartman Johan, Acs Balazs
Department of Oncology and Pathology, Karolinska Institutet, Stockholm, Sweden.
Division of Oncology, Department of Clinical Sciences Lund, Lund University, Medicon Village, SE-22381, Lund, Sweden.
EClinicalMedicine. 2024 Nov 15;78:102928. doi: 10.1016/j.eclinm.2024.102928. eCollection 2024 Dec.
Pathologist-read tumor-infiltrating lymphocytes (TILs) have shown predictive and prognostic potential in early and metastatic triple-negative breast cancer (TNBC), but their assessment is still subject to variability. Artificial intelligence (AI) is a promising approach toward eliminating this variability and objectively automating TILs assessment. However, demonstrating robust analytical and prognostic validity is the key challenge currently preventing the integration of AI models into clinical workflows.
We evaluated the impact of ten AI models on TILs scoring, emphasizing differences in their analytical and prognostic validity. The AI-based TILs scoring models (seven developed and three previously validated) were tested in a retrospective analytical cohort and in an independent prospective cohort, where prognostic validity was compared against the invasive disease-free survival (IDFS) endpoint with a median follow-up of 4 years. The development and analytical validity set consisted of diagnostic tissue slides from 79 women with surgically resected primary invasive TNBC tumors diagnosed between 2012 and 2016 at the Yale School of Medicine. An independent set comprising 215 TNBC patients from Sweden, diagnosed between 2010 and 2015, was used to test prognostic validity.
A significant difference in analytical validity was observed across AI methodologies and training strategies (Spearman's r = 0.63-0.73, p < 0.001). Interestingly, digital TILs were prognostic for eight out of ten AI models, including less extensively trained ones, with similar and overlapping hazard ratios (HR) in the external validation cohort (Cox regression analysis based on the IDFS endpoint, HR = 0.40-0.47; p < 0.004).
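The analytical validity figures above are Spearman rank correlations. As a rough illustration of how such a coefficient compares an AI model's TILs scores with a reference read, the following is a minimal sketch in plain Python. The function names and the score values are hypothetical and are not taken from the study; the abstract does not describe the software actually used.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman_r(x, y):
    """Spearman's r: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical TILs percentages for seven slides (illustration only).
pathologist_tils = [5, 10, 20, 30, 40, 60, 80]
ai_tils = [8, 6, 25, 22, 45, 70, 65]

if __name__ == "__main__":
    print(f"Spearman's r = {spearman_r(pathologist_tils, ai_tils):.2f}")
```

Because the coefficient depends only on rank order, a model can agree well with the reference on ranking while still differing in absolute TILs percentages, which is one reason correlation alone does not establish interchangeability between models.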
The demonstrated prognostic validity for most of the AI TIL models can be attributed to the intrinsic robustness of host anti-tumor immunity (measured by TILs) as a biomarker. However, the discrepancies between AI models should not be overlooked; rather, we believe that there is a critical need for an accessible, large, multi-centric dataset that will serve as a benchmark ensuring the comparability and reliability of different AI tools in clinical implementation.
Nikos Tsiknakis is supported by the Swedish Research Council (Grant Number 2021-03061, Theodoros Foukakis). Balazs Acs is supported by The Swedish Society for Medical Research (Svenska Sällskapet för Medicinsk Forskning) postdoctoral grant. Roberto Salgado is supported by a grant from Breast Cancer Research Foundation (BCRF).