Abou-Foul Ahmad K, Dretzke Janine, Albon Esther, Kristunas Caroline, Moore David J, Karwath Andreas, Gkoutos Georgios, Mehanna Hisham, Nankivell Paul
Institute for Head and Neck Studies and Education, University of Birmingham, Birmingham, United Kingdom.
Department of Cancer and Genomic Sciences & Centre for Health Data Science, University of Birmingham, Birmingham, United Kingdom.
Front Oncol. 2024 Dec 6;14:1478385. doi: 10.3389/fonc.2024.1478385. eCollection 2024.
The limitations of the traditional TNM system have spurred interest in multivariable models for personalized prognostication in laryngeal and hypopharyngeal cancers (LSCC/HPSCC). However, the performance of these models depends on the quality of data and modelling methodology, affecting their potential for clinical adoption. This systematic review and meta-analysis (SR-MA) evaluated clinical predictive models (CPMs) for recurrence and survival in treated LSCC/HPSCC. We assessed models' characteristics and methodologies, as well as performance, risk of bias (RoB), and applicability.
Literature searches were conducted in MEDLINE (OVID), Embase (OVID) and IEEE databases from January 2005 to November 2023. The search algorithm used comprehensive text word and index term combinations without language or publication type restrictions. Independent reviewers screened titles and abstracts using a predefined Population, Index, Comparator, Outcomes, Timing and Setting (PICOTS) framework. We included externally validated (EV) multivariable models, with at least one clinical predictor, that provided recurrence or survival predictions. The SR-MA followed the PRISMA reporting guidelines and used the PROBAST framework for RoB assessment. Model discrimination was assessed using the C-index/AUC and was presented for all models using forest plots. MA was performed only for models that were externally validated in two or more cohorts, using a random-effects model. The main outcomes were model discrimination and calibration measures for overall survival (OS) and/or local recurrence (LR) prediction. All measures and assessments were planned before data collection.
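As an illustration of the pooling step, the following is a minimal sketch of a random-effects meta-analysis of discrimination estimates from several external validation cohorts. The abstract does not state which estimator was used; the DerSimonian-Laird estimator of between-cohort variance is assumed here, pooling is done on the raw AUC scale for simplicity, and the function name `pool_random_effects` and the cohort values are hypothetical.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool per-cohort estimates (e.g. AUCs) with a random-effects model.

    Assumption: DerSimonian-Laird estimator for the between-cohort
    variance tau^2. Returns (pooled, ci_low, ci_high, tau2).
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need two or more cohorts with matching standard errors")

    variances = [se ** 2 for se in std_errors]

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))

    # DerSimonian-Laird between-cohort variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-cohort variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum_w_re
    se_pooled = math.sqrt(1.0 / sum_w_re)

    # Normal-approximation 95% confidence interval.
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled, tau2


# Hypothetical external-validation AUCs and standard errors (not study data).
pooled, ci_low, ci_high, tau2 = pool_random_effects([0.70, 0.74, 0.76], [0.02, 0.03, 0.025])
print(f"pooled AUC {pooled:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f}), tau^2 = {tau2:.4f}")
```

In practice, AUCs and C-indices are often pooled on the logit scale and back-transformed, which keeps the confidence limits within 0 to 1; the raw scale is used here only to keep the sketch short.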
The SR-MA identified 11 models, reported in 16 studies. Seven models for OS showed good discrimination in development, of which only one was excellent (C-index >0.9), and three had weak or poor discrimination. Models that included a radiomics score as a predictor performed relatively better. Most models generalised poorly, with worse discrimination on EV than in development, but they still outperformed the TNM system. Only two models met the criteria for MA, with pooled EV AUCs of 0.73 (95% CI 0.71-0.76) and 0.67 (95% CI 0.6-0.74). RoB was high for all models, particularly in the analysis domain.
This review highlighted the shortcomings of currently available models, while emphasizing the need for rigorous independent evaluations. Despite the proliferation of models, most exhibited methodological limitations and bias. Currently, no models can confidently be recommended for routine clinical use.
Systematic review registration: https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021248762, identifier CRD42021248762.