Centre for Statistics in Medicine, University of Oxford, Linton Rd, Oxford, UK.
BMC Med. 2010 Mar 30;8:20. doi: 10.1186/1741-7015-8-20.
Development of prognostic models enables identification of variables that are influential in predicting patient outcome and the use of these multiple risk factors in a systematic, reproducible way according to evidence-based methods. The reliability of models depends on informed use of statistical methods, in combination with prior knowledge of disease. We reviewed published articles to assess the reporting and the methods used to develop new prognostic models in cancer.
We developed a systematic search string and identified articles from PubMed. Forty-seven articles satisfied the following inclusion criteria and were included: published in 2005; aiming to predict patient outcome; presenting new prognostic models in cancer with a time-to-event outcome and a combination of at least two separate variables; and analysing data using multivariable analysis suitable for time-to-event data.
In 47 studies, prospective cohort or randomised controlled trial data were used for model development in only 33% (15) of studies. In 30% (14) of the studies insufficient data were available, with fewer than 10 events per variable (EPV) used in model development. EPV could not be calculated in a further 40% (19) of the studies. The coding of candidate variables was reported in only 68% (32) of the studies. Although use of continuous variables was reported in all studies, only one article reported using the recommended approach of retaining all these variables as continuous without categorisation. Statistical methods for selection of variables in the multivariable modelling were often flawed. A method that is not recommended, namely using statistical significance in univariate analysis as a pre-screening test to select variables for inclusion in the multivariable model, was applied in 48% (21) of the studies.
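The EPV criterion mentioned above can be illustrated with a minimal Python sketch. It assumes EPV is computed as the number of outcome events divided by the number of candidate variables considered during model development, and it uses the threshold of 10 stated in the text. The function names and the example counts are hypothetical and are not taken from any reviewed study.

```python
def events_per_variable(n_events: int, n_candidate_variables: int) -> float:
    """Return events per variable (EPV): outcome events / candidate variables.

    Assumption: the denominator counts every candidate variable considered
    during model development, not only those kept in the final model.
    """
    if n_candidate_variables <= 0:
        raise ValueError("n_candidate_variables must be positive")
    if n_events < 0:
        raise ValueError("n_events must be non-negative")
    return n_events / n_candidate_variables


def has_sufficient_events(n_events: int, n_candidate_variables: int,
                          threshold: float = 10.0) -> bool:
    """Return True when EPV meets the threshold (10 in the text above)."""
    return events_per_variable(n_events, n_candidate_variables) >= threshold


if __name__ == "__main__":
    # Hypothetical study: 120 deaths observed, 15 candidate variables examined.
    epv = events_per_variable(120, 15)
    print(f"EPV = {epv:.1f}, sufficient: {has_sufficient_events(120, 15)}")
```

Under these assumptions, a study reporting neither the number of events nor the number of candidate variables gives no way to evaluate either function, which corresponds to the studies in which EPV could not be calculated.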
We found that published prognostic models are often characterised by both use of inappropriate methods for development of multivariable models and poor reporting. In addition, models are limited by the lack of studies based on prospective data of sufficient sample size to avoid overfitting. The use of poor methods compromises the reliability of prognostic models developed to provide objective probability estimates to complement clinical intuition of the physician and guidelines.