Yahya Noorazrul, Ebert Martin A, Bulsara Max, Kennedy Angel, Joseph David J, Denham James W
School of Physics, University of Western Australia, Australia; School of Health Sciences, National University of Malaysia, Malaysia.
School of Physics, University of Western Australia, Australia; Department of Radiation Oncology, Sir Charles Gairdner Hospital, Australia.
Radiother Oncol. 2016 Aug;120(2):339-45. doi: 10.1016/j.radonc.2016.05.010. Epub 2016 Jun 28.
BACKGROUND AND PURPOSE: Most predictive models are not sufficiently validated for prospective use. We performed independent external validation of published predictive models for urinary dysfunction following radiotherapy of the prostate.
MATERIALS/METHODS: Multivariable models developed to predict atomised and generalised urinary symptoms, both acute and late, were considered for validation using a dataset representing 754 participants from the TROG 03.04-RADAR trial. Endpoints and features were harmonised to match the predictive models. The overall performance, calibration and discrimination were assessed.
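As an illustration of two of the measures named above, the following is a minimal sketch of how discrimination (area under the ROC curve) and overall performance (here the Brier score, which is an assumption, since the abstract does not name the overall performance measure) might be computed for one published model applied to an external cohort. The function names and the toy data are hypothetical and are not taken from the study.

```python
def auc(y_true, y_prob):
    """Area under the ROC curve, computed as the fraction of
    (event, non-event) pairs in which the event received the higher
    predicted probability; ties count as half a pair."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome
    (0 = perfect; lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy external cohort (hypothetical): observed outcome and the probability
# predicted by a previously published model.
observed = [0, 0, 1, 1, 0, 1]
predicted = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]

example_auc = auc(observed, predicted)
example_brier = brier_score(observed, predicted)
```

A model that assigns the same probability to every patient yields an AUC of exactly 0.5 under this definition, because every pair is tied.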
RESULTS: Fourteen models from four publications were validated. The discrimination of the predictive models in an independent external validation cohort, measured using the area under the receiver operating characteristic (ROC) curve, ranged from 0.473 to 0.695, generally lower than in internal validation. Four models had an area under the ROC curve >0.6. Shrinkage was required for the coefficients of all predictive models, with shrinkage factors ranging from -0.309 (predicted probability was inversely related to observed proportion) to 0.823. Predictive models that included baseline symptoms as a feature produced the highest discrimination. Two models produced predicted probabilities of 0 and 1 for all patients.
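One common way to quantify the shrinkage a model needs is the calibration slope: the observed outcome is regressed on the logit of the predicted probability, and the fitted slope is read as a multiplier for the model's coefficients. The sketch below assumes that the shrinkage values reported here are of this kind, which the abstract does not state explicitly. The function names and toy data are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability; undefined at exactly 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        raise ValueError("predicted probability must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def calibration_fit(y_true, y_prob, n_iter=50, tol=1e-10):
    """Fit logit(P(y = 1)) = a + b * logit(predicted) by Newton-Raphson.

    Returns (a, b). A slope b near 1 suggests no shrinkage is needed,
    0 < b < 1 suggests predictions that are too extreme, and b < 0 means
    predictions run opposite to the observed proportions."""
    x = [logit(p) for p in y_prob]
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient w.r.t. intercept
            g1 += (yi - mu) * xi     # gradient w.r.t. slope
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Toy cohort (hypothetical): 20 patients in two risk groups whose observed
# event rates are 20% and 80%.
outcomes = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2

# Predictions that are too extreme (10% and 90%) give a slope below 1.
overconfident = [0.1] * 10 + [0.9] * 10
_, slope_overconfident = calibration_fit(outcomes, overconfident)

# Predictions pointing the wrong way (80% and 20%) give a negative slope.
inverted = [0.8] * 10 + [0.2] * 10
_, slope_inverted = calibration_fit(outcomes, inverted)
```

Note that a predicted probability of exactly 0 or 1 has no finite logit, so a slope of this form cannot be estimated for a model that returns only those values.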
CONCLUSIONS: Predictive models vary in performance and transferability, illustrating the need for improvements in model development and reporting. Several models showed reasonable potential, but efforts should be increased to improve performance. Baseline symptoms should always be considered as potential features for predictive models.