van der Ploeg Tjeerd, Austin Peter C, Steyerberg Ewout W
Department of Science, Medical Center Alkmaar/Inholland University, Alkmaar, The Netherlands.
BMC Med Res Methodol. 2014 Dec 22;14:137. doi: 10.1186/1471-2288-14-137.
Modern modelling techniques may provide more accurate predictions of binary outcomes than classical techniques. We aimed to study the predictive performance of different modelling techniques in relation to the effective sample size ("data hungriness").
We performed simulation studies based on three clinical cohorts: 1282 patients with head and neck cancer (46.9% 5-year survival), 1731 patients with traumatic brain injury (22.3% 6-month mortality) and 3181 patients with minor head injury (7.6% with CT scan abnormalities). We compared three relatively modern modelling techniques, support vector machines (SVM), neural nets (NN) and random forests (RF), with two classical techniques, logistic regression (LR) and classification and regression trees (CART). We created three large artificial databases with 20-fold, 10-fold and 6-fold replication of subjects, in which we generated dichotomous outcomes according to different underlying models. We applied each modelling technique to increasingly large development parts (100 repetitions). The area under the ROC curve (AUC) indicated the performance of each model in the development part and in an independent validation part. Data hungriness was defined by plateauing of the AUC and small optimism (difference between the mean apparent AUC and the mean validated AUC <0.01).
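The optimism criterion defined above can be illustrated with a minimal sketch. It assumes the AUC is computed in its pairwise (Mann-Whitney) form and that the apparent and validated AUCs are averaged over repetitions before taking the difference. The function names `auc`, `optimism` and `small_optimism` are hypothetical and are not taken from the study's own code.

```python
from statistics import mean


def auc(scores, labels):
    """Area under the ROC curve as the fraction of concordant
    (event, non-event) pairs; ties count as half a pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return concordant / (len(pos) * len(neg))


def optimism(apparent_aucs, validated_aucs):
    """Mean apparent AUC (development part) minus mean validated AUC
    (independent validation part), averaged over repetitions."""
    return mean(apparent_aucs) - mean(validated_aucs)


def small_optimism(apparent_aucs, validated_aucs, threshold=0.01):
    """True when optimism is below the threshold (0.01 in the abstract)."""
    return optimism(apparent_aucs, validated_aucs) < threshold
```

In a full simulation, `apparent_aucs` and `validated_aucs` would each hold one value per repetition (100 in the study) for a given technique and development sample size.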
LR reached a stable AUC at approximately 20 to 50 events per variable, followed by CART, SVM, NN and RF models. Optimism decreased with increasing sample size, with the same ranking of techniques. The RF, SVM and NN models showed instability and high optimism even with >200 events per variable.
Modern modelling techniques such as SVM, NN and RF may need over 10 times as many events per variable as classical modelling techniques such as LR to achieve a stable AUC and a small optimism. This implies that such modern techniques should only be used in medical prediction problems if very large data sets are available.
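To make the events-per-variable (EPV) figures concrete, the sketch below converts between sample size and EPV. It assumes EPV is the number of events divided by the number of candidate predictors. The abstract does not report the number of predictors, so the value of 10 used in the example is purely hypothetical, as are the function names.

```python
import math


def events_per_variable(n_subjects, event_rate, n_predictors):
    """Expected number of events divided by the number of predictors."""
    return n_subjects * event_rate / n_predictors


def subjects_needed(target_epv, event_rate, n_predictors):
    """Smallest sample size whose expected event count reaches the target EPV."""
    return math.ceil(target_epv * n_predictors / event_rate)


if __name__ == "__main__":
    rate = 0.223      # 6-month mortality in the traumatic brain injury cohort
    predictors = 10   # hypothetical; not stated in the abstract
    print(events_per_variable(1731, rate, predictors))
    print(subjects_needed(20, rate, predictors))   # 897
    print(subjects_needed(200, rate, predictors))  # 8969
```

Under these assumptions, moving from 20 to 200 events per variable raises the required sample from roughly 900 to roughly 9000 patients at a 22.3% event rate.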