Van Holsbeke Caroline, Van Calster Ben, Valentin Lil, Testa Antonia C, Ferrazzi Enrico, Dimou Ioannis, Lu Chuan, Moerman Philippe, Van Huffel Sabine, Vergote Ignace, Timmerman Dirk
Department of Obstetrics and Gynecology, University Hospitals KU Leuven, Belgium.
Clin Cancer Res. 2007 Aug 1;13(15 Pt 1):4440-7. doi: 10.1158/1078-0432.CCR-06-2958.
Several scoring systems have been developed to distinguish between benign and malignant adnexal tumors. However, few of them have been externally validated in new populations. Our aim was to compare their performance on a prospectively collected large multicenter data set.
In phase I of the International Ovarian Tumor Analysis multicenter study, patients with a persistent adnexal mass were examined with transvaginal ultrasound and color Doppler imaging. More than 50 end point variables were prospectively recorded for analysis. The outcome measure was the histologic classification of excised tissue as malignant or benign. We used the International Ovarian Tumor Analysis data to test the accuracy of previously published scoring systems. Receiver operating characteristic curves were constructed to compare the performance of the models.
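The model comparison described above rests on the area under the receiver operating characteristic curve (AUC). As a minimal sketch of how that quantity can be computed, the Python below uses the rank-based (Mann-Whitney) formulation: the AUC is the proportion of malignant-benign pairs in which the malignant case receives the higher score, with ties counted as one half. The function name `auc`, the label coding (1 = malignant, 0 = benign), and the toy scores are assumptions made for illustration; they are not the study's data or its software.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney).

    labels: 1 = malignant, 0 = benign (illustrative coding, an assumption).
    """
    malignant = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one case of each class")

    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0   # malignant case ranked above benign case
            elif m == b:
                wins += 0.5   # ties count as half
    return wins / (len(malignant) * len(benign))


# Hypothetical toy data: two scoring systems applied to the same eight tumors.
labels = [0, 0, 0, 1, 1, 0, 1, 1]
model_a = [0.1, 0.2, 0.3, 0.4, 0.8, 0.35, 0.9, 0.7]
model_b = [0.5, 0.2, 0.6, 0.4, 0.3, 0.1, 0.9, 0.7]

print(auc(model_a, labels))  # 1.0
print(auc(model_b, labels))  # 0.75
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the values reported here should be read.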
Data from 1,066 patients were included; 800 patients (75%) had benign tumors and 266 patients (25%) had malignant tumors. The morphologic scoring system used by Lerner gave an area under the receiver operating characteristic curve (AUC) of 0.68, whereas the multimodal risk of malignancy index used by Jacobs gave an AUC of 0.88. The AUCs of the logistic regression models ranged from 0.76 to 0.91, and those of the artificial neural network models ranged from 0.87 to 0.90. Advanced kernel-based classifiers gave an AUC of up to 0.92.
The performance of the risk of malignancy index was similar to that of most logistic regression and artificial neural network models. The best result was obtained with a relevance vector machine with radial basis function kernel. Because the models were tested on a large multicenter data set, results are likely to be generally applicable.