Ameye L, Valentin L, Testa A C, Van Holsbeke C, Domali E, Van Huffel S, Vergote I, Bourne T, Timmerman D
Department of Electrical Engineering (ESAT-SCD), University Hospital, Katholieke Universiteit Leuven, Leuven, Belgium.
Ultrasound Obstet Gynecol. 2009 Jan;33(1):92-101. doi: 10.1002/uog.6273.
To investigate whether the prediction of malignant adnexal masses can be improved by considering different ultrasound-based subgroups of tumors and constructing a scoring system for each subgroup, instead of using a single risk estimation model applicable to all tumors.
We used a multicenter database of 1573 patients with at least one persistent adnexal mass. The masses were categorized into four subgroups based on their ultrasound appearance: (1) unilocular cyst; (2) multilocular cyst; (3) presence of a solid component but no papillation; and (4) presence of papillation. For each of the four subgroups, a scoring system to predict malignancy was developed in a development set of 754 patients (respective numbers of patients: (1) 228; (2) 143; (3) 183; and (4) 200). The subgroup scoring system was then tested in 312 patients and prospectively validated in 507 patients. The sensitivity and specificity of the scoring system for predicting malignancy were compared with those of subjective evaluation of ultrasound images by an experienced examiner (pattern recognition) and with those of a published logistic regression (LR) model for calculating the risk of malignancy in adnexal masses. The gold standard was the pathological classification of the mass as benign or malignant (borderline, primary invasive, or metastatic).
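The subgroup assignment and the data split described above can be sketched as follows. This is only an illustration: the function name, its parameters, and the precedence of the checks (papillation first, then solid component, then locularity) are assumptions made for the sketch, since the abstract does not state how the features were recorded or in what order they were evaluated. The patient counts are the ones reported in the abstract.

```python
# Patient counts reported in the abstract.
DEVELOPMENT_SET = {1: 228, 2: 143, 3: 183, 4: 200}  # subgroup -> patients
TEST_SET_SIZE = 312
VALIDATION_SET_SIZE = 507
TOTAL_PATIENTS = 1573


def assign_subgroup(n_locules, has_solid_component, has_papillation):
    """Return the ultrasound-based subgroup (1-4) for one mass.

    Hypothetical helper: the argument names and the order of the checks
    are assumptions for illustration, not the authors' procedure.
    """
    if has_papillation:
        return 4  # presence of papillation
    if has_solid_component:
        return 3  # solid component but no papillation
    if n_locules <= 1:
        return 1  # unilocular cyst
    return 2      # multilocular cyst


def development_set_size():
    """Total number of patients across the four development subgroups."""
    return sum(DEVELOPMENT_SET.values())
```

Under these assumptions, a multilocular mass with a papillation is assigned to subgroup 4, and the three data sets (754 + 312 + 507) add up to the 1573 patients in the database.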
In the prospective validation set, the sensitivity of pattern recognition, the LR model and the subgroup scoring system was 90% (129/143), 95% (136/143) and 88% (126/143), respectively, and the specificity was 93% (338/364), 74% (270/364) and 90% (329/364), respectively.
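The reported percentages follow directly from the counts in parentheses: sensitivity is the fraction of the 143 malignant masses correctly classified as malignant, and specificity is the fraction of the 364 benign masses correctly classified as benign. A minimal sketch that recomputes them is shown below; the dictionary layout and function names are illustrative choices, while the counts are taken from the results above.

```python
# Prospective validation set: 143 malignant and 364 benign masses.
N_MALIGNANT = 143
N_BENIGN = 364

# method -> (malignant masses correctly identified, benign masses correctly identified)
VALIDATION_COUNTS = {
    "pattern recognition": (129, 338),
    "LR model": (136, 270),
    "subgroup scoring system": (126, 329),
}


def sensitivity(true_positives, n_malignant):
    """Fraction of malignant masses classified as malignant."""
    return true_positives / n_malignant


def specificity(true_negatives, n_benign):
    """Fraction of benign masses classified as benign."""
    return true_negatives / n_benign


def summarize(counts=VALIDATION_COUNTS):
    """Return {method: (sensitivity %, specificity %)}, rounded to whole percent."""
    return {
        method: (
            round(100 * sensitivity(tp, N_MALIGNANT)),
            round(100 * specificity(tn, N_BENIGN)),
        )
        for method, (tp, tn) in counts.items()
    }


if __name__ == "__main__":
    for method, (sens, spec) in summarize().items():
        print(f"{method}: sensitivity {sens}%, specificity {spec}%")
```

Rounded to whole percentages, this reproduces the figures quoted above for all three methods.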
In the hands of experienced ultrasound examiners, the subgroup scoring system for diagnosing malignancy has a performance that is similar to that of pattern recognition, the latter method being the best diagnostic method currently available. The scoring system is less sensitive but more specific than the LR model.