Loizzi Vera, Comes Maria Colomba, Arezzo Francesca, Apostol Adriana Ionelia, Bove Samantha, Fanizzi Annarita, Fruscio Robert, Gregorc Vanesa, Legge Francesco, Mancari Rosanna, Marchetti Claudia, Negri Serena, Russo Giorgia, Vertechy Laura, Scambia Giovanni, Massafra Raffaella, Cormio Gennaro
S.S.D. Ginecologia Oncologica Clinicizzata, IRCCS Istituto Tumori Giovanni Paolo II, Bari, Italy.
Dipartimento di Biomedicina Traslazionale e Neuroscienze (DiBraiN), University of Bari Aldo Moro, Bari, Italy.
Front Oncol. 2025 Apr 15;15:1574037. doi: 10.3389/fonc.2025.1574037. eCollection 2025.
BRCA1/2-mutated women are advised to undergo bilateral risk-reducing salpingo-oophorectomy (RRSO) after childbearing because no effective method exists for detecting ovarian cancer early. Predictive machine learning (ML) techniques could therefore help clinicians identify high-risk BRCA1/2-mutated patients and determine the appropriate timing of RRSO.
In this work, we addressed this task by developing explainable ML models using clinical data from a multicentric cohort of 694 BRCA1/2-mutated patients who underwent salpingo-oophorectomy at six Italian centers (Policlinico Gemelli, IRCCS San Gerardo, Policlinico Bari, Istituto Tumori Regina Elena, Istituto Tumori Giovanni Paolo II, Ospedale F. Miulli); 39 of these patients (5.6%) had a tumor. Data from Istituto Tumori Regina Elena and Policlinico Bari served as the External Validation Cohort (EVC), and data from the remaining four centers formed the Investigational Cohort (IC). Resampling and ensemble techniques were implemented to handle dataset imbalance. Explainability techniques enabled us to identify protective and risk factors that the models associated with the outcome under study.
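The abstract does not say which resampling method was used. As a minimal sketch of the general idea only, the snippet below applies plain random oversampling of the minority class, which is an assumption and not the authors' pipeline. The function name `random_oversample` and the placeholder features are hypothetical; only the class counts (39 tumors among 694 patients) come from the text. In practice such resampling would be applied to training data only, never to a validation cohort.

```python
import random

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Draw (with replacement) as many extra minority indices as needed to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(labels))) + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]

# Toy labels mirroring the whole-cohort class balance reported in the abstract.
labels = [1] * 39 + [0] * 655
rows = [[float(i)] for i in range(694)]  # placeholder features, not real data

prevalence = sum(labels) / len(labels)   # fraction of patients with a tumor
X_bal, y_bal = random_oversample(rows, labels)
print(f"prevalence: {prevalence:.1%}, balanced size: {len(y_bal)}")
```

Oversampling keeps every original record and only repeats minority cases, so it avoids discarding any of the already scarce tumor-positive patients; the trade-off is a higher risk of overfitting to the repeated cases.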
On the EVC, the best ML model achieved an AUC of 79.3% (95% CI: 75.3% - 83.0%), an accuracy of 73.8% (95% CI: 69.6% - 78.2%), a sensitivity of 66.7% (95% CI: 58.1% - 75.3%), a specificity of 74.3% (95% CI: 68.7% - 80.0%), and a G-mean of 70.4% (95% CI: 63.0% - 76.0%). Although the model demonstrated good overall performance, its limited sensitivity reduces its effectiveness in this high-risk population. The variables CA125, age and MatoRRSO were found to be the most significant risk factors, in agreement with the clinical perspective. Conversely, variables such as Estroprogestinuse and PregnancyNfdt acted as protective factors.
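For readers less familiar with the G-mean, it is conventionally defined as the geometric mean of sensitivity and specificity. The sketch below computes the threshold-based metrics from confusion-matrix counts and checks that the reported sensitivity and specificity are consistent with the reported G-mean. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; the abstract does not report the confusion matrix.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and G-mean from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "g_mean": g_mean,
    }

# Hypothetical counts, NOT taken from the study.
m = binary_metrics(tp=2, fn=1, tn=3, fp=1)

# Consistency check using the point estimates reported in the abstract.
reported_g_mean = math.sqrt(0.667 * 0.743)
print(f"G-mean from reported sensitivity and specificity: {reported_g_mean:.1%}")
```

Because the G-mean collapses when either component is low, it is a more informative summary than accuracy for a cohort in which only about 5.6% of patients are positive.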
Our ML proposal explores the intricate relationships between multiple clinical variables, with a particular emphasis on understanding their non-linear associations. However, while our approach provides valuable insights into risk assessment for BRCA-mutated patients, its current predictive capacity does not significantly improve upon existing clinical models.