Shaw Reid, Lokshin Anna E, Miller Michael C, Messerlian-Lambert Geralyn, Moore Richard G
Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Wilmot Cancer Institute, University of Rochester, Rochester, NY 14642, USA.
Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA 15219, USA.
Cancers (Basel). 2022 Mar 2;14(5):1291. doi: 10.3390/cancers14051291.
To identify the parameters most predictive of ovarian malignancy and to develop a machine learning (ML)-based algorithm that distinguishes preoperatively between benign and malignant pelvic masses.
Retrospective study of 70 predictive parameters collected from 140 women with a pelvic mass. The cohort was split 3:1 into a training dataset and a testing dataset. Feature selection was performed using Gini impurity from an embedded random forest model, together with principal component analysis. Nine distinct ML classifiers were assessed across a range of model-specific hyperparameters using 25 bootstrap resamples of the training data. The predictions of these models were then combined into an ensemble stack by LASSO regression. Finally, the ensemble stack and each individual classifier were applied to the testing dataset to assess model performance.
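The stacking step can be illustrated with a minimal, dependency-free Python sketch. It assumes a squared-error LASSO fitted by cyclic coordinate descent with an unpenalised intercept; the abstract says only "LASSO regression", so the loss function, the penalty strength `lam`, and the names `lasso_stack` and `soft_threshold` are illustrative assumptions, not the authors' implementation. Each column of `preds` stands for one base classifier's predicted probability of malignancy, ideally produced on rows that classifier was not trained on (for example, rows held out of a bootstrap resample). The L1 penalty can drive the weights of uninformative or redundant classifiers to exactly zero, so the stack keeps only the members that add predictive value. Some stacking implementations also constrain the weights to be non-negative; this sketch does not.

```python
from typing import List, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """L1 proximal operator: shrink z toward zero by gamma, clipping at zero."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_stack(preds: List[List[float]], y: List[float], lam: float,
                n_iter: int = 500) -> Tuple[float, List[float]]:
    """Fit y ~ b0 + sum_j w[j] * preds[i][j] with an L1 penalty on w.

    Minimises (1/2n) * sum_i (y[i] - b0 - preds[i] . w)^2 + lam * sum_j |w[j]|
    by cyclic coordinate descent. The intercept b0 is not penalised.
    """
    n, k = len(preds), len(preds[0])
    w = [0.0] * k
    b0 = sum(y) / n
    r = [y[i] - b0 for i in range(n)]  # current residuals
    col_sq = [sum(row[j] ** 2 for row in preds) / n for j in range(k)]

    for _ in range(n_iter):
        for j in range(k):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot affect the fit; keep w[j] = 0
            # Inner product of column j with the partial residual
            # (the residual with w[j]'s own contribution added back).
            rho = sum(preds[i][j] * (r[i] + preds[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= preds[i][j] * delta
                w[j] = new_wj
        # Exact update of the unpenalised intercept given the current weights.
        shift = sum(r) / n
        b0 += shift
        r = [ri - shift for ri in r]
    return b0, w


# Toy, made-up inputs: rows are patients, columns are three hypothetical
# base classifiers' predicted probabilities of malignancy.
preds = [
    [0.9, 0.8, 0.5],
    [0.2, 0.3, 0.5],
    [0.8, 0.9, 0.4],
    [0.1, 0.2, 0.6],
]
y = [1.0, 0.0, 1.0, 0.0]  # 1 = malignant, 0 = benign
b0, w = lasso_stack(preds, y, lam=0.05)
stacked = [b0 + sum(wj * xj for wj, xj in zip(w, row)) for row in preds]
```

With a large `lam` every weight is shrunk to zero and the stack reduces to the base rate `mean(y)`; as `lam` approaches zero the fit approaches ordinary least squares. In practice the penalty would be tuned by resampling, which this sketch leaves out.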
Feature selection identified human epididymis protein 4 (HE4), cancer antigen 125 (CA125), and transferrin as three predictive parameters of malignancy. On the testing dataset, the ensemble stack outperformed every individual ML classifier in predicting malignancy, with an accuracy of 97.1%, a receiver operating characteristic (ROC) area under the curve (AUC) of 0.951, a sensitivity of 93.3%, and a specificity of 100%.
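As a consistency check on the reported percentages, this short Python sketch computes accuracy, sensitivity, and specificity from a confusion matrix. The counts are hypothetical, since the abstract does not report them: assuming the testing dataset holds exactly one quarter of the 140 women (35), the only counts consistent with 97.1% accuracy, 93.3% sensitivity, and 100% specificity are 14 true positives, 1 false negative, 20 true negatives, and 0 false positives. The ROC AUC cannot be recovered from these counts, because it depends on how the continuous stack scores rank malignant against benign cases rather than on a single decision threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # malignant, predicted malignant
    fn: int  # malignant, predicted benign
    tn: int  # benign, predicted benign
    fp: int  # benign, predicted malignant

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    @property
    def sensitivity(self) -> float:
        # True-positive rate: share of malignant masses predicted malignant.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # True-negative rate: share of benign masses predicted benign.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, assuming a 35-woman test set (140 / 4); not reported in the abstract.
cm = ConfusionMatrix(tp=14, fn=1, tn=20, fp=0)
print(f"accuracy={cm.accuracy:.1%} sensitivity={cm.sensitivity:.1%} "
      f"specificity={cm.specificity:.1%}")
# prints: accuracy=97.1% sensitivity=93.3% specificity=100.0%
```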
Combining the measurement of three distinct biomarkers with the stacking of multiple ML classifiers into an ensemble can provide valuable preoperative diagnostic predictions for patients with a pelvic mass.