Bai Xi, Zhou Zhibo, Luo Yunyun, Yang Hongbo, Zhu Huijuan, Chen Shi, Pan Hui
Key Laboratory of Endocrinology of National Health Commission, Department of Endocrinology, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100730, China.
J Pers Med. 2022 Mar 31;12(4):550. doi: 10.3390/jpm12040550.
Exposure to radiation has been associated with an increased risk of delivering small-for-gestational-age (SGA) newborns. There are no tools to predict SGA newborns in pregnant women exposed to radiation before pregnancy. Here, we aimed to develop an array of machine learning (ML) models to predict SGA newborns in women exposed to radiation before pregnancy. Patient data were obtained from the National Free Preconception Health Examination Project from 2010 to 2012. The data were randomly divided into a training dataset (n = 364) and a testing dataset (n = 91). Eight ML models were compared on the binary classification task of SGA prediction, followed by a post hoc explainability analysis based on SHAP (SHapley Additive exPlanations) to identify and interpret the features that contribute most to the prediction outcome. A total of 455 newborns were included, of whom 60 (13.2%) were SGA. Overall, the model obtained by extreme gradient boosting (XGBoost) achieved the highest area under the receiver-operating-characteristic curve (AUC) in the testing set (0.844, 95% confidence interval (CI): 0.713-0.974). All models showed satisfactory AUCs, except for the logistic regression model (AUC: 0.561, 95% CI: 0.355-0.768). After feature selection by recursive feature elimination (RFE), 15 features were included in the final prediction model using the XGBoost algorithm, with an AUC of 0.821 (95% CI: 0.650-0.993). ML algorithms can generate robust models to predict SGA newborns in pregnant women exposed to radiation before pregnancy, which may thus be used as a prediction tool for SGA newborns in high-risk pregnant women.
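The reported AUCs come with wide confidence intervals, which is expected for a testing set of 91 newborns with few SGA cases. The abstract does not say how the intervals were computed, so the following is only a minimal sketch, assuming a percentile bootstrap over the testing set. The function names `auc` and `bootstrap_auc_ci`, the count of 12 SGA cases in the testing set, and the simulated scores are all hypothetical and are not the study data or the authors' code.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        # Skip resamples that contain only one class: AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Cohort arithmetic taken from the abstract.
n_train, n_test, n_sga = 364, 91, 60
n_total = n_train + n_test                 # 455 newborns
sga_rate = round(100 * n_sga / n_total, 1)  # 13.2 (%)

# Simulated testing set: 12 SGA cases out of 91 is an assumption
# (about 13.2% of 91); the scores are random draws, not model output.
rng = random.Random(42)
labels = [1] * 12 + [0] * (n_test - 12)
scores = [rng.gauss(1.4, 1.0) if y == 1 else rng.gauss(0.0, 1.0) for y in labels]

point = auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.3f}, 95% CI: {lower:.3f}-{upper:.3f}")
```

With only about a dozen positive cases, each resample changes the set of SGA newborns substantially, so the interval stays wide even when the point estimate is high. This is consistent with the width of the intervals reported above, whichever method the authors used.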