Validating machine learning models for the prediction of labour induction intervention using routine data: a registry-based retrospective cohort study at a tertiary hospital in northern Tanzania.
Affiliations
College of Public Health, Zhengzhou University, Zhengzhou, China.
Science and Laboratory Technology, Dar es Salaam Institute of Technology, Dar es Salaam, United Republic of Tanzania.
Publication information
BMJ Open. 2021 Dec 2;11(12):e051925. doi: 10.1136/bmjopen-2021-051925.
OBJECTIVES
We aimed to identify the important variables for labour induction intervention and to assess the predictive performance of machine learning algorithms.
SETTING
We analysed the birth registry data from a referral hospital in northern Tanzania. Since July 2000, every birth at this facility has been recorded in a specific database.
PARTICIPANTS
A total of 21 578 deliveries between 2000 and 2015 were included. Deliveries lacking information on labour induction status were excluded.
PRIMARY OUTCOME
Deliveries involving labour induction intervention.
RESULTS
Parity, maternal age, body mass index, gestational age and birth weight were all found to be important predictors of labour induction. The boosting method showed the best discriminative performance (area under the curve, AUC=0.75; 95% CI 0.73 to 0.76), while logistic regression showed the lowest (AUC=0.71; 95% CI 0.70 to 0.73). Random forest and boosting algorithms showed the highest net benefit in the decision curve analysis.
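The two metrics reported above, AUC and net benefit, can be illustrated with a minimal Python sketch. This is not the study's code: the function names `auc` and `net_benefit` are hypothetical, the toy labels and scores are invented for illustration, and the formulas are the standard textbook definitions (AUC as the probability that a randomly chosen induced delivery is scored above a randomly chosen non-induced one; net benefit as TP/n - FP/n x pt/(1 - pt) at threshold probability pt).

```python
from typing import Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: share of positive/negative pairs
    in which the positive case gets the higher score; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Net benefit at one threshold probability, as used in decision
    curve analysis: TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # A case is flagged when its predicted probability reaches the threshold.
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data, invented for illustration only (1 = labour was induced).
labels = [1, 0, 1, 0, 1, 0, 0, 1]
probs = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.3, 0.8]

toy_auc = auc(labels, probs)
toy_nb = net_benefit(labels, probs, threshold=0.5)
print(f"AUC = {toy_auc:.4f}, net benefit at 0.5 = {toy_nb:.4f}")
```

A decision curve repeats the `net_benefit` calculation over a range of thresholds and compares each model against the "treat all" and "treat none" strategies. The pairwise AUC loop is quadratic in sample size, so a registry of this scale would call for a rank-based implementation instead.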
CONCLUSION
All of the machine learning algorithms performed well in predicting the likelihood of labour induction intervention. Further optimisation of these classifiers through hyperparameter tuning may improve performance. Extensive research into the performance of other classifier algorithms is warranted.