Do Huy D, Allison Jeroan J, Nguyen Hoa L, Phung Hai N, Tran Cuong D, Le Giang M, Nguyen Trang T
Hanoi Medical University, Hanoi, Viet Nam.
UMass Chan Medical School, Worcester, MA, USA.
Heliyon. 2024 Jul 15;10(15):e34476. doi: 10.1016/j.heliyon.2024.e34476. eCollection 2024 Aug 15.
This paper describes the development of low-cost, effective, non-invasive machine learning-based prediction models for Down Syndrome in the first two trimesters of pregnancy in Vietnam. These models are adaptable to different situations with limited screening capacities at community-based healthcare facilities.
Ultrasound and biochemical test results from both trimesters, used alone and in combination, were employed to build prediction models based on the k-Nearest Neighbor, Support Vector Machine, Random Forest, and Extreme Gradient Boosting algorithms.
A total of 7,076 pregnant women from a single site in Northern Vietnam were included, of whom 1,035 had a fetus with Down Syndrome. Combined ultrasound and biochemical testing was required to achieve the highest accuracy in trimester 2, while models based only on biochemical testing performed as well as models based on combined testing in trimester 1. In trimester 1, Extreme Gradient Boosting produced the best model, with 94% accuracy and 88% AUC; in trimester 2, Support Vector Machine produced the best model, with 89% accuracy and 84% AUC.
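To make the two reported metrics concrete, the following is a minimal sketch of how accuracy and AUC can be computed, together with the accuracy a trivial "always predict unaffected" rule would reach on the cohort counts given above. The function names, the 0.5 threshold, and the toy labels and scores are illustrative assumptions, not the authors' code or data. The abstract does not state the class balance of the trimester-specific evaluation sets, so the baseline is only rough context for the reported accuracies.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case is
    scored higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Cohort counts taken from the abstract.
N_TOTAL = 7076
N_DOWN = 1035
prevalence = N_DOWN / N_TOTAL
# Accuracy of a rule that labels every pregnancy as unaffected.
majority_baseline = 1 - prevalence

# Hypothetical toy labels (1 = affected) and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"prevalence        = {prevalence:.3f}")
print(f"majority baseline = {majority_baseline:.3f}")
print(f"toy accuracy      = {accuracy(y_true, y_pred):.3f}")
print(f"toy AUC           = {roc_auc(y_true, scores):.3f}")
```

Because roughly 15% of the cohort is affected, accuracy depends on the decision threshold and on class balance, whereas AUC summarizes ranking quality across all thresholds. An AUC reported as 88% corresponds to 0.88 on the usual 0 to 1 scale.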
This study explored a range of machine learning models under different testing scenarios. The findings point to the potential feasibility of national screening, especially in settings that lack sufficient equipment and specialists, once additional model validation and fine-tuning have been performed.